Running Highly Available Hazelcast Clusters

In the Hazelcast Platform Operator, you can use partition groups with High Availability Mode to add resilience to your clusters. This helps you avoid data loss even when a Kubernetes node or a whole availability zone goes down and all related Hazelcast members are terminated.

Partition groups let you choose where Hazelcast members store backups of a data partition. You can configure a partition group to back up data on a member in a different availability zone (ZONE_AWARE) or on a different Kubernetes node (NODE_AWARE). See Partition Group Configuration for more details.

When using either type of partition grouping (ZONE_AWARE or NODE_AWARE) with a Hazelcast cluster that spans multiple availability zones or nodes, you must have an equal number of members in each zone or on each node. Otherwise, partitions are distributed unevenly among the members.
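For example, a cluster spread over three availability zones keeps its partitions evenly distributed when the cluster size is a multiple of three. A sketch of such a custom resource (the name and the cluster size of 6 are illustrative):

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 6            # 6 members across 3 zones -> 2 members per zone
  highAvailabilityMode: ZONE
```

With a cluster size of 7, one zone would hold an extra member, and partitions would be distributed unevenly.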

The highAvailabilityMode parameter allows you to specify the partition group type and automatically distributes members across availability zones or nodes, using the Kubernetes topologySpreadConstraints scheduling policy.

Configuring High Availability Mode

Below are the configuration options for the High Availability Mode feature.

Field: highAvailabilityMode

Description: Configuration for partition groups and the Kubernetes scheduling policy. Possible values:
  • NODE: the partition group is configured as NODE_AWARE and the following topologySpreadConstraints is added to the StatefulSet:

      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: hazelcast
              app.kubernetes.io/instance: hazelcast
              app.kubernetes.io/managed-by: hazelcast-platform-operator
  • ZONE: the partition group is configured as ZONE_AWARE and the following topologySpreadConstraints is added to the StatefulSet:

      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: hazelcast
              app.kubernetes.io/instance: hazelcast
              app.kubernetes.io/managed-by: hazelcast-platform-operator
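Setting highAvailabilityMode also enables the corresponding partition-group setting on the Hazelcast members. For the ZONE value, the resulting member configuration is roughly equivalent to the following (a sketch for illustration, not the operator's literal output):

```yaml
hazelcast:
  partition-group:
    enabled: true
    group-type: ZONE_AWARE   # store partition backups on members in a different zone
```

For the NODE value, group-type would be NODE_AWARE instead, so that backups are stored on members running on a different Kubernetes node.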

Example Configuration

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  highAvailabilityMode: ZONE
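Assuming the operator is installed and the manifest above is saved as hazelcast.yaml, you can apply it and then check how the member pods were spread (the label selector below matches the labels the operator sets; adjust if your cluster differs):

```shell
# Create the cluster with High Availability Mode enabled
kubectl apply -f hazelcast.yaml

# List the member pods together with the nodes they were scheduled on
kubectl get pods -l app.kubernetes.io/instance=hazelcast -o wide

# Show the availability zone of each node to verify the spread
kubectl get nodes -L topology.kubernetes.io/zone
```

Because whenUnsatisfiable is set to ScheduleAnyway, the constraint is a soft preference: pods are still scheduled if an even spread is impossible.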

High Availability Mode and MULTI_MEMBER Routing

MULTI_MEMBER routing is a client feature that allows a client to connect to a subset of members, determined by a grouping strategy. To benefit from MULTI_MEMBER routing, you must enable High Availability Mode, because this parameter is used to determine the partition groups. Also, if the MULTI_MEMBER routing mode is used on the client side, exposeExternally.type must be set to Smart in the Hazelcast CR.

Example server and client configurations to use MULTI_MEMBER routing:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  highAvailabilityMode: ZONE
  exposeExternally:
    type: Smart
    discoveryServiceType: LoadBalancer
    memberAccess: LoadBalancer
Java:

ClientConfig clientConfig = new ClientConfig();
ClientNetworkConfig networkConfig = clientConfig.getNetworkConfig();
networkConfig.getClusterRoutingConfig().setRoutingMode(RoutingMode.MULTI_MEMBER);
// PARTITION_GROUPS is the default strategy, so it does not need to be explicitly defined
networkConfig.getClusterRoutingConfig().setRoutingStrategy(RoutingStrategy.PARTITION_GROUPS);

XML:

...
    <network>
        <cluster-routing mode="MULTI_MEMBER">
            <grouping-strategy>PARTITION_GROUPS</grouping-strategy>
        </cluster-routing>
    </network>
...

YAML:

  network:
    cluster-routing:
      mode: MULTI_MEMBER
      grouping-strategy: PARTITION_GROUPS