Back Pressure
Hazelcast uses operations to make remote calls. For example, a map.get is an operation and
a map.put is one operation for the primary
and one operation for each of the backups; that is, map.put is executed for the primary and also for each backup.
In most cases, there is a natural balance between the number of threads performing operations
and the number of operations being executed. However, the following can disrupt this balance and operations
and eventually lead to OutofMemoryException (OOME):
- 
Asynchronous calls: With async calls, the system can be flooded with requests. 
- 
Asynchronous backups: The asynchronous backups can build up. 
To prevent the system from crashing, Hazelcast provides back pressure. Back pressure works by limiting the number of concurrent operation invocations and periodically making an async backup sync.
Member Side
Back pressure is disabled by default and you can enable it using the following system property:
hazelcast.backpressure.enabled
To control the number of concurrent invocations, you can configure the number of invocations allowed for each partition using the following system property:
hazelcast.backpressure.max.concurrent.invocations.per.partition
The default value of this system property is 100. Using the default configuration, a system can have (271 + 1) * 100 = 27200 concurrent invocations (271 partitions + 1 for generic operations).
Back pressure is only applied to normal operations. System operations, like heart beats and repartitioning operations,
are not influenced by back pressure. 27200 invocations might seem like a lot, but keep in mind that executing a task on IExecutor
or acquiring a lock also requires an operation.
If the maximum number of invocations has been reached, Hazelcast automatically applies an exponential backoff policy. This
gives the system some time to deal with the load.
Using the following system property, you can configure the maximum time to wait before a HazelcastOverloadException is thrown:
hazelcast.backpressure.backoff.timeout.millis
This system property’s default value is 60000 milliseconds.
The Health Monitor keeps an eye on the usage of the invocations. If it sees a member has consumed 70% or more of the invocations, it starts to log health messages.
Apart from controlling the number of invocations, you also need to control the number of pending async backups. This is done by periodically making these backups sync instead of async. This forces all pending backups to get drained. For this, Hazelcast tracks the number of asynchronous backups for each partition. At every Nth call, one synchronization is forced. This N is controlled through the following property:
hazelcast.backpressure.syncwindow
This system property’s default value is 100. It means, out of 100 asynchronous backups, Hazelcast makes one of them a synchronous one. A randomization is added, so the sync window with default configuration is between 75 and 125 invocations.
Client Side
To prevent the system on the client side from overloading, you can apply a constraint on the number of concurrent invocations. You can use the following system property on the client side for this purpose:
hazelcast.client.max.concurrent.invocations
This property defines the maximum allowed number of concurrent invocations.
When it is not explicitly set, it has the value Integer.MAX_VALUE by default, which means infinite.
When set and the maximum number of concurrent invocations exceeds this value,
Hazelcast throws HazelcastOverloadException when a new invocation comes in.
The back off timeout and controlling the number of pending async backups (sync window) is not supported on the client side.
| See the System Properties appendix to learn how to configure the system properties. |