Tuesday, October 26, 2010

Ehcache BigMemory: Simple High Availability, Even Simpler

My colleague Jason @ Terracotta did a nice post on using Ehcache with JRuby, which led to a discussion of the long list of features we've implemented for Enterprise Ehcache (check out the discussion thread on Jason's blog).

Adding to Jason's feature list, I'd like to discuss HA (High Availability) in Ehcache and explain why our BigMemory product makes tuning HA even simpler. Let's review how we do HA in Enterprise Ehcache. Clients going down is no big deal, since the data is also present on the servers; our HA focuses on protecting the servers. In Enterprise Ehcache you can define one or many server groups, where each group is a cluster of servers. The cluster holds an election to decide which node becomes the active server, and the rest of the nodes wait in passive standby, ready to take over if the active node fails.
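
To make that concrete, here is a rough sketch of what a two-server group looks like in tc-config.xml. This is written from memory rather than copied from the kit, so treat the element names (servers, ha, networked-active-passive) as an approximation and check the config samples in your own kit:

<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
  <servers>
    <server host="host1" name="server1"/>
    <server host="host2" name="server2"/>
    <!-- the two servers above form one group: an election picks the active,
         the other waits in passive standby -->
    <ha>
      <mode>networked-active-passive</mode>
    </ha>
  </servers>
</tc:tc-config>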

In order to actually detect when a failover needs to happen, we wrote a configurable HC (health checker). Our HC detects failures that won't show up as a normal network disconnect, such as a network cable being pulled. And because Enterprise Ehcache is written in Java, we also had to deal with long GCs, so we designed the HC to detect those as well.

You may want to change the HC settings depending on your tolerance for network disruption and long GCs. Before you go about changing anything, check out these sample files:

$TERRACOTTA_KIT/platform/config-samples/tc-config-healthchecker-aggressive.xml
$TERRACOTTA_KIT/platform/config-samples/tc-config-healthchecker-development.xml
$TERRACOTTA_KIT/platform/config-samples/tc-config-healthchecker-production.xml
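
If you open one of those samples, you'll see the health-checker values being supplied as Terracotta properties. As a rough sketch (I'm assuming the tc-properties override block in tc-config.xml here; the samples in your kit are the authoritative reference), that looks something like:

<tc-properties>
  <!-- health-checker tuning; the values here are just the ones discussed below -->
  <property name="l2.healthcheck.l2.ping.idletime" value="3000"/>
  <property name="l2.healthcheck.l2.ping.interval" value="1000"/>
  <property name="l2.healthcheck.l2.ping.probes" value="2"/>
  <property name="l2.healthcheck.l2.socketConnectTimeout" value="5"/>
  <property name="l2.healthcheck.l2.socketConnectCount" value="2"/>
</tc-properties>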


Depending on what you're doing and what your requirements are, picking one of the samples above should suffice.

Now let's discuss these properties:

l2.healthcheck.l2.ping.idletime=3000
l2.healthcheck.l2.ping.interval=1000
l2.healthcheck.l2.ping.probes=2
l2.healthcheck.l2.socketConnectTimeout=5
l2.healthcheck.l2.socketConnectCount=2


These are the properties you have to work with for the HC. The HC starts off with ping.idletime: the maximum amount of time that may elapse since data was last received from the corresponding node. In this case the idletime is 3000 milliseconds, after which the HC notes, "Hey, I haven't received any data from that node; I should check on it."

To check on the health of the node, the HC pings it at intervals, where the interval length is defined by ping.interval (you can push this number down to get more granularity). If the corresponding node doesn't respond within the ping.interval, the HC either probes again, because the ping.probes countdown hasn't completed, or checks socketConnectCount to see whether it's allowed to make any more socket connections. If not, it declares the corresponding node DEAD.

In the example above, since socketConnectCount is set to 2, it will try to make another socket connection. If it cannot make the socket connection within (socketConnectTimeout * ping.interval) ms, it declares the node DEAD; in our example that window is 5 * 1000 = 5000 ms. Once it establishes a connection, it repeats the ping-probe cycle.
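
Putting the whole cycle together, here's a small Java sketch of the logic as I've just described it. This is an illustration only, not the actual Terracotta health-checker code; the class name and the ping/connect callbacks are made up for the example:

import java.util.function.LongPredicate;

/** Illustration of the health-check cycle described above (not the real implementation). */
final class HealthCheckSketch {

    /**
     * @param pingProbes           l2.healthcheck.l2.ping.probes
     * @param pingIntervalMillis   l2.healthcheck.l2.ping.interval
     * @param socketConnectCount   l2.healthcheck.l2.socketConnectCount
     * @param socketConnectTimeout l2.healthcheck.l2.socketConnectTimeout (a multiplier, not ms)
     * @param ping                 true if the peer answers a ping within the given ms
     * @param connect              true if a raw socket connect succeeds within the given ms
     */
    static boolean isPeerAlive(int pingProbes, long pingIntervalMillis,
                               int socketConnectCount, int socketConnectTimeout,
                               LongPredicate ping, LongPredicate connect) {
        for (int round = 0; round < socketConnectCount; round++) {
            // Ping-probe cycle: up to ping.probes attempts, one per ping.interval.
            for (int probe = 0; probe < pingProbes; probe++) {
                if (ping.test(pingIntervalMillis)) {
                    return true;                              // peer answered; healthy
                }
            }
            // No ping response: fall back to a raw socket connect, allowed
            // socketConnectTimeout * ping.interval ms to succeed.
            if (!connect.test(socketConnectTimeout * pingIntervalMillis)) {
                return false;                                 // can't even connect: DEAD
            }
            // Connect succeeded but pings go unanswered (the long-GC case),
            // so go around and run the ping-probe cycle again.
        }
        return false;  // used up socketConnectCount rounds without a ping answer: DEAD
    }
}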

The maximum time it will take the HC to detect a network disruption is ( ping.idletime + (ping.probes * ping.interval) + (socketConnectTimeout * ping.interval) ) ms. If the problem is a long GC, the socket connection will succeed but the pings won't receive a response, so the maximum time the HC takes to detect a long GC is ( socketConnectCount * ( ping.idletime + (ping.probes * ping.interval) + (socketConnectTimeout * ping.interval) ) ) ms.
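
Plugging in the example values above:

network disruption: 3000 + (2 * 1000) + (5 * 1000) = 10,000 ms
long GC:            2 * ( 3000 + (2 * 1000) + (5 * 1000) ) = 20,000 ms

So with these settings, a node that drops off the network is declared DEAD within about 10 seconds, and a node stuck in a long GC within about 20 seconds.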

If you have a short tolerance for network disruption but you're OK with lengthy GC pauses, you can decrease ping.idletime and increase socketConnectCount; you tune based on your tolerances. Here's some detailed documentation on the HA settings.
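
For example, a purely hypothetical setting for that case (the values are illustrative, not a recommendation) might look like:

l2.healthcheck.l2.ping.idletime=2000
l2.healthcheck.l2.ping.interval=1000
l2.healthcheck.l2.ping.probes=2
l2.healthcheck.l2.socketConnectTimeout=5
l2.healthcheck.l2.socketConnectCount=4

That trims the idle window a bit, while giving a GC-bound node four socket-connect rounds, roughly 4 * (2000 + 2000 + 5000) = 36,000 ms by the formula above, before it's declared DEAD.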

With BigMemory in our server, FORGET ALL THAT IS WRITTEN ABOVE.

Our HC has all these different properties because we had to be tolerant of long GCs: when a node is in a long GC it will accept a socket connection but won't be able to complete the ping-probe cycle. But now, with BigMemory, you probably never need to change these settings, unless you have people tripping over your network cables.

Unlike long GCs, network disruption is something you probably already know about, and it's easier to guess what your tolerance for it should be. Not having to tune for long GCs makes HC configuration simple: you only need to tune for YOUR own environment (e.g. a crappy network or clumsy coworkers), not for something specific to Java (long GCs).

Imagine what it can do for you. Check out our beta here.