Like many of you, I tend to browse DZone for interesting articles and blog posts. As someone who works on Java server-side software, articles related to Garbage Collection always catch my eye. Today while browsing DZone I came across this article on Garbage Collection tuning.
Halfway through reading the article I realized the irony: I was reading a pretty well-written article on how to tune your Java Garbage Collector on the same day Terracotta announced GA for our BigMemory product. As I spelled out here and here, BigMemory makes that tuning unnecessary. And if you're talking about large Java heaps, GC tuning? Forget about it!
I know that as a developer I would rather work on making my product better, faster, and more concurrent than dread that GC gremlin showing up, pausing my application, and killing the buzz. If you do want to make YOUR application better and not let GC get in the way of that, check out the BigMemory product here.
Articles on Garbage Collection just got a whole lot less interesting.
Tuesday, November 9, 2010
Tuesday, October 26, 2010
Ehcache BigMemory: Simple High Availability, Even Simpler
My colleague Jason at Terracotta wrote a nice post on using Ehcache with JRuby, which led to a discussion of a long list of features we implemented for Enterprise Ehcache (check out the discussion thread on Jason's blog).
Adding to Jason's feature list, I would like to discuss HA (High Availability) in Ehcache and explain why our BigMemory product makes tuning HA even simpler. Let's review how we do HA in Enterprise Ehcache. Clients going down is no big deal, since the data is present on the servers as well; our HA focuses on protecting the servers. In Enterprise Ehcache you can define one or many server groups, where each group consists of a cluster of servers. The cluster has to decide which node is going to be the active server, which is done by holding an election in which one node is selected as the active server. The rest of the nodes in the cluster wait in passive standby, ready to take over if the active node fails.
In order to detect when a failover needs to happen, we wrote a configurable HC (Health Checker). The HC detects errors that won't show up as a normal network disconnect or failure, such as a network cable being pulled. Because Enterprise Ehcache is written in Java, we also had to deal with long GCs, so we designed the HC to detect long GCs as well.
You may want to change the HC settings depending on your tolerance for network disruptions and long GCs. Before you go about changing them, you might want to check out these sample configurations in the kit:
$TERRACOTTA_KIT/platform/config-samples/tc-config-healthchecker-aggressive.xml
$TERRACOTTA_KIT/platform/config-samples/tc-config-healthchecker-development.xml
$TERRACOTTA_KIT/platform/config-samples/tc-config-healthchecker-production.xml
Depending on what you're doing and what your requirements are, picking one of the configurations above should suffice.
Now let's discuss these properties:
l2.healthcheck.l2.ping.idletime=3000
l2.healthcheck.l2.ping.interval=1000
l2.healthcheck.l2.ping.probes=2
l2.healthcheck.l2.socketConnectTimeout=5
l2.healthcheck.l2.socketConnectCount=2
Above are the properties you have to work with for the HC. The HC starts off with ping.idletime, the maximum amount of time that can elapse since data was last received from the corresponding node. In this case the idle time is 3000 milliseconds, after which the HC notes: "Hey, I didn't receive any data from that node; I should check on it."
To check on the health of the node, the HC pings it at intervals, where the interval length is defined by ping.interval. You can push this number down to get more granularity. If the corresponding node doesn't respond within ping.interval, the HC either probes again (because the ping.probes countdown hasn't completed) or checks socketConnectCount to see whether it is allowed to make any more socket connections. If not, it declares the corresponding node DEAD.
In the example above, since socketConnectCount is set to 2, the HC will try to make another socket connection. If it cannot make the socket connection within (socketConnectTimeout * ping.interval) ms, which is 5 * 1000 = 5000 ms in our example, it declares the node DEAD. Once it establishes a connection, it repeats the ping probe cycle.
The maximum time it takes the HC to detect a network disruption is (ping.idletime + (ping.probes * ping.interval) + (socketConnectTimeout * ping.interval)) ms. If the problem is a long GC, the socket connection will succeed but the pings won't receive a response, so the maximum time the HC takes to detect a long GC is (socketConnectCount * (ping.idletime + (ping.probes * ping.interval) + (socketConnectTimeout * ping.interval))) ms.
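To make the arithmetic concrete, here is a minimal sketch in plain Java that plugs in the example values above and computes both detection windows (the variable names simply mirror the properties; this is not a Terracotta API):

public class HealthCheckMath {
    public static void main(String[] args) {
        // Values from the example properties above (all times in ms)
        long idleTime = 3000;           // l2.healthcheck.l2.ping.idletime
        long pingInterval = 1000;       // l2.healthcheck.l2.ping.interval
        long pingProbes = 2;            // l2.healthcheck.l2.ping.probes
        long socketConnectTimeout = 5;  // multiplier applied to ping.interval
        long socketConnectCount = 2;    // socket connections allowed before giving up

        // One full cycle: idle time, then the ping probes, then the socket-connect window
        long networkDisruptionMax = idleTime
                + (pingProbes * pingInterval)
                + (socketConnectTimeout * pingInterval);     // 3000 + 2000 + 5000 = 10000 ms

        // A node stuck in a long GC still accepts socket connections, so the cycle repeats
        long longGcMax = socketConnectCount * networkDisruptionMax; // 2 * 10000 = 20000 ms

        System.out.println("Max time to detect a network disruption: " + networkDisruptionMax + " ms");
        System.out.println("Max time to detect a long GC: " + longGcMax + " ms");
    }
}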
If you have a short tolerance for network disruptions but you're OK with lengthy GCs, you can decrease ping.idletime and increase socketConnectCount; you tune based on your tolerances. Here's some detailed documentation on the HA settings.
With BigMemory in our server, FORGET ALL THAT IS WRITTEN ABOVE.
Our HC has all these different properties because we had to be tolerant of long GCs: when a node is in a long GC, it will make a socket connection but won't be able to complete the ping probe cycle. With BigMemory, you probably never need to change these settings, unless you have people tripping over your network cables.
Unlike long GCs, network disruption is something you probably know about, and it's easier to guess what that tolerance should be. Not having to tune for long GCs makes HC configuration simple: you only need to tune for YOUR own environment (a flaky network, or clumsy coworkers) and not for something specific to Java (long GCs).
Imagine what it can do for you. Check out our beta here.
Labels:
bigmemory,
caching,
ehcache,
high availability,
Java,
nosql,
Terracotta
Sunday, October 17, 2010
BigMemory: Followup Q and A
I got quite a few responses to my post on BigMemory in the Terracotta Server. It seems people are quite confused about what it actually is.
Here's some answers to a few questions I received:
1. Why can't they (Terracotta) put garbage collector on another cpu core and gain performance?
I think there is a misunderstanding about the cost of Garbage Collection. The Full GC pause (when all application threads are paused) is what the GC problem in Java is all about. It's tolerable when your heap is 1-2 GB, but beyond that you start seeing 4, 5, even 8 second GC pauses. Besides, if you don't run a parallel collector it will use one core anyway. But you DO want your garbage collector using all the cores so that it completes faster and pauses less.
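For anyone curious what that looks like in practice, these HotSpot flags ask the collector to use all available cores (the flags are real; the heap size, thread count, and jar name are just illustrative placeholders):

java -Xms6g -Xmx6g \
     -XX:+UseParallelGC -XX:+UseParallelOldGC \
     -XX:ParallelGCThreads=16 \
     -jar your-server.jar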
2. (In reference to the question above) Then put it on another thread, and how about pausing one thread at a time?
Again, as far as I know this is not possible with the Sun/Oracle JVM. Also, Full GC pauses are a necessary evil of the GC algorithms they use. Even if this were possible, it would not solve the problem of unpredictability.
3. I can't believe there are no GC pauses ... or you guys might have made memory management solution like an OS in java.
The idea of direct memory allocation in Java is no big secret. There is an -XX:MaxDirectMemorySize flag to tell the JVM how much direct memory it may allocate. The value add of Terracotta is using this direct memory space in a way that is fast and does not fragment.
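For anyone who hasn't played with it, here is a minimal sketch of the underlying JVM mechanism (plain NIO direct buffers, not Terracotta's allocator): the memory lives outside the Java heap, and the total is capped by -XX:MaxDirectMemorySize.

import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    public static void main(String[] args) {
        // Allocates 256 MB outside the Java heap; the GC never scans this memory,
        // it only tracks the small ByteBuffer object that points at it.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(256 * 1024 * 1024);

        offHeap.putLong(0, 42L);                 // write at an absolute offset
        System.out.println(offHeap.getLong(0));  // read it back: 42

        // Run with e.g. -XX:MaxDirectMemorySize=512m to cap total direct allocations.
    }
}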
4. Using direct memory allocated by the JVM is useless because it is so much slower than the Heap.
Access to direct memory is NOT slower than the heap. Two things contribute to the perceived slowness of direct memory: serializing and deserializing data to and from direct memory, and allocating and cleaning up direct memory buffers. At Terracotta we solved the allocation and cleanup problem. On the Terracotta Server we don't pay the serialization/deserialization cost. With Enterprise Ehcache (unclustered) we do pay a serialization/deserialization cost, but compare that CPU cost to dealing with Full GC pauses on the heap; the tradeoff is well worth it. Besides, BigMemory uses the heap as part of a tiered storage strategy: Heap to OffHeap to Disk. It's an age-old principle in computer science (think virtual memory). We avoid the serialization/deserialization cost for frequently used objects by keeping those on the heap, keep a big part of your cache off-heap to avoid long Full GCs, and let the rest spill over to disk.
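To make that tradeoff concrete, here is a minimal sketch of what paying the serialization cost to move an object off-heap looks like, using plain Java serialization into a direct ByteBuffer (a toy stand-in; it assumes nothing about how BigMemory actually stores entries):

import java.io.*;
import java.nio.ByteBuffer;

public class OffHeapRoundTrip {
    // Serialize a value into an off-heap buffer: CPU cost paid here, GC pressure avoided.
    static ByteBuffer store(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.size());
        buf.put(bytes.toByteArray());
        buf.flip();
        return buf;
    }

    // Deserialize it back when the entry is needed on-heap again.
    static Object load(ByteBuffer buf) throws IOException, ClassNotFoundException {
        byte[] copy = new byte[buf.remaining()];
        buf.duplicate().get(copy);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(copy))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer buf = store("cache entry living off-heap");
        System.out.println(load(buf));
    }
}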
For the additional CPU cost, what you get in return is predictable latency and speed, with all the memory your Java process desires. Find an app where you do see Full GC pauses and check out the beta to see for yourself.
Thursday, October 14, 2010
OmmWriter Plug
Just wanted to plug this fantastic piece of software for the Mac: OmmWriter. Writing is a painful exercise for me since I get distracted pretty easily. With so many applications open (IDEs, chat, mail, browser, Twitter, etc.) it's really hard to concentrate. When I try to write about any topic, I have a tendency to fact-check every sentence, which takes me off on reading tangents. OmmWriter runs in full-screen mode and removes any sort of notification. I can write with purpose and bang out a blog post really quickly. Give these guys a try; their version 1 is free and the newest one is paid.
Wednesday, October 13, 2010
Terracotta BigMemory: A tale of eating our own code
Terracotta (the company I work for) recently came out with a beta release of their BigMemory product. Our claim is that if you use Ehcache with BigMemory, those GC problems go away. If you don't know what Ehcache or Terracotta is, you can find out here.
This is far from just a claim: we didn't simply write BigMemory, run a few tests, and release the beta. We took BigMemory and freed our own Terracotta Server from the GC constraints that most Java server products have.
So what exactly did we do? In Terracotta we keep a representation of distributed objects in your server. For our clustered Ehcache product, we keep a representation of cache segments and cache entries on the server. As the cache grows, we need to keep track of the keys associated with each cache segment, as well as hold more cache entry representations on the server. The Terracotta Server has the ability to flush cache entries to disk when it detects memory pressure, and to fault objects back in when cache entries are needed.
To accommodate the keys on the server, you had two options. One is to increase the heap so that more keys fit on the server. The second is to add additional Terracotta Servers to the cluster so that the segments (and their keys) are distributed among many servers.
Adding more heap threw us into the classic GC problem: once you get into heap sizes of 5-6 GB, you start seeing 5-8 second GC pauses. Adding stripes solved the problem and worked for our customers as well.
But we found a pattern emerging. Customers were purchasing huge boxes (I'm going to regret saying "huge" a few months from now): 32 GB of RAM and 16 cores of processing power. They were like, "We want to run your servers on this thing." And they did what everyone does for Java servers: run multiple JVMs (if possible) on the same box, deal with the added complexity and unpredictability, and probably say a prayer or two. I'm sure you all understand what I mean by complexity, but what do I mean by unpredictability? Say you run your servers with small heaps and you're getting 2 second GCs; with so many JVMs, how do you know your GCs won't be staggered? If you had 16 processes running and they GCed one after the other, that's a 32 second pause. Get the picture?
We figured there must be a better way. So we built BigMemory and put it in our server. Now you can take our pure Java Terracotta Server, have it use that 32 GB of RAM, and still get less than 1 second pauses. Practically no GC.
Anyone who has spent countless hours tuning GC like I have is going to love this. We ran with and messed around with every knob Sun (now Oracle) gave us to tune GC. When we put in BigMemory, we just deleted all those settings. Our heap is small enough that the default Java settings are good enough!
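To give a sense of what "every knob" means, this is the kind of flag collection a large-heap Java server tends to accumulate (the flags are real HotSpot options, but the values and jar name here are illustrative, not our actual settings), and it is exactly the sort of thing we got to delete:

java -Xms6g -Xmx6g -Xmn2g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
     -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=4 \
     -jar server.jar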
End result? Our Terracotta Server with BigMemory enabled can achieve higher density with fewer servers, and best of all you get predictable latency out of them.
Check out BigMemory and let me know if it works just as well for you.
Labels:
big memory,
caching,
ehcache,
elastic caching,
heap,
NoGC,
nosql,
Terracotta
Wednesday, February 17, 2010
automagically cluster Lift web sessions with Terracotta
Lift is an MVC web framework written in Scala, which are two things I know exactly nothing about. That being the case, I thought it would make a good example of how easy it is to cluster web sessions using Terracotta's new express web sessions product. Here's a link to our Beta ...
Part 1: The Lift/Scala Thing...
So first I had to figure out what the deal with Lift is. Good thing there's a fairly simple tutorial for creating a basic HelloWorld Lift webapp here. First let's download and install all the bits you need to write a Lift web app; fortunately for me that was just the Scala Eclipse plugin.
Let us create a maven project:
mvn archetype:generate -U \
-DarchetypeGroupId=net.liftweb \
-DarchetypeArtifactId=lift-archetype-blank \
-DarchetypeVersion=1.0 \
-DremoteRepositories=http://scala-tools.org/repo-releases \
-DgroupId=com.examples.terracotta \
-DartifactId=clusteredCounter \
-Dversion=1.0-SNAPSHOT
cd clusteredCounter
mvn jetty:run
Now you've got yourself a fully functioning hello world webapp. Let's add some code that actually saves and displays session data.
Add a Lift snippet that reads the URL parameters, adds them to the session, and then displays all the session attributes:
package com.examples.terracotta.snippet
import javax.servlet.http._
import net.liftweb.http._
import net.liftweb.util._
import net.liftweb.util.Helpers._
import net.liftweb.http.SessionVar
import scala.collection.mutable.HashMap
import scala.xml._
import java.util.Enumeration
import scala.collection.mutable.HashSet
class ViewSubmission {
// Pulls "title" and "url" from the request parameters, stores them in the
// servlet session, then renders every attribute currently in the session.
def showStuff : NodeSeq = {
var title = S.param("title").openOr("")
var url = S.param("url").openOr("")
S.servletSession.get.setAttribute(title, url)
var e = S.servletSession.get.getAttributeNames
val names = new HashSet[String]
while (e.hasMoreElements()) {
val name = e.nextElement().asInstanceOf[String]
names += name
println(names)
}
<table> {
for (name <- names) yield
<tr>
<td>Title</td>
<td>{name}</td>
</tr>
<tr>
<td>Url</td>
<td>{S.servletSession.get.getAttribute(name)}</td>
</tr>
}</table>
}
}
Edit index.html so you can add and view session data:
<lift:surround with="default" at="content">
<table>
<lift:snippet type="viewSubmission:showStuff" />
</table>
<form>
<table>
<tr>
<td>Title</td>
<td>
<input type="text" name="title" />
</td>
</tr>
<tr>
<td>Url</td>
<td>
<input type="text" name="url"/>
</td>
</tr>
<tr>
<td> </td>
<td><input type="submit" value="Add" /></td>
</tr>
</table>
</form>
</lift:surround>
Run mvn -Djetty.port=8888 jetty:run and start another instance with
mvn -Djetty.port=9999 jetty:run. Go to http://localhost:8888 and http://localhost:9999; you can only see the data you added to each local web session.
Now the fun part...
Part 2: Clustering the web session
First we need to add our express web session jar to the pom.xml as a dependency:
<dependency>
<groupId>org.terracotta.session</groupId>
<artifactId>terracotta-session</artifactId>
<version>1.1.0-SNAPSHOT</version>
</dependency>
and also the Terracotta Maven repository information:
<repository>
<id>terracotta-snapshots</id>
<url>http://www.terracotta.org/download/reflector/maven2</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
Let's add the Terracotta session filter to web.xml for clustering:
<filter>
<filter-name>terracotta</filter-name>
<display-name>Terracotta Session Filter</display-name>
<!-- The filter class is specific to the application server. -->
<filter-class>org.terracotta.session.TerracottaJetty61xSessionFilter</filter-class>
<init-param>
<param-name>tcConfigUrl</param-name>
<!--
The init-param named tcConfigUrl takes a <param-value> containing the
URL or file path (for example, /lib/tc-config.xml) of tc-config.xml.
Here we point it at a running Terracotta server instead.
-->
<param-value>localhost:9510</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>terracotta</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
Now start the Terracotta server (download it here).
Start the Terracotta server and then run the Jetty servers again. This time you can see the session data entered on one server appear on the other.