Friday, October 31, 2008

Back to Basics

When doing a little refactoring on terracotta server code, I was reminded on why the basics ALWAYS matters.

The server side representation of both ArrayList and LinkedList mapped to a class called ListManagedObjectState. The underlying data structure for this class was a ArrayList. We got around to mapping LinkedList to a LinkedListManagedObjectState where the underlying data structure for that class is a LinkedList.

I wrote a little test and the following where the results:


ArrayList.add for 50000 objects took 424 ms.
LinkedList.add for 50000 objects took 198 ms.
ArrayList.addFirst for 50000 objects took 906 ms.
LinkedList.addFirst for 50000 objects took 109 ms.
ArrayList.addAt for 50000 took objects 105 ms.
LinkedList.addAt for 50000 took objects 132 ms.
ArrayList.clear for 50000 took objects 127 ms.
LinkedList.clear for 50000 took objects 49 ms.
ArrayList.removeFirst for 50000 took objects 847 ms.
LinkedList.removeFirst for 50000 took objects 70 ms.
ArrayList.remove for 50000 took objects 865 ms.
LinkedList.remove for 50000 took objects 68 ms.


If your mutating a lot more then accessing, then LinkedList is the way to go.
Basics do matter!

Thursday, October 30, 2008

DSO Garbage Collection: YoungGen

There are certain apps when clustered using Terracotta that generates short-lived DSO objects and lots of them. Which results in a lot of DSO garbage!

Taking the existing DSO garbage collector and running it more often will collect these short-lived objects, but the MARK stage could take a long time. For example, you'll have a Terracotta server array that has DSO objects on the server's cache as well as DSO objects faulted to disk. When the DSO garbage collector runs, it will fault in objects from the disk during the MARK stage. This may take awhile to run.

The DSO objects that are short-lived are probably in the server's cache. This is where YoungGen comes in. It's the same algorithm, except for the set of objects we are considering as GC Candidates. Instead of considering all the objects in the system and removing objects that are unreachable. We only consider the objects on the server's cache and remove objects that we can definitely determine as garbage.

DSO objects that cannot be collected in YoungGen is the following:

1. Roots
2. DSO Objects that in the server's cache that is being references by DSO Objects faulted to disk. This is marked in a subset called RememberMe set
3. Objects that are in the cache, but never been flushed to disk (i.e. really new objects).

enable the following properties and see if YoungGen DSO Garbage Collection works for you:


l2.objectmanager.dgc.young.enabled = true
l2.objectmanager.dgc.young.frequencyInMillis = 180000

Wednesday, October 29, 2008

Terracotta DSO GC Algorithm in a nutshell

There maybe a few of you out there (when I mean few, hopefully a few million :)) that maybe interested in how Terracotta collects its Distributed Objects that is no longer in use. Ok here goes...

Objects that should be collected by DSO Garbage Collection are objects which are no longer referenced by any client (meaning the object has already been GCed locally on every client that held it). So now that the clients have no more use for this object, its up to the server array to dispose of it (i.e. remove it from the server array cache or terraocotta's persistent storage).

When DGC kicks in..

1. It starts monitoring for new objects being created and keep a set of those. This is used later to rescue GC candidates.

2. It does its initial mark, which is to get all the ObjectIDs in the system. And also our roots in the system. We remove the roots and well as object reachable by roots form GC Candidates set. The objects that are left are objects that are no longer referenced by anyone

3. Remember when monitoring new object references was turned on? Now new referenced objects that appeared in our GC Candidate set can be removed from that set. This means there were new references to those objects while our first marking was going on, so they need to be rescued and not collected anymore. If after this rescue there are no GC candidates, then GC is complete.

4. If there are still GC Candidates left, then the system is paused. For us this means pausing our ObjectManager so that new lookup or create requests
goes pending (i.e no new references are created). We now do another rescue and produce our final GC Candidate set for this GC Cycle. Then new reference monitoring is turned off.

5. finally we have a have a set to delete and pass that list onto a different thread that will delete the GC Candidates from the ObjectManager in batches.

Since the delete is staged on a different thread (SEDA arch). It is possible for another GC Cycle to kick in before all the delete request complete, which can in turn create more objects to delete and life goes on...

Up next: Young GC Candidates....