Saturday, November 24, 2012

JVM Tuning - Heap Structure

Now for an overview of the structure of the JVM heap.

The heap is organized into three areas: the young generation, the old (or tenured) generation, and the permanent generation. "Permanent generation" is a bit of a misnomer; it holds metadata such as class definitions. Newly created objects go into the Eden space in the young generation. The young generation also has the survivor spaces, "from" and "to", which we will explain in a moment. When the young generation is too full to accommodate new objects, a minor (or partial) collection occurs.
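If you want to see these spaces on a live JVM, the standard management API exposes each one as a memory pool. Here is a minimal sketch (the class name ListHeapPools is mine, not from any particular tool); the exact pool names you see depend on which collector is in use, e.g. "PS Eden Space" versus "Eden Space":

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    // Prints the memory pools the running JVM exposes. With the HotSpot
    // collectors of this era you would expect an Eden space, survivor
    // spaces, an old/tenured generation, and a perm gen pool.
    public class ListHeapPools {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                System.out.printf("%-20s %-16s used=%,d max=%,d%n",
                        pool.getName(),
                        pool.getType(),              // heap or non-heap pool
                        pool.getUsage().getUsed(),
                        pool.getUsage().getMax());   // -1 if no maximum is set
            }
        }
    }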


The left side of the diagram shows what happens in a healthy partial GC:

  1. Dead objects are marked (shaded black in the diagram).
  2. Live Eden objects move to the To space.
  3. Young-enough objects in the From space also move to the To space.
  4. Old-enough objects in the From space are promoted to the old generation.
The right side shows the heap after the partial GC is complete. Now Eden is empty, and the From and To spaces have traded roles.

Later, we will discuss what happens if there isn't enough room in the To space or in the old generation to accommodate the live objects that belong to them.
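In the meantime, if you want to watch this process on your own system, these are the sorts of HotSpot flags (current as of the Java 6/7 era) that size the young generation and log what the survivor spaces are doing. The values here are placeholders for illustration, not recommendations:

    -Xms12288m -Xmx12288m        # initial and maximum heap (set equal to avoid resizing)
    -Xmn2048m                    # explicit young generation size
    -XX:SurvivorRatio=8          # Eden is 8x the size of each survivor space
    -XX:MaxTenuringThreshold=15  # minor GCs an object may survive before promotion
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
    -XX:+PrintTenuringDistribution   # shows object ages in the survivor spaces
    -Xloggc:gc.log               # write a GC log that tools like GCHisto can read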

Friday, November 23, 2012

JVM Tuning - Heap Size

We're having a series of consultants from Red Hat come in to help us ensure that our production environment is in good shape. One of the things we asked the first one (we'll call him Marcus) to do was to evaluate our JVM options. In retrospect, we should have made a point of understanding his areas of expertise better and limiting our requests accordingly; and he should have deflected requests that did not match his strong points. But live and learn, eh? And I've learned some things about JVM tuning through this process.

Prior to August this year, we ran twelve 8GB (specified by -Xms8192m -Xmx8192m) nodes in our main production cluster. While we were in the midst of a series of painful production outages marked by sudden episodes of back-to-back full garbage collections, Red Hat support recommended we allocate 16GB per node. There was some internal resistance to doing this (it seems an architect no longer with the company, whom we'll call Eddie, had said that larger heaps would cause longer full GC pauses), but we did go so far as to increase four nodes to 12GB (-Xms12288m -Xmx12288m). This was done a few weeks before Marcus came on site. GC logs since then showed that the 12GB nodes performed better than the 8GB nodes. Not only did they have more throughput, but the GC overhead and the average and maximum pause times were also significantly improved - contrary to what Eddie had predicted.


These graphs summarize GCHisto analysis of production GC logs over one week. (GCHisto, by the way, is a superb free open source application that is simple and fast, and gives a quick, easy overview of key statistics from GC logs.) The first four nodes are the 12GB heaps. Overhead % is the amount of time paused for GC divided by the amount of time the application ran. Max pause is the length, in seconds, of the longest full GC pause. We can see that the 12GB nodes are both better and more consistent; the 8GB nodes also seem more sensitive to stress.
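GCHisto computes these figures from the GC log, but you can get a rough live approximation of the overhead number from the JVM's own counters. A minimal sketch (the class name GcOverhead is mine; the collector names printed depend on which collectors are configured). Note that the management beans only expose cumulative counts and times, so you still need the log for pause-by-pause detail like max pause:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Rough GC overhead: total time spent in GC divided by JVM uptime.
    public class GcOverhead {
        public static void main(String[] args) {
            long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
            long gcTimeMs = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                gcTimeMs += gc.getCollectionTime();
            }
            System.out.printf("Approximate GC overhead: %.2f%%%n",
                    100.0 * gcTimeMs / uptimeMs);
        }
    }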

So it was a no-brainer to go ahead and move our production cluster to ten 12GB nodes instead of four 12GB and eight 8GB. So far, so good. We didn't need a consultant to see this, but the way decisions are made in organizations, it can help get things done when an outside "expert" has been paid to tell us what to do.

But we'll be revisiting this question of heap size in a future post. Don't forget that in August, Red Hat support recommended we take this further and go to 16GB heaps.