Portopolis: JVM Tuning: UseParallelOldGC

The preceding posts prepared us to look at the JVM option -XX:+UseParallelOldGC. This option was introduced in Java 5 update 6 and became the default for the throughput collector in Java 7 update 4. We do not specify a garbage collector, so we get the default for Java 6 update 14, which is the throughput collector. (The alternative is CMS.) In our version, the throughput collector means UseParallelGC: the young generation is collected by multiple parallel threads, but the old generation is collected by a single thread unless UseParallelOldGC is specified on the command line at startup. Marcus immediately recommended that we specify -XX:+UseParallelOldGC, and all the literature we could find, in print and online, agreed. Cautious as always, we still tested it in our performance lab before putting it into production. Lab tests showed healthy improvement in the key metrics: 24% in Full GC overhead, 29% in average Full GC pause time, and 16% in maximum Full GC pause time. Then, still cautious, we tried it on one of our ten production nodes.
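
To make the three metrics concrete, here is a minimal Java sketch of how they might be computed from a list of Full GC pause durations and the length of the observation window. It assumes the pauses have already been extracted from GC logs (parsing is not shown), and the class and method names are hypothetical. Overhead here means the share of wall-clock time spent in Full GC pauses.

```java
import java.util.List;

/**
 * Hypothetical helper: summarizes Full GC pauses as overhead, average pause,
 * and maximum pause. Assumes pause durations (ms) and the observation
 * window (ms) were already extracted from GC logs.
 */
final class FullGcSummary {
    final double overheadPercent; // share of wall-clock time spent in Full GC
    final double avgPauseMs;
    final double maxPauseMs;

    FullGcSummary(List<Double> pausesMs, double windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        double total = 0, max = 0;
        for (double p : pausesMs) {
            total += p;
            max = Math.max(max, p);
        }
        this.overheadPercent = 100.0 * total / windowMs;
        this.avgPauseMs = pausesMs.isEmpty() ? 0 : total / pausesMs.size();
        this.maxPauseMs = max;
    }

    /** Percent improvement from baseline to candidate; negative means a regression. */
    static double improvementPercent(double baseline, double candidate) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        // Hypothetical inputs for illustration only, not measurements from this post.
        FullGcSummary before = new FullGcSummary(List.of(800.0, 1200.0), 100_000.0);
        FullGcSummary after = new FullGcSummary(List.of(1600.0, 2400.0), 100_000.0);
        System.out.printf("overhead change: %.0f%%%n",
                improvementPercent(before.overheadPercent, after.overheadPercent));
    }
}
```

With these made-up inputs every pause doubles, so the printed change is -100%: a doubling shows up as a negative "improvement".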

Surprise! In production, UseParallelOldGC roughly doubled our Full GC overhead, our average Full GC pause time, and our maximum Full GC pause time! All that caution paid off. We could not predict the production impact correctly, but by trying the change on one node first we limited the damage and gave ourselves the background to learn from the experience.

First of all, why did production behave differently from the performance lab trials? The lab differs from production in some significant ways: it has a 4GB heap, and our LoadRunner license, which is already too expensive, does not generate a production-size load. Our lab may be useful for gauging the relative effect of changes to our own code, but it failed to deliver value here.

Second, why did universally recommended practice backfire in our environment? I did find that we are not the only ones to experience this sort of surprise. Tony Chiu looked for answers at https://community.jboss.org/message/221725. He didn't get a conclusive one, but the comments tentatively suggest too many parallel threads contending over too small an old generation. So we tried specifying -XX:ParallelGCThreads=2 (the default on our server is 4) to reduce contention, but the results were even worse. We wound up reverting to the default, which is serial (single-threaded) collection of the old generation.
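
For context on the thread count: when ParallelGCThreads is not set, HotSpot derives a default from the number of CPUs. The Java sketch below reproduces the heuristic commonly documented for HotSpot on x86 (one GC thread per CPU up to 8, then 5/8 of a thread for each additional CPU). Treat it as an assumption that this matches the exact JVM build discussed here; other platforms and newer JDKs (which also respect container CPU limits) can differ. Under this heuristic, a default of 4 would correspond to a machine reporting 4 processors. The class name is hypothetical.

```java
/**
 * Hypothetical helper: estimates the default -XX:ParallelGCThreads using the
 * heuristic commonly documented for HotSpot on x86. Exact rules vary by JVM
 * version and platform, so the result is an estimate only.
 */
final class GcThreadEstimate {
    static int defaultParallelGCThreads(int ncpus) {
        final int switchPoint = 8;
        if (ncpus <= switchPoint) {
            return ncpus; // one GC thread per CPU up to 8
        }
        // Beyond 8 CPUs, add 5/8 of a thread per extra CPU (integer division).
        return switchPoint + ((ncpus - switchPoint) * 5) / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> estimated default ParallelGCThreads = "
                + defaultParallelGCThreads(cpus));
    }
}
```

The flag -XX:ParallelGCThreads=2 simply overrides this estimate; whatever it would have computed is ignored.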

This is purely a hypothesis for now, but I suspect UseParallelOldGC needs more headroom to operate, and our heap is too lean and hungry (recall the discussion of live data size vs. heap size in the previous post). Organizational realities (politics) prevent us from increasing heap size at this time, but I hope that upcoming changes in our StandardQueryCache will significantly reduce heap usage, at which point it may be worth trying UseParallelOldGC again.
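
One way to put a number on "too lean" is the ratio of maximum heap size to live data size, where live data is taken to be old generation occupancy measured right after a Full GC (for example, read from GC logs). The minimal Java sketch below computes that ratio. The class name is hypothetical, the live data figure in main is made up, and the 3x threshold is only an illustrative rule of thumb (sizing guidance of roughly 3 to 4 times live data is commonly cited), not a value from this post.

```java
/**
 * Hypothetical helper: compares maximum heap size with live data size
 * (assumed to be old-generation occupancy right after a Full GC).
 */
final class HeapHeadroom {
    /** Ratio of maximum heap to live data; larger means more headroom. */
    static double headroomRatio(long maxHeapBytes, long liveDataBytes) {
        if (liveDataBytes <= 0) {
            throw new IllegalArgumentException("liveDataBytes must be positive");
        }
        return (double) maxHeapBytes / liveDataBytes;
    }

    /** True if the heap is at least minRatio times the live data size. */
    static boolean hasHeadroom(long maxHeapBytes, long liveDataBytes, double minRatio) {
        return headroomRatio(maxHeapBytes, liveDataBytes) >= minRatio;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the current -Xmx
        long liveData = 1L << 30; // hypothetical: 1 GB live after a Full GC
        System.out.printf("heap/live ratio = %.2f, headroom at 3x: %b%n",
                headroomRatio(maxHeap, liveData), hasHeadroom(maxHeap, liveData, 3.0));
    }
}
```

Keeping the threshold as a parameter leaves the judgment call about how much headroom is enough to whoever runs the check.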

Sunday, December 2, 2012