I enjoy exploring problems like the one described in my previous post about JVMs and large heaps. Unfortunately, I was left unsatisfied, guessing in a vacuum, since the team didn't have the resources to verify the hypotheses right away. So on a lark, I took a couple of hours to explore it using artificial tests.
I slapped together this simple app to allocate a fixed amount of heap.
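```java
import java.util.ArrayList;
import java.util.List;

public class AllocateAndHold {
    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Specify the approximate amount of heap to use up, in MB.");
            System.exit(1); // without this, args[0] below throws
        }
        // Rough sizing: assumes ~16 bytes per object, including its list slot.
        int numObjects = Integer.parseInt(args[0]) * 1000000 / 16;
        List<Object> objectList = new ArrayList<Object>();
        for (int i = 0; i < numObjects; i++) {
            objectList.add(new Object());
        }
        System.out.println("Press Enter to exit.");
        System.in.read();
        // Using objectList after the wait keeps it reachable until exit.
        System.out.println("Held " + objectList.size() + " objects.");
    }
}
```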
I created minor variations that would allow the user to make requests to System.gc() on each keypress, and a version that let go of the allocated objects before waiting for input.
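Those variations aren't reproduced here. The following is a minimal sketch of how they might look, assuming a hypothetical class `AllocateGcOnKeypress` and a hypothetical optional second argument `release` that drops the list before the program waits for input. Keep in mind that System.gc() is only a request; the JVM is free to ignore it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class AllocateGcOnKeypress {
    // Same rough sizing as AllocateAndHold: assumes ~16 bytes per object.
    static int objectsFor(int megabytes) {
        return megabytes * 1000000 / 16;
    }

    static List<Object> allocate(int numObjects) {
        List<Object> objectList = new ArrayList<Object>();
        for (int i = 0; i < numObjects; i++) {
            objectList.add(new Object());
        }
        return objectList;
    }

    // Requests a GC each time Enter is pressed; stops at 'q' or end of input.
    // Returns how many System.gc() requests were made.
    static int gcOnEachKeypress(InputStream in) {
        int requests = 0;
        try {
            int c;
            while ((c = in.read()) != -1 && c != 'q') {
                if (c == '\n') {
                    System.gc(); // only a request; the JVM may ignore it
                    requests++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return requests;
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java AllocateGcOnKeypress <MB> [release]");
            System.exit(1);
        }
        // Hypothetical flag: "release" drops the objects before waiting for input.
        boolean release = args.length > 1 && args[1].equals("release");
        List<Object> objectList = allocate(objectsFor(Integer.parseInt(args[0])));
        if (release) {
            objectList = null; // every allocated object is now unreachable
        }
        System.out.println("Press Enter to request a GC; type q and Enter to exit.");
        int requests = gcOnEachKeypress(System.in);
        // Using objectList after the loop keeps it reachable while waiting (when not released).
        int held = (objectList == null) ? 0 : objectList.size();
        System.out.println(requests + " GC requests made; " + held + " objects still held.");
    }
}
```

Run it as, for example, `java AllocateGcOnKeypress 1500` to hold the objects, or `java AllocateGcOnKeypress 1500 release` to let them go before pressing Enter.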
I used explicit GC arguments in only one scenario, and I swiped them from the team that had been having problems. Otherwise, I varied only the -Xmx switch to set the maximum heap size.
-server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:-CMSParallelRemarkEnabled -XX:NewRatio=3 -XX:NewSize=500m -XX:MaxNewSize=500m -XX:SurvivorRatio=2 -XX:TargetSurvivorRatio=50 -XX:CMSInitiatingOccupancyFraction=60 -XX:CMSFullGCsBeforeCompaction=1 -Dsun.rmi.dgc.client.gcInterval=300000 -Dsun.rmi.dgc.server.gcInterval=300000
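With this many switches in play, it helps to confirm what the JVM actually picked up. Below is a small sketch (the class name `ShowGcConfig` is hypothetical) that uses the standard java.lang.management API to print the arguments the JVM received, the collectors it is using, and the resulting maximum heap. Note that only older HotSpot releases accept this full set of switches; CMS, for one, was removed in JDK 14.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ShowGcConfig {
    // Names of the collectors the running JVM is actually using.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The -X/-XX switches and -D properties exactly as the JVM received them.
        System.out.println("JVM arguments: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("Collectors:    " + collectorNames());
        System.out.println("Max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```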
I only have a workstation-class machine (2.8GHz P4, 3GB RAM), so I scaled the problem back to allocating about 1.5GB of heap, with a 2GB max heap. I observed:
- With the regular GC configuration, OS-reported memory usage rose normally to 1.5GB, and the process idled at 0% CPU.
- With 1.5GB of objects held, a full GC sweep triggered via System.gc() took about 4 seconds. I only eyeballed the time by watching the process's CPU usage in top, though.
- As expected, releasing the objects for garbage collection and calling System.gc() did not reduce the resident memory usage reported by top. Only when I exhausted the remaining free physical memory (with additional instances of the test program) did the reported memory of the first process begin to drop. The OS is simply putting excess memory to use.
- Using the cacophony of GC switches outlined above seemed to push the OS-reported memory usage to the maximum, i.e. the process actually reported using 2GB of memory, even though only 1.5GB of objects were being allocated.
- Using the complex GC configuration, the reported memory usage seemed to rise faster over the course of the execution (subjective observation), hitting the maximum 2GB halfway through.
- Again with the complex GC configuration, total execution time seemed longer.
- The complex GC configuration also produced odd JVM stderr messages about having to “give up” and resort to “foreground collection”. I also occasionally got OutOfMemory errors.
- With only the GC switches that enable asynchronous garbage collection, the process's “idle” CPU usage was a steady 5%.
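Eyeballing top is imprecise. A more objective option, sketched below under the same rough ~16-bytes-per-object sizing assumption, is to time a System.gc() call with System.nanoTime() and compare it with the cumulative collection time reported by the JVM's GarbageCollectorMXBeans. The class name `TimeFullGc` and its default of 100MB are hypothetical choices for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class TimeFullGc {
    // Total milliseconds all collectors report having spent collecting
    // (getCollectionTime() returns -1 when undefined; such values are skipped).
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Wall-clock milliseconds spent inside one System.gc() call.
    static long timeSystemGcMillis() {
        long start = System.nanoTime();
        System.gc();
        return (System.nanoTime() - start) / 1000000;
    }

    public static void main(String[] args) {
        int megabytes = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        List<Object> held = new ArrayList<Object>();
        int numObjects = megabytes * 1000000 / 16; // rough ~16 bytes per object
        for (int i = 0; i < numObjects; i++) {
            held.add(new Object());
        }
        long gcBefore = totalGcMillis();
        long wall = timeSystemGcMillis();
        long gcAfter = totalGcMillis();
        // Printing held.size() after the GC keeps the objects reachable during it.
        System.out.println("Holding " + held.size() + " objects");
        System.out.println("System.gc() wall time:   " + wall + " ms");
        System.out.println("Collector-reported time: " + (gcAfter - gcBefore) + " ms");
    }
}
```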
I am biased, of course, as I went in with expectations of what I would find. However, it does seem clear to me that the following are supported/verified:
- There is a GC penalty for just holding a large number of objects around (the “mark” portion of mark-and-sweep?).
- Naively using OS memory-reporting tools to check memory usage is not useful. The memory usage reported by Runtime.freeMemory() et al. is probably more appropriate for most situations.
- Using asynchronous garbage collection may give a smoother overall “ride”, but at the expense of a certain amount of constant CPU overhead.
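Reading those figures from inside the process is straightforward. The sketch below (a hypothetical `HeapReport` class) computes used heap as totalMemory() minus freeMemory(), alongside the committed (totalMemory()) and maximum (maxMemory()) heap. Of the three, the committed figure is the one more likely to track what top shows.

```java
public class HeapReport {
    static final long MB = 1024 * 1024;

    // Heap occupied by objects (live or not yet collected): committed minus free.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    // used / committed / max heap, in MB.
    static String report() {
        Runtime rt = Runtime.getRuntime();
        return "used=" + usedBytes(rt) / MB + "MB committed=" + rt.totalMemory() / MB
                + "MB max=" + rt.maxMemory() / MB + "MB";
    }

    public static void main(String[] args) {
        System.out.println("Start:         " + report());
        Object[] block = new Object[2000000];
        for (int i = 0; i < block.length; i++) {
            block[i] = new Object();
        }
        System.out.println("After alloc:   " + report() + " (" + block.length + " objects)");
        block = null; // release everything for collection
        System.gc();
        // "used" typically drops here, while "committed" often stays put,
        // because the JVM tends to keep heap it has already grown into.
        System.out.println("After release: " + report());
    }
}
```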
To explain the rest of the behaviour, I am still enamored of the idea that memory compaction is the issue. Specifically, my hypothesis is:
- On my poor single-CPU box, a hot allocation loop starved the asynchronous GC thread of the CPU cycles it needed to keep up with the garbage.
- Using async GC favours expanding the heap over compaction; i.e. a JVM with async GC makes less efficient use of the memory it already has, preferring to simply get more before seeing if it can make do with what it has. Then again, the cause might not be the async GC at all, but one of the other GC parameters I changed.
- The (subjectively observed) longer execution time is perhaps because compaction is an expensive operation with a big heap. Whereas the standard GC tries to keep the heap small (and thus does a few compactions against a small heap), the configured GC keeps avoiding compaction until it must be done (i.e. at max heap), at which point it's perhaps doing fewer compactions, but against a much larger heap.
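One way to test the heap-expansion hypothesis more objectively would be to sample the committed heap size during the allocation loop and compare the curves under the two GC configurations. The sketch below (a hypothetical `HeapGrowthSampler` class with an arbitrary 250ms interval and 100MB default) polls Runtime.totalMemory() from a daemon thread. On a single-CPU box the sampler competes for cycles just like the GC thread does, so expect uneven sample spacing.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapGrowthSampler {
    // Starts a daemon thread that records {elapsed ms, committed heap bytes}
    // every intervalMillis until the returned thread is interrupted.
    static Thread startSampling(final List<long[]> samples, final long intervalMillis) {
        final long start = System.nanoTime();
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        long elapsedMs = (System.nanoTime() - start) / 1000000;
                        synchronized (samples) {
                            samples.add(new long[] {elapsedMs, Runtime.getRuntime().totalMemory()});
                        }
                        Thread.sleep(intervalMillis);
                    }
                } catch (InterruptedException e) {
                    // Interrupted while sleeping: stop sampling.
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        int megabytes = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        List<long[]> samples = new ArrayList<long[]>();
        Thread sampler = startSampling(samples, 250);

        // The same kind of hot allocation loop as AllocateAndHold
        // (~16 bytes per object is a rough assumption).
        List<Object> held = new ArrayList<Object>();
        int numObjects = megabytes * 1000000 / 16;
        for (int i = 0; i < numObjects; i++) {
            held.add(new Object());
        }
        sampler.interrupt();
        sampler.join();

        synchronized (samples) {
            for (long[] s : samples) {
                System.out.println(s[0] + " ms\t" + s[1] / (1024 * 1024) + " MB committed");
            }
        }
        System.out.println("Held " + held.size() + " objects.");
    }
}
```

Running it once with the default switches and once with the CMS set above, at the same -Xmx, would show whether the committed heap really climbs toward the maximum sooner under the latter.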
If I were being thorough about this, I’d do the tests more objectively and try it against the IBM JVM, which has the incremental compaction capability. Still, just putzing around is interesting and seems to yield useful information.