Wednesday, October 20, 2010

Memory cost (per entry) for various Java data structures

Have you seen this?
It shows the memory cost per entry for various common and not-so-common Java data structures. For example, have you ever wondered, when you put() an entry into a HashMap, how much memory footprint you create beyond the space occupied by your key and your value, i.e. purely from the internal artifacts introduced by HashMap itself? There you will find the answers to such questions.
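To give a flavor of what those "internal artifacts" are, here is a back-of-envelope sketch (not the tool's output; the header, reference, and alignment sizes below are assumptions for a 64-bit HotSpot JVM with compressed references) of what a classic HashMap entry costs: the Entry object's header and fields, plus an amortized share of the bucket table.

```java
// Back-of-envelope estimate of HashMap's per-entry overhead.
// All sizes are assumptions (64-bit HotSpot, compressed references),
// not measurements; the tool measures the real thing.
public class HashMapEntryCost {
    public static void main(String[] args) {
        int header = 12;    // assumed object header size
        int ref = 4;        // assumed compressed reference size
        int hashField = 4;  // the entry's cached int hash
        // Each put() allocates one entry: header + hash + key/value/next refs
        int entry = align(header + hashField + 3 * ref);
        // Plus a slot in the bucket table; at the default 0.75 load factor
        // the table holds roughly 1.3 to 2.7 slots per entry, say ~2.
        int tableSlots = 2 * ref;
        System.out.println("entry object: " + entry + " bytes");
        System.out.println("per-entry total: " + (entry + tableSlots) + " bytes");
    }

    // HotSpot pads object sizes to 8-byte boundaries
    static int align(int bytes) {
        return (bytes + 7) & ~7;
    }
}
```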

It also exemplifies the utility of my memory-measuring tool, which I recently redesigned to decouple object-graph exploration from measuring the memory usage (in bytes) of objects, so I can reuse the same exploration to derive more qualitative statistics: the number of objects in an object graph, the number of references, and the number of primitives (which would directly reflect memory usage were it not for the effects of alignment).
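The qualitative half of that can be sketched in a few lines of plain Java: a reflective walk over an object graph that counts objects, references, and primitive fields. This is only an illustration of the idea, not the actual tool's code (and note that reflecting into JDK-internal classes needs --add-opens on recent JVMs, so the demo walks a small class of its own).

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// A minimal object-graph explorer: counts objects, references, and
// primitive fields reachable from a root. Illustration only.
public class GraphStats {
    int objects, references, primitives;

    void explore(Object root) {
        // Identity-based 'seen' set, so shared sub-objects count only once
        Set<Object> seen =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        Deque<Object> todo = new ArrayDeque<Object>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Object o = todo.pop();
            if (!seen.add(o)) continue;
            objects++;
            Class<?> c = o.getClass();
            if (c.isArray()) {
                int len = Array.getLength(o);
                if (c.getComponentType().isPrimitive()) {
                    primitives += len;
                } else {
                    references += len;
                    for (int i = 0; i < len; i++) {
                        Object elem = Array.get(o, i);
                        if (elem != null) todo.push(elem);
                    }
                }
                continue;
            }
            for (; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    if (f.getType().isPrimitive()) { primitives++; continue; }
                    references++;
                    try {
                        f.setAccessible(true);
                        Object child = f.get(o);
                        if (child != null) todo.push(child);
                    } catch (Exception e) {
                        // inaccessible field: counted, but not followed
                    }
                }
            }
        }
    }

    // A tiny singly-linked list to walk: one int and one reference per node
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    public static void main(String[] args) {
        GraphStats stats = new GraphStats();
        stats.explore(new Node(1, new Node(2, new Node(3, null))));
        // prints "3 objects, 3 references, 3 primitives"
        System.out.println(stats.objects + " objects, " + stats.references
            + " references, " + stats.primitives + " primitives");
    }
}
```

These counts are exactly the VM- and architecture-independent statistics mentioned above; turning them into bytes is the separate, Instrumentation-based half.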

If you are designing data structures in Java, it would be silly not to use this tool. Seriously. (Profilers can be used to somewhat the same effect, but why fire one up and click through buttons until you get where you want, then read off numbers and do the arithmetic yourself, when you can get the same results more precisely with very simple code?)

Future work: do the same for the immutable Guava collections.


  1. Hi Dimitris,

    this is a very useful project! Thanks for sharing it!

    BTW it would also be really interesting to create a wiki page that compares the memory footprint of common Java data structures with the equivalents in Scala.

  2. Thanks.

    Well, this would require quite an effort; I would have to redo it in Scala, since the Scala collections would probably be too clumsy to use from Java. It should be easy, though, to try it ad hoc on any collection. Here is a small attempt, just trying to see the cost of (immutable.)Map with 1 to 32 entries:

  3. The rules for how much memory an object uses in Java are rather complicated.
    Check for more information. They also depend on your JVM implementation and are subject to change.
    Typically, references on 64-bit machines are 64 bits wide, but recently Sun/Oracle added an option to use 32-bit references on 64-bit machines.
    Since going 64-bit only doubles the size of the references themselves, the resulting memory usage increase is about 30-50% in practice.
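    The rough arithmetic behind that 30-50% figure can be sketched as follows (the layout numbers are assumptions for HotSpot, not measurements): only reference fields double in width, headers grow from roughly 12 to 16 bytes, and primitive fields stay the same, so a reference-heavy object grows well under 2x. The 32-bit-references option mentioned is HotSpot's -XX:+UseCompressedOops.

```java
// Illustrative only: how much a reference-heavy object grows when
// reference width goes from 4 bytes (compressed) to 8 bytes (plain 64-bit).
// Field layout is an assumption, modeled on a HashMap-style entry.
public class RefWidthCost {
    public static void main(String[] args) {
        // header + one int + three reference fields
        int narrow = align(12 + 4 + 3 * 4); // compressed refs: 32 bytes
        int wide   = align(16 + 4 + 3 * 8); // plain 64-bit refs: 48 bytes
        System.out.println(narrow + " -> " + wide + " bytes, +"
            + (100 * (wide - narrow) / narrow) + "%");
    }

    static int align(int bytes) { return (bytes + 7) & ~7; }
}
```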

  4. Thanks for the link Markus, though I'm not sure whether you are responding to something in this post.

    I mean that in the above I apply two tools: one measures memory in bytes (using Instrumentation); the other counts the objects, references, etc. of an object graph, which is invariant across VMs and architectures. So neither depends on getting such rules right.

  5. Mitso, today I found out that you took off for the promised land and Google.
    Congratulations and greetings!
    Michalis Polakis