Tuesday, June 12, 2012

Memory usage of Java Strings

If you're used to using a language such as C and are not used to dealing with Unicode, you may expect a string to essentially take up one byte per character plus a single byte terminator. But in a language such as Java, gone are those days:

  • every object has at least 8 bytes of housekeeping data, and arrays 12 bytes, and will be padded to a multiple of 8 bytes (in 32-bit versions of Hotspot);
  • a Java String actually consists of more than one object;
  • a Java char takes up two bytes, even if you're using it to store a boring old ASCII value that would fit into a single byte;
  • a Java String contains some extra variables that you might not have considered.

How to calculate String memory usage

For reasons we'll explore below, the minimum memory usage of a Java String is generally as follows:

Minimum String memory usage (bytes) = 8 * (int) ((((no chars) * 2) + 45) / 8)

Or put another way:
  • multiply the number of characters of the String by two;
  • add 38;
  • if the result is not a multiple of 8, round up to the next multiple of 8;
  • the result is generally the minimum number of bytes taken up on the heap by the String.
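The steps above can be sketched in Java as follows. This is only an illustration of the rule of thumb, assuming the 32-bit Hotspot figures given in this article. The class and method names (StringMemoryEstimate, minStringBytes) are made up for the example and are not part of the JDK.

```java
// A minimal sketch of the "times two, add 38, round up to 8" rule of thumb.
// Assumes the 32-bit Hotspot object layout described in the article.
final class StringMemoryEstimate {

    // Hypothetical helper: estimated minimum heap bytes for a String of numChars chars.
    static int minStringBytes(int numChars) {
        int raw = numChars * 2 + 38;   // two bytes per char, plus 38 bytes of fixed overhead
        return (raw + 7) / 8 * 8;      // round up to the next multiple of 8
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 17, 100}) {
            System.out.println(n + " chars -> " + minStringBytes(n) + " bytes");
        }
    }
}
```

Adding 7 before the integer division is what turns "round down" into "round up", which is why the one-line formula uses 45 (38 + 7) rather than 38. The estimate is arithmetic on the figures quoted here, not a measurement of a running JVM.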

Understanding String memory usage

To understand the above calculation, we need to start by looking at the fields on a String object. A String contains the following:
  • a char array (and thus a separate object) containing the actual characters;
  • an integer offset into the array at which the string starts;
  • the length of the string;
  • another int for the cached calculation of the hash code.
This means even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8 so no "padding" bytes are needed so far). Then, the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to a multiple of 8 (16 bytes). So in total, an empty string uses 24+16 = 40 bytes.
If the string contains, say, 17 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 17*2=34 bytes for the seventeen chars. Since 12+34=46 isn't a multiple of 8, we also need to round up to the next multiple of 8 (48). So overall, our 17-character String will use up 48+24 = 72 bytes. As you can see, that's quite a long way off the 18 bytes that you might have expected if you were used to C programming in the "good old days".
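To make the breakdown concrete, here is a small sketch that adds up the two objects separately, using only the sizes quoted above (8-byte object header, 12-byte array header, 4-byte references and ints, 2-byte chars, padding to a multiple of 8). The class, constant and method names are hypothetical, chosen for this example, and the numbers are the article's assumptions about 32-bit Hotspot rather than values read from a live JVM.

```java
// Sketch: per-object breakdown of String memory usage under the article's assumptions.
final class StringLayoutBreakdown {
    static final int OBJECT_HEADER = 8;  // housekeeping data on every object
    static final int ARRAY_HEADER = 12;  // object header plus 4 bytes for the array length
    static final int REFERENCE = 4;      // reference to the char array
    static final int INT = 4;            // offset, length and cached hash code
    static final int CHAR = 2;           // one Java char

    // Round a size up to the next multiple of 8.
    static int pad8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // The String object itself: header + array reference + three int fields.
    static int stringObjectBytes() {
        return pad8(OBJECT_HEADER + REFERENCE + 3 * INT);
    }

    // The separate char array object holding the characters.
    static int charArrayBytes(int numChars) {
        return pad8(ARRAY_HEADER + numChars * CHAR);
    }

    static int totalBytes(int numChars) {
        return stringObjectBytes() + charArrayBytes(numChars);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 17}) {
            System.out.println(n + " chars: String object " + stringObjectBytes()
                    + " + char array " + charArrayBytes(n)
                    + " = " + totalBytes(n) + " bytes");
        }
    }
}
```

For 0 and 17 characters this gives 24 + 16 = 40 and 24 + 48 = 72 bytes respectively, matching the figures worked out in the text.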
