Thursday, June 28, 2012

Developer Contest - Delhi

Contest Link

The contest will test your skills on some of the technologies covered at the upcoming IndicThreads Conference On Software Development to be held on 13-14 July 2012 in Delhi, India. 

Tuesday, June 19, 2012

One of the major benefits of the Java language compared with other object-oriented languages (such as C++) is that programmers do not have to free memory explicitly during the execution of the program. That job is delegated entirely to the garbage collector (GC), which is in charge of removing unused objects to release memory.

JVM memory organisation

The memory of the JVM is first split into two parts:
  1. the heap memory
  2. the non heap memory
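To see this split on a running JVM, the standard java.lang.management API reports heap and non-heap usage separately. The sketch below is a minimal example (HeapVsNonHeap is just an illustrative class name); the numbers it prints will depend on the JVM version, the garbage collector and the memory settings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapVsNonHeap {
    // Heap: where all objects (Young and Tenured generations) live.
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Non-heap: method area (class metadata), code cache and other JVM-internal memory.
    static MemoryUsage nonHeapUsage() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        MemoryUsage nonHeap = nonHeapUsage();
        // getMax() returns -1 when no maximum is defined for that area.
        System.out.printf("heap:     used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
        System.out.printf("non-heap: used=%d committed=%d max=%d%n",
                nonHeap.getUsed(), nonHeap.getCommitted(), nonHeap.getMax());
    }
}
```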

The heap memory

The heap is divided into two generations: the Young Generation and the Tenured Generation.

The heap contains all objects created during execution, so the GC has to visit this space frequently in order to clean up unused objects (objects that are no longer referenced). The heap is itself split into different sections according to the lifetime of the objects. Initially, all objects are stored in the Eden bucket. Most of these objects are destroyed at the next GC visit because they are no longer used. Some of them, however, need to be kept because they have a longer lifetime and will still be used in the future, so they are moved into the Survivor bucket, where they are checked again at later GC visits. These two buckets make up the Young generation and contain all the “newly created” objects.

If objects stored in a Survivor bucket survive further GC visits, they are then moved (promoted) into the Tenured generation (or Old generation) bucket, where they stay until the garbage collector destroys them.
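The generations described above show up as memory pools that can be listed through MemoryPoolMXBean. A minimal sketch follows (HeapPools is a hypothetical class name). Pool names are not standardized and vary by collector, so the names mentioned in the comment are typical examples rather than guaranteed output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    // Collects the names of all memory pools that the JVM reports as part of the heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // With the parallel collector this typically lists "PS Eden Space",
        // "PS Survivor Space" and "PS Old Gen"; other collectors use other names.
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```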

The non heap memory

It comprises the Method Area and other memory required for internal processing.
  1. Method Area
    • The method area is responsible for storing class information. The Class-Loader will load the bytecode of a class and will pass it to the JVM. The JVM will generate an internal class representation of the bytecode and store it in the method area. The internal representation of a class will have the following data areas:
      • Runtime Constant Pool - Numeric constants of the class of types int, long, float or double, String constants and symbolic references to all methods, attributes and types of this class.
      • Method Code - The implementation (code) of all methods of this class, including constructors.
      • Attributes - A list of all named attributes of this class.
      • Fields - Values of all fields of this class as references to the Runtime Constant Pool.
  2. Java Stacks or Frames
    • Every time a method is invoked a stack frame is created and pushed onto the Java Stack. When the method terminates, this frame is popped off the stack. Thus, the top-most frame of the Java stack always belongs to the method that is currently being executed by the JVM.
In addition to the heap and the method area, which are shared by all threads of a JVM, every thread also has exclusive access to memory created for it:
  • PC Register - The Program Counter register. It points to the current JVM instruction of the method the thread is executing, if that method is not a native method. If it is a native method, the content of the PC register is undefined.
  • Java Virtual Machine Stack - Each thread gets its own stack, onto which so-called frames are pushed for each method the thread is executing. There can be many frames on the stack for nested method calls, but only one frame is active at a time for a given thread. A frame contains the local variables of the method, a reference to the Runtime Constant Pool of the method’s class and an operand stack for executing JVM operations. (The JVM is a stack machine!)
  • Native Method Stack - Native methods get their own stack, the so-called “C stack”.
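A quick way to observe that every nested call pushes one frame onto the calling thread's stack is to compare Thread.getStackTrace() lengths at different recursion depths. This is an illustrative sketch (FrameDepthDemo is a made-up name). The Javadoc allows a JVM to omit frames from stack traces, so the difference it prints is what HotSpot normally reports rather than a guarantee.

```java
public class FrameDepthDemo {
    // Recurse n times; at the bottom, count the frames on the current thread's stack.
    static int stackDepthAfter(int n) {
        if (n == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return stackDepthAfter(n - 1);
    }

    public static void main(String[] args) {
        int base = stackDepthAfter(0);   // frames without extra nesting
        int nested = stackDepthAfter(5); // five additional nested calls
        // Each nested invocation pushed exactly one more frame.
        System.out.println("extra frames for 5 nested calls: " + (nested - base));
    }
}
```

On HotSpot this prints 5: one extra frame per nested call, all on the same thread's stack.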

Structure of JVM


Bad tuning of the JVM can lead to several issues. The most obvious one is setting the memory parameters so low that your application cannot even start. But there are others.


OutOfMemoryError

If your memory parameters are too low, your application will probably fail with an OutOfMemoryError, because it requires more memory than you allocated.

Non-optimal performance

If your memory parameters are too low, you will also decrease the performance of the application, because the GC has to run very often to keep enough space available. Each run takes time and may also have to stop the application.
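The GarbageCollectorMXBean interface exposes how often each collector has run and how much time it has spent, which makes the effect of a too-small heap visible. The sketch below (GcStats is a hypothetical name) allocates short-lived garbage and prints the counters. Running it once with a small heap such as -Xmx32m and once with a larger one should show more collections in the small-heap run, though the exact figures depend on the collector and the JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Static sink so the JIT cannot optimise the allocations away.
    static byte[] sink;

    // Sum of collection counts over all collectors (young and old).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        // Allocate about 2 GB in total, as short-lived 1 KB arrays.
        for (int i = 0; i < 2_000_000; i++) {
            sink = new byte[1024];
        }
        System.out.println("collections during loop: " + (totalCollections() - before));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```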

Excessive footprint

On the other hand, allocating far more memory to the JVM than it needs is also bad practice: the JVM reserves much more than it actually uses, and that unused space is then unavailable to other consumers such as the operating system or a database. This may lead to non-optimal performance as well.
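The standard Runtime methods report how much heap the JVM may use, how much it has currently committed and how much of that is free, which helps spot an oversized heap. In this sketch (HeapSizing is an illustrative name), maxMemory() roughly reflects -Xmx and totalMemory() starts near -Xms. A large, persistent gap between used and total memory hints that the heap could be smaller; treat this as a rough indicator, not a sizing rule.

```java
public class HeapSizing {
    static final long MB = 1024 * 1024;

    // Heap bytes currently occupied by objects (committed minus free).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (roughly -Xmx).
        System.out.println("max   = " + rt.maxMemory() / MB + " MB");
        // totalMemory(): heap currently committed by the JVM (starts near -Xms).
        System.out.println("total = " + rt.totalMemory() / MB + " MB");
        // freeMemory(): the unused part of totalMemory().
        System.out.println("free  = " + rt.freeMemory() / MB + " MB");
        System.out.println("used  = " + usedBytes() / MB + " MB");
    }
}
```

For example, comparing the output of java -Xms256m -Xmx4g HeapSizing with java -Xms256m -Xmx256m HeapSizing shows how the max figure follows -Xmx while the used figure stays the same.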

Monday, June 18, 2012

XA and non XA Transactions

Came across this wonderful explanation by Mike regarding XA transactions here

An XA transaction, in the most general terms, is a "global transaction" that may span multiple resources. A non-XA transaction always involves just one resource. An XA transaction involves a coordinating transaction manager, with one or more databases (or other resources, like JMS) all involved in a single global transaction. Non-XA transactions have no transaction coordinator, and a single resource is doing all its transaction work itself (this is sometimes called a local transaction).

XA transactions come from the X/Open group specification on distributed, global transactions. JTA includes the X/Open XA spec, in modified form. Most stuff in the world is non-XA - a Servlet or EJB or plain old JDBC in a Java application talking to a single database. XA gets involved when you want to work with multiple resources - 2 or more databases, a database and a JMS connection, all of those plus maybe a JCA resource - all in a single transaction. In this scenario, you'll have an app server like WebSphere or WebLogic or JBoss acting as the Transaction Manager, and your various resources (Oracle, Sybase, IBM MQ JMS, SAP, whatever) acting as transaction resources. Your code can then update/delete/publish/whatever across the many resources. When you say "commit", the results are committed across all of the resources. When you say "rollback", _everything_ is rolled back across all resources.

The Transaction Manager coordinates all of this through a protocol called Two Phase Commit (2PC). This protocol also has to be supported by the individual resources. In terms of datasources, an XA datasource is a data source that can participate in an XA global transaction. A non-XA datasource generally can't participate in a global transaction (sort of - some people implement what's called a "last participant" optimization that can let you do this for exactly one non-XA item). 
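The two phases can be illustrated with a toy, in-memory coordinator. Participant, Coordinator and RecordingParticipant below are hypothetical types invented for this sketch; they are not the JTA javax.transaction.xa.XAResource API, and the sketch leaves out the transaction log, timeouts and crash recovery that a real transaction manager needs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical participant interface, loosely modelled on the prepare/commit/rollback
// steps of an XA resource. This is NOT javax.transaction.xa.XAResource.
interface Participant {
    boolean prepare(); // phase 1: vote to commit (true) or abort (false)
    void commit();     // phase 2, if every participant voted yes
    void rollback();   // phase 2, if any participant voted no
}

// Toy coordinator: no transaction log, timeouts or crash recovery.
class Coordinator {
    static boolean runTwoPhaseCommit(List<? extends Participant> participants) {
        // Phase 1: ask every participant to prepare.
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single "no" vote aborts the global transaction everywhere.
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: all voted yes, so every participant commits.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}

// Test double that records the outcome it was told to apply.
class RecordingParticipant implements Participant {
    private final boolean vote;
    String outcome = "pending";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    public boolean prepare() { return vote; }
    public void commit() { outcome = "committed"; }
    public void rollback() { outcome = "rolled back"; }
}

public class TwoPhaseCommitDemo {
    public static void main(String[] args) {
        RecordingParticipant db = new RecordingParticipant(true);
        RecordingParticipant jms = new RecordingParticipant(false);
        boolean committed = Coordinator.runTwoPhaseCommit(Arrays.asList(db, jms));
        // Prints "false rolled back rolled back": one "no" vote undoes both.
        System.out.println(committed + " " + db.outcome + " " + jms.outcome);
    }
}
```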

Tuesday, June 12, 2012

Memory usage of Java Strings

If you're used to using a language such as C and are not used to dealing with Unicode, you may expect a string to essentially take up one byte per character plus a single byte terminator. But in a language such as Java, gone are those days:

  • every object has at least 8 bytes of housekeeping data, and arrays 12 bytes, and is padded to a multiple of 8 bytes (in 32-bit versions of Hotspot);
  • a Java String actually consists of more than one object;
  • a Java char takes up two bytes, even if you're using it to store boring old ASCII values that would fit into a single byte;
  • a Java String contains some extra variables that you might not have considered.

How to calculate String memory usage

For reasons we'll explore below, the minimum memory usage of a Java String is generally as follows:

Minimum String memory usage (bytes) = 8 * (int) ((((no chars) * 2) + 43) / 8)

Or put another way:
  • multiply the number of characters of the String by two;
  • add 36;
  • if the result is not a multiple of 8, round up to the next multiple of 8;
  • the result is generally the minimum number of bytes taken up on the heap by the String.

Understanding String memory usage

To understand the above calculation, we need to start by looking at the fields of a String object. A String contains the following:
  • a reference to a char array (thus a separate object) containing the actual characters;
  • an int offset into the array at which the string starts;
  • an int holding the length of the string;
  • another int for the cached hash code.
This means that even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8, so no "padding" bytes are needed so far). Then, the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to a multiple of 8. So in total, an empty string uses 40 bytes.
If the string contains, say, 17 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 17*2=34 bytes for the seventeen chars. Since 12+34=46 isn't a multiple of 8, we also need to round up to the next multiple of 8 (48). So overall, our 17-character String will use up 48+24 = 72 bytes. As you can see, that's quite a long way off the 18 bytes that you might have expected if you were used to C programming in the "good old days".
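The two-part calculation in the walkthrough above can be written as a small helper. This sketch (StringFootprint is an illustrative name) models only the 32-bit HotSpot layout the post assumes. Newer JDKs changed the String layout (JDK 7u6 dropped the offset and length fields, and JDK 9 compact strings store Latin-1 text in a byte[]), so these are estimates for that layout, not measurements of your JVM.

```java
public class StringFootprint {
    // Rounds n up to the next multiple of 8 (HotSpot object alignment).
    static int alignTo8(int n) {
        return (n + 7) / 8 * 8;
    }

    // Estimated heap bytes for a String of the given length, assuming the
    // 32-bit layout above: a String object (8 header + 4 char[] reference + 3 ints)
    // plus a char[] of 12 header bytes + 2 bytes per char, each padded to 8.
    static int estimatedStringBytes(int length) {
        int stringObject = alignTo8(8 + 4 + 3 * 4);
        int charArray = alignTo8(12 + 2 * length);
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        for (int len : new int[] {0, 1, 2, 17}) {
            System.out.println(len + " chars -> " + estimatedStringBytes(len) + " bytes");
        }
    }
}
```

This prints 40 bytes for 0, 1 and 2 characters and 72 bytes for 17, matching the two worked examples.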

Memory usage of Java objects

General formula for calculating memory usage

In general, the heap memory used by a Java object in Hotspot consists of:
  • an object header, consisting of a few bytes of "housekeeping" information;
  • memory for primitive fields, according to their size;
  • memory for reference fields (4 bytes each);
  • padding: potentially a few "wasted" unused bytes after the object data, to make every object start at an address that is a multiple of 8 bytes and reduce the number of bits required to represent a pointer to an object.

    Java type    Bytes required
    boolean      1
    byte         1
    char         2
    short        2
    int          4
    float        4
    long         8
    double       8

    • a normal object requires 8 bytes of "housekeeping" space;
    • arrays require 12 bytes (the same as a normal object, plus 4 bytes for the array length).

Object size granularity

In Hotspot, every object occupies a number of bytes that is a multiple of 8. If the number of bytes required by an object for its header and fields is not a multiple of 8, then you round up to the next multiple of 8.
This means, for example, that:
  • a bare Object takes up 8 bytes;
  • an instance of a class with a single boolean field takes up 16 bytes: 8 bytes of header, 1 byte for the boolean and 7 bytes of "padding" to make the size up to a multiple of 8;
  • an instance with eight boolean fields will also take up 16 bytes: 8 for the header, 8 for the booleans; since this is already a multiple of 8, no padding is needed;
  • an object with two long fields, three int fields and a boolean will take up:
    • 8 bytes for the header;
    • 16 bytes for the 2 longs (8 each);
    • 12 bytes for the 3 ints (4 each);
    • 1 byte for the boolean;
    • a further 3 bytes of padding, to round the total up from 37 to 40, a multiple of 8.
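The rounding rule can be checked against the examples above with a small estimator. It is a sketch (ObjectFootprint is an illustrative name) that assumes the 8-byte header and 4-byte references used in this post; it ignores field reordering and the larger headers of 64-bit JVMs, so it gives an estimate, not a measurement.

```java
public class ObjectFootprint {
    static final int HEADER_BYTES = 8;    // plain object header (32-bit Hotspot)
    static final int REFERENCE_BYTES = 4; // one reference field

    // Rounds n up to the next multiple of 8.
    static int alignTo8(int n) {
        return (n + 7) / 8 * 8;
    }

    // Shallow size: header + primitive field bytes + reference fields, padded to 8.
    static int estimatedObjectBytes(int primitiveFieldBytes, int referenceFields) {
        return alignTo8(HEADER_BYTES + primitiveFieldBytes + referenceFields * REFERENCE_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(estimatedObjectBytes(0, 0));                 // 8  (bare Object)
        System.out.println(estimatedObjectBytes(1, 0));                 // 16 (one boolean)
        System.out.println(estimatedObjectBytes(8, 0));                 // 16 (eight booleans)
        System.out.println(estimatedObjectBytes(2 * 8 + 3 * 4 + 1, 0)); // 40 (2 longs, 3 ints, 1 boolean)
    }
}
```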