Java bytecode, string concatenation and StringBuilder

In my earlier post I was making a fuss over picking the faster hash algorithm, and then I realised I was using + to concatenate strings.
Should I always use a StringBuilder? Should I care even for small strings? Heck, if I use the StringBuilder I’ll surely create one extra object anyway…

I tried some variations of the test and found no performance difference between simple concatenation and the StringBuilder. I even tried bigger strings and other combinations. Still no difference.

That got me curious, so I wrote a very simple class and looked at its bytecode in the Bytecode Outline view:

This Java code:

public static void main(String[] args) {
	String cip = "cip";
	String ciop = "ciop";
	String plus = cip + ciop;
	String build = new StringBuilder(cip).append(ciop).toString();
}
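If you want to reproduce the listing, here is a self-contained version of the example. Compile it with javac and inspect the result with javap -c ConcatDemo. The class and method names are made up for this sketch. The instruction sequence you get may differ somewhat from the listing below, because it depends on the compiler and its target. For example, javac from Java 9 onward emits an invokedynamic instruction for the + version unless you compile with --release 8.

```java
// Hypothetical class name; the two methods mirror the two lines of the example.
public class ConcatDemo {

    // Concatenation with the + operator.
    static String plus(String cip, String ciop) {
        return cip + ciop;
    }

    // The same concatenation written out with an explicit StringBuilder.
    static String build(String cip, String ciop) {
        return new StringBuilder(cip).append(ciop).toString();
    }

    public static void main(String[] args) {
        String cip = "cip";
        String ciop = "ciop";
        System.out.println(plus(cip, ciop));                          // cipciop
        System.out.println(build(cip, ciop));                         // cipciop
        System.out.println(plus(cip, ciop).equals(build(cip, ciop))); // true
    }
}
```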

Generates this bytecode (see how the two concatenation styles generate almost the same code; the only difference is the extra String.valueOf call in the + version):

  L0
    LINENUMBER 23 L0
    LDC "cip"
    ASTORE 1
   L1
    LINENUMBER 24 L1
    LDC "ciop"
    ASTORE 2
// cip + ciop
   L2
    LINENUMBER 25 L2

    NEW java/lang/StringBuilder
    DUP
    ALOAD 1
    INVOKESTATIC java/lang/String.valueOf(Ljava/lang/Object;)Ljava/lang/String;
    INVOKESPECIAL java/lang/StringBuilder.<init>(Ljava/lang/String;)V
    ALOAD 2
    INVOKEVIRTUAL java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    INVOKEVIRTUAL java/lang/StringBuilder.toString()Ljava/lang/String;

    ASTORE 3
// new StringBuilder(cip).append(ciop).toString()
   L3
    LINENUMBER 26 L3

    NEW java/lang/StringBuilder
    DUP
    ALOAD 1
    INVOKESPECIAL java/lang/StringBuilder.<init>(Ljava/lang/String;)V
    ALOAD 2
    INVOKEVIRTUAL java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    INVOKEVIRTUAL java/lang/StringBuilder.toString()Ljava/lang/String;

    ASTORE 4
   L4
    LINENUMBER 27 L4
    RETURN

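The compiler has transformed “cip + ciop” into “new StringBuilder(String.valueOf(cip)).append(ciop).toString()”.
In other words, using “+” is shorthand for the more verbose StringBuilder idiom. (This describes compilers targeting Java 5 to 8; since Java 9, javac compiles + into an invokedynamic call to StringConcatFactory instead, as described in JEP 280.)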

The compiler will do the same trick for cip + "ciop" and "cip" + ciop. (In case you are wondering, "cip" + "ciop" is simply compiled as the constant "cipciop".)
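You can check the constant-folding remark with a small sketch. The class and method names are hypothetical. It relies on the fact that string literals and compile-time constant expressions are interned, so == can tell a folded constant apart from a string built at run time. A final local initialized with a literal is a "constant variable", so concatenating it with a literal is folded as well.

```java
// Hypothetical class name; compares folded and run-time concatenation.
public class ConstantFolding {

    // Both operands are literals: javac folds this into the single literal "cipciop".
    static String folded() {
        return "cip" + "ciop";
    }

    // cip is an ordinary local variable, so the concatenation happens at run time.
    static String runtime() {
        String cip = "cip";
        return cip + "ciop";
    }

    // A final local initialized with a constant is a constant variable,
    // so this expression is also folded at compile time.
    static String finalFolded() {
        final String cip = "cip";
        return cip + "ciop";
    }

    public static void main(String[] args) {
        // Literals and constant expressions are interned, so == is true only
        // for the folded cases; equals() is true in all of them.
        System.out.println(folded() == "cipciop");       // true
        System.out.println(runtime() == "cipciop");      // false
        System.out.println(finalFolded() == "cipciop");  // true
        System.out.println(runtime().equals("cipciop")); // true
    }
}
```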

This is great, but beware: the compiler will not always do the best thing for you.

This code

String big = "both";
big += cip;
big += ciop;

Will be compiled into this:

String big = "both";
big = new StringBuilder(big).append(cip).toString();
big = new StringBuilder(big).append(ciop).toString();

whereas the most efficient way would be:

String big = new StringBuilder("both").append(cip).append(ciop).toString();
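Here is a sketch comparing the forms discussed above. The names are hypothetical. All three methods return the same string. They differ only in how many intermediate StringBuilder and String objects get created, and the code itself cannot show that. Writing the whole thing as a single + expression lets the compiler produce one concatenation, just like the hand-written chain.

```java
// Hypothetical class name; the three methods build the same string.
public class CompoundAssign {

    // Two += statements: with a pre-Java 9 compiler, each statement gets its own
    // StringBuilder and produces an intermediate String.
    static String twoStatements(String cip, String ciop) {
        String big = "both";
        big += cip;
        big += ciop;
        return big;
    }

    // The hand-written single chain: one StringBuilder, one toString().
    static String oneChain(String cip, String ciop) {
        return new StringBuilder("both").append(cip).append(ciop).toString();
    }

    // A single + expression is compiled as one concatenation as well.
    static String onePlusExpression(String cip, String ciop) {
        return "both" + cip + ciop;
    }

    public static void main(String[] args) {
        System.out.println(twoStatements("cip", "ciop"));     // bothcipciop
        System.out.println(oneChain("cip", "ciop"));          // bothcipciop
        System.out.println(onePlusExpression("cip", "ciop")); // bothcipciop
    }
}
```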

Now of course nobody in his right mind would ever write any of the above (or use those variable names), but here is a pattern that you may have seen before:

String boo = "both";
for (int i=1; i<100; i++)
     boo += cip + ciop;

Now the compiler will do the obvious thing and instantiate one new StringBuilder at each iteration:

String boo = "both";
for (int i=1; i<100; i++)
     boo = new StringBuilder(boo).append(cip).append(ciop).toString();

In this case it is best to use this idiom:

StringBuilder foo = new StringBuilder("both");
for (int i=1; i<100; i++)
    foo.append(cip).append(ciop);
String boo = foo.toString();
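Here is a runnable sketch of the two loop styles, using the same 99 iterations as above. The class and method names are hypothetical. Both methods produce the same result. The += version creates a new StringBuilder and a new String on every iteration and copies the whole accumulated string each time, so its cost grows quadratically with the number of iterations. The builder version appends into one growing buffer.

```java
// Hypothetical class name; compares += in a loop with a single StringBuilder.
public class LoopConcat {

    // += inside the loop: a new StringBuilder and String on every iteration,
    // each one copying everything accumulated so far.
    static String withPlus(String cip, String ciop, int n) {
        String boo = "both";
        for (int i = 1; i < n; i++) {
            boo += cip + ciop;
        }
        return boo;
    }

    // One StringBuilder for the whole loop; toString() is called once at the end.
    static String withBuilder(String cip, String ciop, int n) {
        StringBuilder foo = new StringBuilder("both");
        for (int i = 1; i < n; i++) {
            foo.append(cip).append(ciop);
        }
        return foo.toString();
    }

    public static void main(String[] args) {
        String a = withPlus("cip", "ciop", 100);
        String b = withBuilder("cip", "ciop", 100);
        System.out.println(a.equals(b)); // true
        System.out.println(b.length());  // 697, i.e. 4 + 99 * 7
    }
}
```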

Enjoy :)

This entry was posted in java.
  • Anergy

    The StringBuffer in the second-to-last code sample is probably wrong.

  • Anonymous

    Anergy,
    you’re right, should be a StringBuilder. Thanks for spotting it.

    -teo

  • Christian Ullenboom

    To add two ideas: String has a concat() method, which people often forget. And StringBuilder/StringBuffer have a constructor that takes an initial capacity for the internal buffer, which prevents the char array from being resized when the sizes of the segments are known in advance.
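Here are both of Christian's suggestions in a short sketch. The names are hypothetical. String.concat(String) joins exactly two strings. StringBuilder(int) sets the initial capacity, so if the final length can be computed up front, the buffer never has to grow.

```java
// Hypothetical class name; illustrates String.concat and StringBuilder(int capacity).
public class ConcatAndCapacity {

    // String.concat joins exactly two strings. Unlike +, it throws a
    // NullPointerException when the argument is null.
    static String withConcat(String cip, String ciop) {
        return cip.concat(ciop);
    }

    // Same loop as in the post, but with the builder presized to the exact final
    // length, so the internal buffer never has to be reallocated and copied.
    static String presized(String cip, String ciop, int n) {
        int iterations = Math.max(0, n - 1); // the loop runs for i = 1 .. n-1
        int capacity = "both".length() + iterations * (cip.length() + ciop.length());
        StringBuilder foo = new StringBuilder(capacity);
        foo.append("both");
        for (int i = 1; i < n; i++) {
            foo.append(cip).append(ciop);
        }
        return foo.toString();
    }

    public static void main(String[] args) {
        System.out.println(withConcat("cip", "ciop"));             // cipciop
        System.out.println(presized("cip", "ciop", 100).length()); // 697
    }
}
```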

  • Anonymous

    Hi Christian,
    I’ve updated the post with your observation.

    Thanks.

  • robertmarkbram

    Nice post Matteo,

    The Javadoc for String does say that + is implemented through the StringBuffer class. I wonder if other JVMs (IBM, Google) do this too?

    Also, I wonder if the above code gets optimised at all after a few hundred runs? (If there even is any optimisation possible.)

    Rob :)

  • Anonymous

    Hey Robert,
    interesting observation:
    Javadoc for java 1.4 says “String concatenation is implemented through
    the StringBuffer class and its append method”
    (http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html)
    While in java 1.5 it’s “String concatenation is implemented through
    the StringBuilder(or StringBuffer) class and its append method” (I’ve
    updated the post)

    It actually makes sense, as StringBuilder has been introduced as a
    faster (non thread-safe) version of StringBuffer.

    To answer your question, I think it’s likely the JIT compiler finds
    an opportunity for optimization in a concat loop, but I have no idea
    how to go about testing this, other than disabling the JIT and
    comparing the results. Seems like good material for another post!

    Thanks!

  • forax

    There is a special VM optimization in JDK 7 that recognizes the StringBuilder/StringBuffer pattern and optimizes it in order to create only one String if possible (see -XX:+OptimizeStringConcat).

  • Anonymous

    Forax,
    apparently the option has been there since Java 6 Update 20. It would be interesting to see if the optimization happens at compile time or in the JIT…

    EDIT: javac does not accept that option, so it must be a JIT optimization.

    http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
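Here is a crude way to poke at the JIT question raised in these comments. The sketch times the two loop styles. Run it once normally, then again with -Xint (interpreter only, no JIT), then with -XX:-OptimizeStringConcat (the HotSpot flag that turns off the optimization forax mentions), and compare the numbers. This is a rough sketch with hypothetical names, not a rigorous benchmark; a harness such as JMH handles warm-up and dead-code elimination far better.

```java
import java.util.function.Supplier;

// Hypothetical class name; a rough timing sketch, not a proper benchmark.
public class ConcatTiming {

    static String cip = "cip";
    static String ciop = "ciop";

    // Accumulates result lengths so the timed work is actually used.
    static long sink;

    // += in a loop, as in the post: one new StringBuilder and String per iteration.
    static String withPlus(int n) {
        String boo = "both";
        for (int i = 1; i < n; i++) {
            boo += cip + ciop;
        }
        return boo;
    }

    // A single StringBuilder for the whole loop.
    static String withBuilder(int n) {
        StringBuilder foo = new StringBuilder("both");
        for (int i = 1; i < n; i++) {
            foo.append(cip).append(ciop);
        }
        return foo.toString();
    }

    // Runs the task `runs` times and returns the fastest run in nanoseconds;
    // the early runs double as a (very) rough warm-up for the JIT.
    static long bestOf(int runs, Supplier<String> task) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += task.get().length();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 2_000;
        long plus = bestOf(10, () -> withPlus(n));
        long builder = bestOf(10, () -> withBuilder(n));
        System.out.printf("+= in loop    : %d us%n", plus / 1_000);
        System.out.printf("StringBuilder : %d us%n", builder / 1_000);
        System.out.println("(checksum " + sink + ")");
    }
}
```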

  • Jan Burse

    Strange that the compiler generates a valueOf(); it should know that the object has type String.
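A likely reason for the valueOf() has to do with null. This is an assumption; nobody in the thread confirms it. The language defines the string conversion of null as "null", so null + "ciop" is "nullciop". However, new StringBuilder(String) throws a NullPointerException when given null. Passing the first operand through String.valueOf(Object) maps null to "null" before it reaches the constructor, which preserves the semantics of +. Here is a sketch with hypothetical names:

```java
// Hypothetical class name; shows how null behaves in the two concatenation styles.
public class NullConcat {

    // + converts a null reference to the four characters "null".
    static String plus(String cip, String ciop) {
        return cip + ciop;
    }

    // new StringBuilder(String) throws NullPointerException when given null.
    static String builder(String cip, String ciop) {
        return new StringBuilder(cip).append(ciop).toString();
    }

    // String.valueOf(Object) turns null into "null" before it reaches the
    // constructor, which is what the generated bytecode in the post does.
    static String builderWithValueOf(String cip, String ciop) {
        return new StringBuilder(String.valueOf(cip)).append(ciop).toString();
    }

    public static void main(String[] args) {
        System.out.println(plus(null, "ciop"));               // nullciop
        System.out.println(builderWithValueOf(null, "ciop")); // nullciop
        try {
            builder(null, "ciop");
        } catch (NullPointerException e) {
            System.out.println("builder(null, ...) threw NullPointerException");
        }
    }
}
```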