

New Java String field optimization tips?
We are developing _many_ server resident components in Java.
Our design pattern uses JDBC - with most selects for a component sharing everything except the terminal 'AND' clause.

I've been placing the common string into a field declared

private final String RETRIEVE_SQL = "select column, column2 "
        + "from table "
        + "where AnyCommonWhereCondition ";


One of the developers seems to think that changing the string declaration to
private static final String 
will make things dramatically faster. I think that is premature optimization - if we are that concerned we should be using StringBuffers anyway.

Any real world experience? [JDK 1.2.2 or 1.3.1, with or without HotSpot]

Dave Levitt
How many CPU cycles, to draw the head of a pin?
New Strings shouldn't matter.
You should be storing queries that occur more than once in PreparedStatements, so they only need to be parsed once.
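Something like this, for instance - a minimal sketch with made-up class, table, and column names, and error handling omitted:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ComponentDao {
    // Prepared (parsed) once, reused for every lookup.
    private static final String RETRIEVE_SQL =
        "select column1, column2 from table where id = ?";

    private final PreparedStatement retrieveStmt;

    public ComponentDao(Connection conn) throws SQLException {
        retrieveStmt = conn.prepareStatement(RETRIEVE_SQL);
    }

    public void printRow(int id) throws SQLException {
        retrieveStmt.setInt(1, id);            // only the bind variable changes
        ResultSet rs = retrieveStmt.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString(1) + ", " + rs.getString(2));
        }
        rs.close();
    }
}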
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New JDBC PreparedStatements
The strings are used in the creation of PreparedStatements - oddly enough, we are reconsidering the use of PreparedStatements based on some information in an O'Reilly book on Oracle & JDBC - the sample chapter on the web site shows that [in the author's tests] PreparedStatement operations are slower than plain Statements.
New Depends on the database
I've seen that assertion in regards to SQL Server, but not Oracle.

Link?
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New But back to the main question...
Why are you spending so much time worrying about a few Strings if they only get used once?

Use a performance profiling tool and find your real bottlenecks, instead of guessing.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Yeeha, Scott
But you wouldn't believe (er, on second thought, maybe you would) the companies that won't spend $500 or $1000 (or whatever it is, I think the max cost I've seen was $3000) on a profiling tool that *really* seeks out inefficiencies.

On second thought, many of the profiling tools I've used under Unix became worthless when linked against vendors' object-only libraries. cc compiled with profiling (with third-party object libraries) loses itself in untraceable links more often than not, in my experience.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New Java profiling usually works pretty well
I'll dig up some links.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New I believe that, no links required
Unless they call native code, Java classes are reasonably exposed to a profiler.

Alas, I've run into C code that ran into the roadblock of vendor libraries. You can get some good information out of cc compiled for gprof, but many times it is only tantalizing as to where it leads you.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New Try: JProbe...
New Oh I believe it
Like the people I've talked to who don't like using generalized functions stored in Postgres. They prefer to hand tune the queries. When you're querying terabytes of data, maybe. But we were talking about a templating system for a (fairly) low-volume extranet. Take the money you save in developers' time (ten minutes per query x several dozen pages) and double the amount of RAM in your webserver.

There are times throwing hardware at it is the solution. Otherwise we'd all be writing in assembly.
We have to fight the terrorists as if there were no rules and preserve our open society as if there were no terrorists. -- [link|http://www.nytimes.com/2001/04/05/opinion/BIO-FRIEDMAN.html|Thomas Friedman]
New Kinda missing the point...
... the point is to optimize the stuff that gets run often. Loops, queries, repeated operations. Not one-time inits, or static vs. non-static strings that are used once as in this case.

The profiler will tell you what's taking the most time in the app. Hit the hot spots with the optimizations and you'll get the most return for your time.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Real world experience
We used to create CSV files from our application using the string concatenation operator. We recently changed them to StringBuffer.append() calls.

Sped up creation of said CSV files from hours to minutes. Literally.
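Roughly the kind of change I mean (field names invented for the sketch):

public class CsvSketch {
    // Builds one CSV line per record.  The "+" version allocates a new
    // String (and recopies the old contents) on every append; the
    // StringBuffer version appends into a single growable buffer.
    static String slowRow(String[] fields) {
        String row = "";
        for (int i = 0; i < fields.length; i++) {
            row += fields[i];                      // new String each time
            if (i < fields.length - 1) row += ",";
        }
        return row;
    }

    static String fastRow(String[] fields) {
        StringBuffer row = new StringBuffer();
        for (int i = 0; i < fields.length; i++) {
            row.append(fields[i]);                 // appends in place
            if (i < fields.length - 1) row.append(",");
        }
        return row.toString();
    }

    public static void main(String[] args) {
        String[] fields = { "id", "name", "amount" };
        System.out.println(slowRow(fields));
        System.out.println(fastRow(fields));
    }
}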
-YendorMike

"The problems of the world cannot possibly be solved by the skeptics or the cynics whose horizons are limited by the obvious realities. We need people who dream of things that never were." - John F. Kennedy
New (The technical reason)
For Java newbies, the technical reason is that when you add two strings together, you're actually creating new String object(s) and destroying others. In bad situations, that can lead to creating and destroying objects like mad. Not intuitive behavior, but that's how String works.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New The technical reason behind the technical reason
Efficiency, believe it or not. Strings are immutable objects. The interpreter knows that a String will never change. As a result, Strings store their internals as a char array. If you create a new String as a substring of a string, you actually just create an object with an offset and length into the old string's char array. This is quite fast when you are doing a lot of substrings and the like.

But when you add them together, you end up copying two char arrays into a third char array, and creating a new String object from that.

Unfortunately, more people add strings together than take substrings, so you end up needing something like a StringBuffer, which appends strings without all that copying.
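A toy sketch of the idea in Java - not the real java.lang.String source, just the shape of it:

// Toy illustration: substring() shares the original char array instead of
// copying, while concat() has to copy both halves into a brand-new array.
final class ToyString {
    private final char[] value;   // shared backing array
    private final int offset;
    private final int count;

    ToyString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    ToyString substring(int begin, int end) {
        // No copying - just a new "view" into the same array.
        return new ToyString(value, offset + begin, end - begin);
    }

    ToyString concat(ToyString other) {
        // Copying - both halves land in a new array.
        char[] joined = new char[count + other.count];
        System.arraycopy(value, offset, joined, 0, count);
        System.arraycopy(other.value, other.offset, joined, count, other.count);
        return new ToyString(joined, 0, joined.length);
    }

    public String toString() {
        return new String(value, offset, count);
    }
}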
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New I don't believe it
If substrings are the issue, then you can have your cake and eat it too.

Make every string a structure with a length, offset, and a pointer to a string storage structure. Make the string storage structure have a maximum length and a pointer to the start of the actual string data. The string data is just an array of bytes.

The space is allocated in powers of 2. If you go to append and you still have room, you just insert the data. Otherwise you reallocate room, move the existing string, free the old space and allocate at the new place. This makes incremental appends perform just fine. Taking substrings is just as fast. The overhead is seen from following one extra pointer when finding the string.
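The append half of that, sketched in Java for the sake of this thread (a toy - no sharing or separate storage structure, just the power-of-2 growth):

// Toy mutable string: the backing array only gets reallocated (doubled)
// when it runs out of room, so building a big string stays roughly linear.
final class GrowableString {
    private char[] data = new char[16];
    private int length = 0;

    void append(String s) {
        int needed = length + s.length();
        if (needed > data.length) {
            int newSize = data.length;
            while (newSize < needed) newSize *= 2;   // powers of 2
            char[] bigger = new char[newSize];
            System.arraycopy(data, 0, bigger, 0, length);
            data = bigger;                           // the old array gets collected
        }
        s.getChars(0, s.length(), data, length);
        length = needed;
    }

    public String toString() {
        return new String(data, 0, length);
    }
}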

This is not entirely dissimilar to what you find in, say, Perl or Ruby. (Neither is exactly like this, though I think that Ruby is closer. A lot closer.) It makes building up large strings incrementally perform just fine. Substrings can be taken efficiently (well, not in Perl - but look at how Ruby implements backreferences lazily, for instance).

But even so in Ruby the += operator is *still* slow, for exactly the technical reason given above. You are creating lots of new objects. Why? Well, very simple. += has defined semantic effects. You can't get those semantic effects without creating new objects. If you want an efficient string append you have to use the special << operator because the semantic effects are visible. Observe:

# Create 2 strings
init_a = "Hello";
init_b = "Hello";
# Duplicate them.
dup_a = init_a;
dup_b = init_b;
# Append to the dups
dup_a += ", World";
dup_b << ", World";
# What do we have?
puts "init_a is '#{init_a}'"; #-> "Hello"
puts "init_b is '#{init_b}'"; #-> "Hello, World"
puts "dup_a is '#{dup_a}'"; #-> "Hello, World"
puts "dup_b is '#{dup_a}'"; #-> "Hello, World"


And there we see why, even with smart data structures where efficiency is achievable, the += operator still has to be slow.

Cheers,
Ben
New That would help, but....
1. I believe Java tries to conserve memory by merging duplicate strings of class String. So you can easily have two String objects pointing to the same buffer. Edit one of them, and the runtime has to alloc a completely new buffer for it. Even if you're shrinking it.

2. If you alloc in powers of 2, you'll have beaucoup wasted space, and the spectre of disk thrashing lurks in the shadows. This isn't a criticism of the basic concept, but some fine tuning is called for. Now if you increment by a fraction, say 5/4, you get a different tradeoff ratio. Less wasted memory, but more frequent allocs.

In fact, it's rumored that some Java implementations do something like this.
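A quick back-of-the-envelope look at that tradeoff (the target of 1,000,000 chars is arbitrary):

public class GrowthFactors {
    // Grow a buffer from 16 chars until it can hold the target, counting
    // reallocations and the slack left over at the end.
    static void simulate(String label, double factor, long target) {
        long capacity = 16;
        int reallocs = 0;
        while (capacity < target) {
            capacity = (long) Math.ceil(capacity * factor);
            reallocs++;
        }
        System.out.println(label + ": " + reallocs + " reallocations, "
            + (capacity - target) + " chars of slack at the end");
    }

    public static void main(String[] args) {
        simulate("x 2  ", 2.0, 1000000);   // few reallocs, more wasted space
        simulate("x 5/4", 1.25, 1000000);  // more reallocs, less wasted space
    }
}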

But there's no beating the fine control you get with your own StringBuffers. That way, the efficiency is more directly influenced by the ability of the programmer. I say use String for prototyping and for code that doesn't get exercised much, but use StringBuffer for high usage production code. Or else code it in C or C++.
[link|http://www.angelfire.com/ca3/marlowe/index.html|http://www.angelfir...e/index.html]
Sometimes "tolerance" is just a word for not dealing with things.
New The scheme addresses those acceptably well
It is, of course, a compromise between issues. You wind up wasting about a third of your space. (Much less if you just apply the simple heuristic of only creating a string with as little memory as you can, then using the power of 2 trick if someone starts appending to it.) If you want to share objects you either need to do some fancy footwork, or else define semantics in which programmers choose what happens. (Ruby does this, see the code example I gave.)

That algorithmic efficiency and memory usage conflict is no surprise. That is usually true. Scalable algorithms tend to use buffering. Buffering costs memory. And vice versa: contortions to avoid using extra memory take extra operations.

As for your suggestion, mine is to not use Java. In other languages you get reasonable default behaviour without having to know language trivia about available types with precise promised tradeoffs. Besides which, string manipulation is not exactly one of Java's strengths.

Cheers,
Ben
New Compile time only
We are not concatenating the strings at runtime - only at compile time.

Hopefully the compiler is smart enough to use a single literal string - literals are supposed to be java.lang.String.intern()'d by the compiler - so only one copy of each should exist - even if multiple copies of the class are constructed.
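Easy enough to check with a quick sketch:

// Constant string expressions should fold to a single interned String,
// so an identity (==) comparison comes back true, not just equals().
public class LiteralCheck {
    private static final String A = "select * " + "from table ";
    private static final String B = "select * from table ";

    public static void main(String[] args) {
        System.out.println(A == B);             // true: one interned constant
        String runtime = "select * ";
        runtime = runtime + "from table ";      // concatenated at runtime
        System.out.println(runtime == B);       // false: a distinct object
        System.out.println(runtime.equals(B));  // true: same contents
    }
}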
New Are you sure?
If it's static, I'm almost certain the expression will be evaluated at compile time, or at the latest, evaluated just once when the program starts up. But if you leave out the "static," the "final" may not be enough to do it. The compiler may not be clever enough, and it would end up being evaluated every time you create an instance.

Put in the static keyword. At worst it's redundant. Better safe than sorry.
[link|http://www.angelfire.com/ca3/marlowe/index.html|http://www.angelfir...e/index.html]
Sometimes "tolerance" is just a word for not dealing with things.
New Compile time concatenation
The JDK compiler will generate identical byte code for the following:

private final String x = "ABC";
private final String y = "A" + "B" + "C";

The concatenation is done by the compiler, not at runtime. The compiler is even smart enuf to do the compile time concatenation with a constant variable (i.e. final) if the value of the constant is fixed at compile time. This means that the z string below will produce the same results as above:

private final String b = "B";
private final String z = "A" + b + "C";

OTOH, if b was not a final value, then this optimization would not take place, and two concats would happen at runtime, at the point in the code where the declaration appears.
New Sounds like a shitty compiler.
Seems to me, with a wee bit of work this bit of code:

String x = "Hello ";
String y = System.getProperty("user");

String z;

the expression

z = x + y;

should be interpreted by the compiler the same as

z = new StringBuffer(x).append(y).toString();

and any longer expression like z = a + b + c + d;

should be

z = new StringBuffer(a).append(b).append(c).append(d).toString();

The fact that this is not what happens sounds to me like bad compiler implementation.
The average hunter gatherer works 20 hours a week.
The average farmer works 40 hours a week.
The average programmer works 60 hours a week.
What the hell are we thinking?
New Wrong, syntactically
should be interpreted by the compiler the same as

z = new StringBuffer(x).append(y).toString();

You can't assign a StringBuffer to z because it's a String. And Java doesn't support the "+" operator for StringBuffers, so declaring everything as StringBuffers is hosed, too.

Basically, string handling in Java looks sorta neat but it's almost as half-baked as the AWT.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New Read it again
The expression:

String z = new StringBuffer(x).append(y).toString();

is of type String (note the toString() on the end there).

What I meant to say was if the compiler sees

z = x + y;

it should read it as if the user wrote the StringBuffer version.

It's not wrong.
The average hunter gatherer works 20 hours a week.
The average farmer works 40 hours a week.
The average programmer works 60 hours a week.
What the hell are we thinking?
New But what about the general case?
The programmer writes a function to append another row onto the string in memory. That function calls other functions to get individual entries, and that function is in turn called many times by other functions. This is exactly the case in question.

Knowing the overall usage of that function, it would be easy to optimize; compiling just that function in isolation, it is no surprise at all that the compiler has trouble. (Also see my Ruby example, where attempting to aggressively optimize += can lead to visible semantic changes you wouldn't want.)
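To make that concrete (the names and sizes are invented for the sketch):

// The compiler only ever sees appendRow() in isolation; it cannot know the
// caller invokes it thousands of times, so each "+" quietly recopies the
// entire report built so far - O(n^2) in total characters copied.
public class ReportSketch {
    static String appendRow(String report, String[] fields) {
        String row = "";
        for (int i = 0; i < fields.length; i++) {
            row += fields[i] + "\t";       // small and harmless-looking
        }
        return report + row + "\n";        // copies the whole report again
    }

    public static void main(String[] args) {
        String report = "";
        for (int i = 0; i < 10000; i++) {
            report = appendRow(report, new String[] { "a", "b", "c" });
        }
        System.out.println(report.length());
    }
}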

Cheers,
Ben
New Re: But what about the general case?
That's sort of a degenerate case, I think.

The right answer is that the language should enforce that Strings are immutable instead of being wishy washy about it. The very existence of += on String is evil. Were it eliminated, programmers would have to declare mutable string references as StringBuffer (which is a stupid name - MutableString would have been better), which is more efficient.

Interestingly, in ObjectiveC, the opposite is true.

NSString *a = @"Hello";
[a stringByAppendingString: @", world!"];

is more efficient than

NSMutableString *a = [NSMutableString stringWithString:@"Hello"];
[a appendString: @", world!"];

I'm not sure why, but it's cheaper to create the new strings in ObjC than it is to use the mutable string.
The average hunter gatherer works 20 hours a week.
The average farmer works 40 hours a week.
The average programmer works 60 hours a week.
What the hell are we thinking?
New (arguing)
One can argue that Java string handling is just plain bad, even if you do know what they did and why they did it. Better to have ignored the "+" case than to have weirded around it.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New I always get dubious...
When someone says there is a single right answer.

Particularly when it is an answer that makes a set of tradeoffs which doesn't fit some common situations.

Immutable strings can have a simpler structure and are faster to access (to allow resizing you need to allow moving the string). However, if you are incrementally building up a large string out of immutable strings, the recopying forces you into an O(n^2) algorithm.

Now, you say, just use a StringBuffer (or whatever the current flavour is) object. Well yes. That works. If you try to implement it in code, though, you will eventually hit other bottlenecks as the garbage collector tries to collect a huge number of little strings. This happens much later, but I have been there, done that. Collecting lots of little objects eventually becomes its own problem. If the buffer is not natively implemented, then you have to put some real thought into how it scales. (Having done this in JavaScript I can report that a good heuristic is that when you push a StringBuffer onto a StringBuffer, you resolve the smaller one. Now create one buffer per row, push those onto the one per table, and scalability is acceptable.)
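The row/table heuristic, translated from the JavaScript original into Java just to show the shape of it (the names and sizes are invented):

public class NestedBuffers {
    public static void main(String[] args) {
        StringBuffer table = new StringBuffer();
        for (int r = 0; r < 1000; r++) {
            StringBuffer row = new StringBuffer();    // one small buffer per row
            for (int c = 0; c < 10; c++) {
                row.append("cell_").append(r).append('_').append(c).append('\t');
            }
            row.append('\n');
            table.append(row.toString());             // resolve the smaller buffer
        }
        System.out.println(table.length());
    }
}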

You know, this is really a headache. I have been there, done that, and don't like it.

Back to the mutable string. OK, it is an immediate significant overhead. Hrm. But go to incrementally build up a string and it bloody works! The first time. The first thing you try. No need to show how good of a programmer you are, how smart you are. You can go off and be smart about something else.

If that initial overhead is acceptable, you don't ever get into having to waste your time worrying about this cr*p. Been there, done that as well. And given current CPU power, I really prefer always wasting some computer time and not having to take out a day or 3 doing performance profiling so I can figure out how to rewrite code that should work the first time.

So I would say that there is no truly "right answer". There are cases where mutable strings make sense. Cases where immutable strings make sense. For me, most of the time, I prefer having mutable strings that work out reasonably well for any access problem, that is good enough. If it isn't good enough some time, then I will make the hard decisions.

Cheers,
Ben
New Don't know why
When I say right answer - I mean right answer to the Java language design decision to have immutable strings that appear to be mutable. It's an efficiency trap. Things that are runtime inefficient shouldn't be easy to write. This is the philosophy behind the C++ STL. You don't implement += on an iterator that has access time O(n). It's deceitful. You would implement += on an iterator that has access time O(1); that makes sense.

I feel the same way about Java's String += operator. If String is immutable, then why did you give me this little thing that appears to modify the string? Again, it's deceitful.

(OT - This is one of the things that caused me to pitch C++, its inherent unpredictability. Given the wackiness of operator overloading, I was never entirely sure if a given line of code was expensive or not.)

If your goal is to incrementally build up a big string by continuously modifying some string, then what you want is a string you can modify. Declaring your container a String is a bad design choice that appears to be sanctioned by the language designers, since they gave you the nifty += and + operators. It's misleading.

I think your other comments had more to do with making the transition from string to text. Text is a much more complicated thing than a string.
The average hunter gatherer works 20 hours a week.
The average farmer works 40 hours a week.
The average programmer works 60 hours a week.
What the hell are we thinking?
New Whacky overloading. Yes.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New That I will agree with
If you give people basic operations, try to make them efficient ones.

That is one thing that Perl has historically done very well on. Everything has a significant overhead. Assume it is about a factor of 10. But its native data types (of which there are very few) can be put together easily to implement virtually any easily stated algorithm. No, you can't choose to make subtle trade-offs. But it is darned easy to get something that works, and when it works it is probably not, barring your stupidity and that factor of 10, going to be that bad.

Getting everything just right might be really fun, but get it wrong and you are in trouble. If you have that factor of 10 to throw away up front, well, I create enough of my own problems to think about. (What do you mean I have deep recursion...?)

Cheers,
Ben
New Depends
Strictly speaking, the Java compilers are a bit smarter in that the compiler does the concatenation at compile time for adjacent constant strings. In other words, the plus signs (concat operators) in the expression you give are not done at runtime. Both versions will store a single string in the constant pool, which will be loaded into the heap when assigned to a variable (or field). The result is that the GC will be called in both instances.

Been awhile since I've looked at the decompiled code, but just glancing at the results, I don't see any significant difference:


class TestMe {
    private final String RETRIEVE_SQL = "select column1, column2 "
        + "from table "
        + "where AnyCommonWhereCondition ";

    private static final String STATIC_SQL = "select column1, column2 "
        + "from table "
        + "where AnyCommonWhereCondition ";

    public static void main(String argv[]) {
        TestMe me = new TestMe();
    }

    TestMe() {
        String s;

        s = RETRIEVE_SQL;
        s = STATIC_SQL;
    }
}


.source TestMe.java
.class TestMe
.super java/lang/Object

.field private final RETRIEVE_SQL Ljava/lang/String;
= "select column1, column2 from table where AnyCommonWhereCondition "
.field private static final STATIC_SQL Ljava/lang/String;
= "select column1, column2 from table where AnyCommonWhereCondition "

.method public static main([Ljava/lang/String;)V
.limit stack 2
.limit locals 2
.line 11
new TestMe
dup
invokespecial TestMe/<init>()V
astore_1
.line 12
return
.end method

.method <init>()V
.limit stack 2
.limit locals 2
.line 14
aload_0
invokespecial java/lang/Object/<init>()V
.line 2
aload_0
ldc "select column1, column2 from table where AnyCommonWhereCondition "
putfield TestMe/RETRIEVE_SQL Ljava/lang/String;
.line 17
ldc "select column1, column2 from table where AnyCommonWhereCondition "
astore_1
.line 18
ldc "select column1, column2 from table where AnyCommonWhereCondition "
astore_1
.line 19
return
.end method


New I don't think it will affect speed
but you'll have an extra field per instance if you don't use static. I don't see any reason not to use it.
New Yep, its a size issue.
Without static, you have an instance variable in every instance, all pointing to the same string (the compiler arranges for equivalent string literals to be identical). So you pay for 1 object reference per instance of the enclosing class.

If you make it static, you have one object reference, period. So you might save 4 bytes per instance by using static, and since it's a constant (it *is* a constant, right?) it's a more rational way to do things.

The only speed impact would be at object construction time, the cost of initializing the per instance reference to the string. This is likely negligible - I doubt you could measure it without creating 100k objects in a tight loop and measuring the difference.
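The sort of loop you would need just to notice it (a rough sketch; the numbers will vary wildly by VM and machine):

public class ConstructionTiming {
    static class WithInstanceField {
        private final String SQL = "select column1, column2 from table ";
    }

    static class WithStaticField {
        private static final String SQL = "select column1, column2 from table ";
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) new WithInstanceField();
        long mid = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) new WithStaticField();
        long end = System.currentTimeMillis();
        System.out.println("instance field: " + (mid - start) + " ms");
        System.out.println("static field:   " + (end - mid) + " ms");
    }
}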
The average hunter gatherer works 20 hours a week.
The average farmer works 40 hours a week.
The average programmer works 60 hours a week.
What the hell are we thinking?
New Re: Java String field optimization tips?
Let me provide some clarification on this issue (even though it appears to have been beaten to death).

As of JDK 1.3 (I believe), there is no difference between doing a String concatenation ( "foo" + "bar" ) and using a StringBuffer to do the same thing. So, moving string concatenations to StringBuffers will make no speed difference in the newer JVMs (I'm assuming we're referring to Sun's JVM, as I'm not sure what IBM's does in the same situation).

As for the need for static...I don't believe it will add value. The final means that it's a constant, which means that it shouldn't even create a new value for each instance (I believe, when it's final, that it affects the *scope* of the field, and nothing else). I'm not 100% sure of that, but I believe that to be the case.

The point is that, to a certain extent, the programmer shouldn't be too worried about certain optimizations, as it's a JVM issue, not a language issue. Obviously, this can be a problem when the JVM is a bottleneck and the programmer has to work around it. The nice thing about this is that newer JVM's speed up existing code (for free...no code changes).

Another example is the way the garbage collector works...lots of small, local object creations aren't that interesting anymore (that is, they aren't the bottleneck). So, doing the string concatenation doesn't cost as much in the new JVMs. In fact, in one project I worked on, I'd built an object pool. It worked really nicely in JDK 1.2, but when we ran it in JDK 1.3, it actually was slower (it sped up after I removed the pool). Of course, I'd better note that the pool was accessible across multiple threads, so the synchronization was the speed cost (but I had to have it, because of the multiple threads).

Dan Shellman
     Java String field optimization tips? - (dlevitt) - (33)
         Strings shouldn't matter. - (admin) - (9)
             JDBC PreparedStatements - (dlevitt) - (8)
                 Depends on the database - (admin)
                 But back to the main question... - (admin) - (6)
                     Yeeha, Scott - (wharris2) - (5)
                         Java profiling usually works pretty well - (admin) - (2)
                             I believe that, no links required - (wharris2)
                             Try: JProbe... -NT - (slugbug)
                         Oh I believe it - (drewk) - (1)
                             Kinda missing the point... - (admin)
         Real world experience - (Yendor) - (18)
             (The technical reason) - (wharris2) - (7)
                 The technical reason behind the technical reason - (admin) - (6)
                     I don't believe it - (ben_tilly) - (5)
                         That would help, but.... - (marlowe) - (1)
                             The scheme addresses those acceptably well - (ben_tilly)
                         Compile time only - (dlevitt) - (2)
                             Are you sure? - (marlowe)
                             Compile time concatenation - (ChrisR)
             Sounds like a shitty compiler. - (tuberculosis) - (9)
                 Wrong, syntactically - (wharris2) - (8)
                     Read it again - (tuberculosis) - (7)
                         But what about the general case? - (ben_tilly) - (6)
                             Re: But what about the general case? - (tuberculosis) - (5)
                                 (arguing) - (wharris2)
                                 I always get dubious... - (ben_tilly) - (3)
                                     Don't know why - (tuberculosis) - (2)
                                         Whacky overloading. Yes. -NT - (wharris2)
                                         That I will agree with - (ben_tilly)
         Depends - (ChrisR)
         I don't think it will affect speed - (Arkadiy) - (1)
             Yep, its a size issue. - (tuberculosis)
         Re: Java String field optimization tips? - (dshellman)
