
New The technical reason behind the technical reason
Efficiency, believe it or not. Strings are immutable objects. The runtime knows that a String will never change. As a result, Strings store their internals as a char array. If you create a new String as a substring of another string, you actually just create an object with an offset and length into the old string's char array. This is quite fast when you are doing a lot of substrings and the like.

But when you add them together, you end up copying two char arrays into a third char array, and creating a new String object from that.

Unfortunately, more people add strings together than take substrings, so you end up needing something like a StringBuffer, which appends strings into a growing buffer instead of copying everything each time.
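For illustration, a rough sketch of the difference (exact costs vary by VM, but the shape is the same):

String slow = "";
for (int i = 0; i < 1000; i++) {
    slow += "x";              // copies the whole char array on every pass
}

StringBuffer buf = new StringBuffer();
for (int i = 0; i < 1000; i++) {
    buf.append("x");          // writes into a growing internal buffer
}
String fast = buf.toString(); // one final copy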
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New I don't believe it
If substrings are the issue, then you can have your cake and eat it too.

Make every string a structure with a length, offset, and a pointer to a string storage structure. Make the string storage structure have a maximum length and a pointer to the start of the actual string data. The string data is just an array of bytes.

The space is allocated in powers of 2. If you go to append and you still have room, you just insert the data. Otherwise you allocate a larger block, copy the existing string into it, and free the old space. This makes incremental appends perform just fine. Taking substrings is just as fast. The only extra overhead is following one more pointer to reach the string data.
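A minimal sketch of that layout, in Java purely for illustration (the names are made up, and append is simplified to assume the string ends at the last used position):

// Illustrative only: a string is an (offset, length) view into shared storage.
class Storage {
    char[] data;   // the actual character data, sized in powers of 2
    int used;      // how much of data is currently filled

    Storage(int capacity) { data = new char[capacity]; }
}

class Str {
    Storage store;         // the one extra pointer to follow
    int offset, length;    // this string's view into store.data

    // Substring: no copying, just a new view into the same storage.
    Str substring(int from, int to) {
        Str s = new Str();
        s.store = store;
        s.offset = offset + from;
        s.length = to - from;
        return s;
    }

    // Append: write in place if there is room, otherwise double the
    // storage, copy once, and continue.
    // (Simplified: assumes this string ends at store.used.)
    void append(char[] extra) {
        int needed = store.used + extra.length;
        if (needed > store.data.length) {
            int cap = store.data.length;
            while (cap < needed) cap *= 2;
            char[] bigger = new char[cap];
            System.arraycopy(store.data, 0, bigger, 0, store.used);
            store.data = bigger;
        }
        System.arraycopy(extra, 0, store.data, store.used, extra.length);
        store.used += extra.length;
        length += extra.length;
    }
}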

This is not entirely dissimilar to what you find in, say, Perl or Ruby. (Neither is exactly like this, though I think that Ruby is closer. A lot closer.) It makes building up large strings incrementally perform just fine. Substrings can be taken efficiently (well, not in Perl - but look at how Ruby implements backreferences lazily, for instance).

But even so, in Ruby the += operator is *still* slow, for exactly the technical reason given above: you are creating lots of new objects. Why? Very simple. += has defined semantic effects, and you can't get those semantic effects without creating new objects. If you want an efficient string append you have to use the special << operator instead, and the difference in semantics is visible. Observe:

# Create 2 strings
init_a = "Hello";
init_b = "Hello";
# Duplicate them.
dup_a = init_a;
dup_b = init_b;
# Append to the dups
dup_a += ", World";
dup_b << ", World";
# What do we have?
puts "init_a is '#{init_a}'"; #-> "Hello"
puts "init_b is '#{init_b}'"; #-> "Hello, World"
puts "dup_a is '#{dup_a}'"; #-> "Hello, World"
puts "dup_b is '#{dup_a}'"; #-> "Hello, World"


And there we see why, even with smart data structures where efficiency is achievable, the += operator still has to be slow.

Cheers,
Ben
New That would help, but....
1. I believe Java tries to conserve memory by merging duplicate Strings, so you can easily have two String references pointing at the same buffer. Edit one of them, and the runtime has to allocate a completely new buffer for it, even if you're shrinking it.

2. If you allocate in powers of 2, you'll have beaucoup wasted space, and the spectre of disk thrashing lurks in the shadows. This isn't a criticism of the basic concept, but some fine tuning is called for. If you grow by a smaller fraction instead, say 5/4, you get a different tradeoff: less wasted memory, but more frequent allocations.

In fact, it's rumored that some Java implementations do something like this.
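A back-of-the-envelope way to see the tradeoff (illustrative only; 16 is an arbitrary starting capacity, and the slack figure is the worst case right after a growth step):

class GrowthFactor {
    // For growth factor f: reaching n chars takes roughly log(n)/log(f)
    // reallocations (each one a full copy), while just after a growth
    // step up to about (f - 1) * n of the buffer can be sitting empty.
    static void show(double factor, int target) {
        int reallocs = (int) Math.ceil(Math.log((double) target / 16) / Math.log(factor));
        long worstSlack = (long) ((factor - 1) * target);
        System.out.println("factor " + factor + ": ~" + reallocs
                + " reallocations, worst-case slack ~" + worstSlack + " chars");
    }

    public static void main(String[] args) {
        show(2.0, 1000000);   // few copies, but lots of potential slack
        show(1.25, 1000000);  // far more copies, but much less slack
    }
}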

But there's no beating the fine control you get with your own StringBuffers. That way, the efficiency is more directly influenced by the ability of the programmer. I say use String for prototyping and for code that doesn't get exercised much, but use StringBuffer for high usage production code. Or else code it in C or C++.
[link|http://www.angelfire.com/ca3/marlowe/index.html|http://www.angelfir...e/index.html]
Sometimes "tolerance" is just a word for not dealing with things.
New The scheme addresses those acceptably well
It is, of course, a compromise between issues. You wind up wasting about a third of your space. (Much less if you just apply the simple heuristic of only creating a string with as little memory as you can, then using the power of 2 trick if someone starts appending to it.) If you want to share objects you either need to do some fancy footwork, or else define semantics in which programmers choose what happens. (Ruby does this, see the code example I gave.)

That algorithmic efficiency and memory usage conflict is no surprise; they usually do. Scalable algorithms tend to use buffering, and buffering costs memory. Conversely, contortions to avoid using extra memory cost extra operations.

As for your suggestion, mine is to not use Java. In other languages you get reasonable default behaviour without having to know language trivia about available types with precise promised tradeoffs. Besides which, string manipulation is not exactly one of Java's strengths.

Cheers,
Ben
New Compile time only
We are not concatenating the strings at runtime - only at compile time.

Hopefully the compiler is smart enough to emit a single literal string - literals are supposed to be java.lang.String.intern()'d - so only one copy of each should exist, even if multiple instances of the class are constructed.
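One way to see that in action (the == checks are for demonstration only; don't use == for string comparison in real code):

// Constant String expressions are folded by the compiler and the
// resulting literal is interned, so all three refer to one object.
String a = "AB";
String b = "A" + "B";            // folded to "AB" at compile time
String c = ("A" + "B").intern(); // explicit intern(), redundant here
System.out.println(a == b);      // true
System.out.println(a == c);      // true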
New Are you sure?
If it's static, I'm almost certain the expression will be evaluated at compile time, or at the latest, evaluated just once when the program starts up. But if you leave out the "static," the "final" may not be enough to do it. The compiler may not be clever enough, and it would end up being evaluated every time you create an instance.

Put in the static keyword. At worst it's redundant. Better safe than sorry.
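For example (the field name and query here are made up, just to show the shape):

// Hypothetical constant; static final makes the intent explicit.
private static final String USER_QUERY =
    "SELECT id, name " +
    "FROM users " +
    "WHERE active = 1";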
[link|http://www.angelfire.com/ca3/marlowe/index.html|http://www.angelfir...e/index.html]
Sometimes "tolerance" is just a word for not dealing with things.
New Compile time concatenation
The JDK compiler will generate identical byte code for the following:

private final String x = "ABC";
private final String y = "A" + "B" + "C";

The concatenation is done by the compiler, not at runtime. The compiler is even smart enough to do the compile-time concatenation with a constant variable (i.e. final) if the value of the constant is fixed at compile time. This means that the z string below will produce the same result as above:

private final String b = "B";
private final String z = "A" + b + "C";

OTOH, if b were not final, this optimization would not take place, and the two concatenations would happen at runtime, at the point in the code where the declaration appears.
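For contrast, a made-up example of the non-constant case:

// b2 is not final, so it is not a constant variable; the compiler
// must emit the two concatenations at runtime instead.
private String b2 = "B";
private final String z2 = "A" + b2 + "C";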
     Java String field optimization tips? - (dlevitt) - (33)
         Strings shouldn't matter. - (admin) - (9)
             JDBC PreparedStatements - (dlevitt) - (8)
                 Depends on the database - (admin)
                 But back to the main question... - (admin) - (6)
                     Yeeha, Scott - (wharris2) - (5)
                         Java profiling usually works pretty well - (admin) - (2)
                             I believe that, no links required - (wharris2)
                             Try: JProbe... -NT - (slugbug)
                         Oh I believe it - (drewk) - (1)
                             Kinda missing the point... - (admin)
         Real world experience - (Yendor) - (18)
             (The technical reason) - (wharris2) - (7)
                 The technical reason behind the technical reason - (admin) - (6)
                     I don't believe it - (ben_tilly) - (5)
                         That would help, but.... - (marlowe) - (1)
                             The scheme addresses those acceptably well - (ben_tilly)
                         Compile time only - (dlevitt) - (2)
                             Are you sure? - (marlowe)
                             Compile time concatenation - (ChrisR)
             Sounds like a shitty compiler. - (tuberculosis) - (9)
                 Wrong, syntactically - (wharris2) - (8)
                     Read it again - (tuberculosis) - (7)
                         But what about the general case? - (ben_tilly) - (6)
                             Re: But what about the general case? - (tuberculosis) - (5)
                                 (arguing) - (wharris2)
                                 I always get dubious... - (ben_tilly) - (3)
                                     Don't know why - (tuberculosis) - (2)
                                         Whacky overloading. Yes. -NT - (wharris2)
                                         That I will agree with - (ben_tilly)
         Depends - (ChrisR)
         I don't think it will affect speed - (Arkadiy) - (1)
             Yep, its a size issue. - (tuberculosis)
         Re: Java String field optimization tips? - (dshellman)
