If substrings are the issue, then you can have your cake and eat it too.
Make every string a structure with a length, offset, and a pointer to a string storage structure. Make the string storage structure have a maximum length and a pointer to the start of the actual string data. The string data is just an array of bytes.
The space is allocated in powers of 2. If you go to append and you still have room, you just insert the data. Otherwise you allocate a larger block, move the existing string there, and free the old space. This makes incremental appends perform just fine. Taking substrings is just as fast: a substring is nothing but a new length and offset into the same storage. The only overhead is following one extra pointer to reach the string data.
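A minimal sketch of that layout, modeled in Ruby with made-up class names (Storage, Str) and an Array of integer bytes standing in for raw memory; a real implementation would of course work on raw buffers:

```ruby
# Shared storage: a byte buffer whose capacity grows in powers of 2.
class Storage
  attr_reader :capacity, :data
  attr_accessor :used

  def initialize(capacity)
    @capacity = capacity
    @used = 0
    @data = Array.new(capacity, 0)  # stand-in for the actual byte array
  end
end

# A string is a view: offset + length + pointer to a Storage.
class Str
  attr_reader :offset, :length, :store

  def self.round_up_pow2(n)
    cap = 16
    cap *= 2 while cap < n
    cap
  end

  def initialize(offset, length, store)
    @offset, @length, @store = offset, length, store
  end

  def self.from(text)
    bytes = text.bytes
    store = Storage.new(round_up_pow2(bytes.size))
    store.data[0, bytes.size] = bytes
    store.used = bytes.size
    new(0, bytes.size, store)
  end

  # Append in place if there is room; otherwise move to a bigger storage
  # (the old one is simply garbage-collected here rather than freed).
  def append(text)
    bytes = text.bytes
    if @store.used + bytes.size > @store.capacity
      bigger = Storage.new(self.class.round_up_pow2(@store.used + bytes.size))
      bigger.data[0, @store.used] = @store.data[0, @store.used]
      bigger.used = @store.used
      @store = bigger
    end
    @store.data[@store.used, bytes.size] = bytes
    @store.used += bytes.size
    @length += bytes.size
    self
  end

  # Substrings share storage: just a new offset/length pair, no copying.
  def substr(offset, length)
    Str.new(@offset + offset, length, @store)
  end

  def to_s
    @store.data[@offset, @length].pack("C*")
  end
end
```

Appends reuse the spare capacity until it runs out, and substr never copies a byte, which is exactly the trade described above.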
This is not entirely dissimilar to what you find in, say, Perl or Ruby. (Neither is exactly like this, though I think that Ruby is closer. A lot closer.) It makes building up large strings incrementally perform just fine. Substrings can be taken efficiently (well, not in Perl, but look at how Ruby implements backreferences lazily, for instance).
But even so, in Ruby the += operator is *still* slow, for exactly the technical reason given above: you are creating lots of new objects. Why? Very simple. += has defined semantic effects, and you can't get those semantic effects without creating new objects. If you want an efficient string append you have to use the special << operator instead, because the difference in semantics is observable. Observe:
# Create 2 strings
init_a = "Hello"
init_b = "Hello"
# Duplicate them.
dup_a = init_a
dup_b = init_b
# Append to the dups
dup_a += ", World"
dup_b << ", World"
# What do we have?
puts "init_a is '#{init_a}'" #-> "Hello"
puts "init_b is '#{init_b}'" #-> "Hello, World"
puts "dup_a is '#{dup_a}'" #-> "Hello, World"
puts "dup_b is '#{dup_b}'" #-> "Hello, World"
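You can watch the allocation happen directly with object_id (a small demonstration of my own; the ids themselves will vary from run to run):

```ruby
s = "Hello"
id_before = s.object_id
s += ", World"                 # builds a brand-new String object
puts id_before == s.object_id  #-> false

t = "Hello"
id_before = t.object_id
t << ", World"                 # appends in place, same object
puts id_before == t.object_id  #-> true
```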
And there we see why, even with smart data structures where efficiency is achievable, the += operator still has to be slow.
Cheers,
Ben