
Welcome to IWETHEY!

New tar vs. cpio
Which would you use to move a large directory tree from machine A to machine B? (400k files, lots of inodes) The copy has to be exactly exact.

-drl
New I'd use tar.
Tar I can make do what I need it to. I don't know cpio.

Incidentally, I've done this over an ad-hoc ssh pipe. It required some thinking to set up, but it worked.

Wade.

Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please

-- "Anything but Ordinary" by Avril Lavigne.

New Re: I'd use tar.
I have a bad feeling about tar. Not standard, sometimes doesn't copy 0 length files, etc. cpio is totally standard. But I really don't know which is better. Who else has needed to mv/cp a gigantic tree?
-drl
New I've done it on Linux.
cp -pR does the honours, preserves security.

Note that that's GNU cp. Proprietary UNIX upholds the tenet that "if it don't suck ass, it ain't worth paying for", and cheerfully trashes the security.


Peter
[link|http://www.debian.org|Shill For Hire]
[link|http://www.kuro5hin.org|There is no K5 Cabal]
[link|http://guildenstern.dyndns.org|Blog]
New dd
...if you want it EXACTLY exact :p


Peter
New What I'd do.
Use an rsync daemon and make the request on the new machine. And have it verify as well.

That way, you'd be sure to get it right. Not that I don't trust tar or cpio (I trust them for nightly backups, period).

It's just that using tar etc., unless it's piped over the network to the other machine, means you need *THAT* much more space on both machines.

rsync, you only need a few extra MB.

A better option might be:
If this is HP-UX, you could also set up an NFS export on the source machine, NFS-mount the file system on the destination machine, and then just do a local rsync.

Oh... and if you have ssh on both machines (the full suite, that is) you could scp them as well. Or, if you actually have rsh and rcp available on the machines, you could just do it the old-fashioned rcp way.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
No matter how much Microsoft supporters whine about how Linux and other operating systems have just as many bugs as their operating systems do, the bottom line is that the serious, gut-wrenching problems happen on Windows, not on Linux, not on Mac OS. -- [link|http://www.eweek.com/article2/0,1759,1622086,00.asp|source]
Here is an example: [link|http://www.greymagic.com/security/advisories/gm001-ie/|Executing arbitrary commands without Active Scripting or ActiveX when using Windows]
Edited by folkert Sept. 23, 2004, 10:48:01 AM EDT
New Re: What I'd do.
I heard rsync blows chunks performance-wise since it's read-file/write-file one at a time.

Just checked, not an option in any case (on this system). I have sudo but must be judicious. Installing rsync might get me escorted out. (Hell, viewing IWETHEY might get me escorted out.)
-drl
New Nope
rsync handles multiple files at a time and is very fast as long as you turn the right options on and off. This can take research to figure out. For instance, I usually don't use -c (--checksum), because that causes it to expensively checksum every single file. I'm happy knowing that the files have the same size and modification time, even though theoretically that could be wrong.

rsync is also clearly The Right solution if you're going to have to resynchronize the data.

However when I sit down with it I always need a while to figure out what exact combination of flags I really want to use this time.

Cheers,
Ben
About the use of language: it is impossible to sharpen a pencil with a blunt axe. It is equally vain to try to do it with ten blunt axes instead. -- Edsger W. Dijkstra
New What I'd be inclined to do
tar -czf - directory | ssh user@machine 'cd wherever; tar -xzf -'

This is assuming GNU tar and files that compress well.
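
A minimal local sketch of the same pipe shape, for trying it out before pointing it at a real host. The assumptions here are not from the post: a subshell with cd stands in for the ssh hop, and the temporary tree (including a zero-length file, since those were a worry earlier in the thread) is made up for illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical source tree with a regular file and a zero-length file.
mkdir -p "$work/src/directory/sub" "$work/dst"
printf 'hello\n' > "$work/src/directory/a.txt"
: > "$work/src/directory/sub/empty"

# Same shape as the ssh version; the right-hand subshell replaces
# "ssh user@machine 'cd wherever; ...'".
(cd "$work/src" && tar -czf - directory) | (cd "$work/dst" && tar -xzpf -)

# Compare the two trees file by file (contents only).
if diff -r "$work/src/directory" "$work/dst/directory" > /dev/null; then
    tar_copy_status=identical
else
    tar_copy_status=different
fi
echo "tar pipe copy: $tar_copy_status"
```

Note that diff -r only compares contents; ownership, modes and timestamps would need a separate check (ls -lR on both sides, for instance).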

Cheers,
Ben
New What he said.


New Re: What I'd be inclined to do
They are mostly text files, but won't compressing them waste a lot of time? The network will be local and the machines are both secured. So, it's a question of CPU time vs. network speed.

What I decided on was something like (no sshd running)

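find HERE -depth > list
# split into pieces named listaa, listab, ... (100000 paths each)
split -l 100000 list list
for I in list??
do
  # one cpio stream per piece, run in the background
  cpio -o < "$I" | remsh NEWHOST "cd THERE ; cpio -idum" &
done
wait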

list is broken up into pieces and parallelized.
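
The split-and-parallelize pattern can be tried locally first. This is only a sketch under stated assumptions: the ten paths, the chunk. prefix and the wc -l worker are hypothetical stand-ins, and nothing here talks to cpio or remsh.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Ten hypothetical paths standing in for the output of find.
i=1
while [ "$i" -le 10 ]; do
    echo "HERE/file$i"
    i=$((i + 1))
done > list

# Four paths per piece gives chunk.aa, chunk.ab, chunk.ac.
split -l 4 list chunk.

for piece in chunk.??; do
    # Stand-in worker; the post's loop would run its cpio pipeline here.
    ( wc -l < "$piece" > "$piece.done" ) &
done
wait    # block until every background worker has finished

chunk_count=$(( $(ls chunk.?? | wc -l) ))
paths_handled=$(cat chunk.??.done | awk '{ total += $1 } END { print total }')
echo "$chunk_count pieces, $paths_handled paths handled"
```

Checking that the per-piece counts add up to the length of the original list is a cheap way to confirm no piece was skipped.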

I just don't trust tar on vendor UNIX.

-drl
New The CPU overhead is likely less than the network overhead.
Modern CPUs can compress data faster than they can send it down the wire.


Peter
New Especially text...
New Not a given
The files are hyarge, and there are lewge numbers of them.
-drl
New Even PKZip on a 286 running DOS could fly through text files
Archivers are very good at compressing text, and do it very quickly. The algorithms are well known and well optimized.

Compression generally makes more sense than transmitting large files. Sometimes it's essential. See, e.g. [link|http://www.sc2000.org/techpapr/papers/pap.pap254.pdf|this .pdf] of a paper on near-interactive transferring of computational fluid dynamics simulation images over a WAN.

Cheers,
Scott.
New Re: Even PKZip on a 286 running DOS could fly
Well the idea here is just to directly stream data from file to file. I still don't see that a bunch of parallel copyinouts are going to benefit from compression/decompression, given that this must involve creating something temporary on disk, or using a buttload of memory.
-drl
New Doesn't matter
gzip works fine with infinite streams of data since it only keeps track of local information about the data that it is compressing. So have no worries about using it on large files.
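
A small sketch of that streaming behaviour: the data goes generator to gzip to gunzip to checksum entirely through pipes, with no temporary file. The gen function and its text are made up for the demonstration.

```shell
#!/bin/sh
set -eu

# gen writes a deterministic stream of text to stdout; nothing touches disk.
gen() {
    awk 'BEGIN { for (i = 0; i < 200000; i++) print "line", i, "of some plain text" }'
}

raw_bytes=$(( $(gen | wc -c) ))
gz_bytes=$(( $(gen | gzip -c | wc -c) ))

# Checksum of the stream before and after a compress/decompress round trip.
sum_before=$(gen | cksum)
sum_after=$(gen | gzip -c | gzip -dc | cksum)

echo "raw: $raw_bytes bytes, gzipped: $gz_bytes bytes"
if [ "$sum_before" = "$sum_after" ]; then
    stream_status=intact
else
    stream_status=changed
fi
echo "round trip: $stream_status"
```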

Cheers,
Ben
New In fact...
Moore's law goes faster for CPUs than for various kinds of I/O. I've actually seen experiments where it was faster to decompress files on the fly from disk than to read uncompressed data!

For transferring text between machines, compression is virtually guaranteed to be a significant win.

Cheers,
Ben
New Re: In fact...
I'll try both ways and report the "time"s.
-drl
New Your pipe is too small then
My GB allows about 80MB per second, using about 10% of CPU on the box. I can NOT compress data that fast, even on my Opteron. I can do around 40 MB per second, and that is PINNING a CPU.

I say NO compress over GB.
New Try different compression quality?
gzip allows you to specify how much work it'll do finding a good way to compress data. You can specify this by a flag from -1 (fastest) to -9 (best). The default is -6, which is biased towards getting good compression at the cost of being fairly slow.

The speed of uncompressing should not depend on the quality of the compression. Also decompression usually is a lot less CPU intensive than compression.
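
A quick way to see the levels side by side, as a sketch only: the generated records are hypothetical test data, and the sizes it prints will vary with the gzip build. Prefixing each pipeline with time would give the speed side of the trade-off on a given box.

```shell
#!/bin/sh
set -eu

# Deterministic, mildly varied text so the levels have something to work on.
gen() {
    awk 'BEGIN { for (i = 0; i < 200000; i++) print i, (i * 7919) % 10007, "record" }'
}

raw_sum=$(gen | cksum)
round_trip_ok=yes
for level in 1 6 9; do
    size=$(( $(gen | gzip "-$level" -c | wc -c) ))
    echo "gzip -$level: $size bytes"
    # Whatever the level, decompression must give back the same stream.
    [ "$(gen | gzip "-$level" -c | gzip -dc | cksum)" = "$raw_sum" ] || round_trip_ok=no
done
echo "round trip ok at every level: $round_trip_ok"
```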

But I have to admit, where I've used compression for sending network data has mostly been with a far smaller pipe than that.

Cheers,
Ben
New Try? We do not try. We do.
The data is an AFP print stream. I took a 30MB file and
appended it to itself. Highly compressible, as you can see.

On a quad Opteron 248:

-rw-rw-r--+   1 broom    prod     509505504 Sep 25 18:41 t2

509505504/(1024*1024)
485 MB



Best case to read and throw away (from cache) 485MB:


[broom@mash compress]$ time dd if=t2 of=/dev/null bs=32k
15548+1 records in
15548+1 records out

real    0m0.593s
user    0m0.030s
sys     0m0.560s



Best case to read and write (cached, no sync)


[broom@mash compress]$ time dd if=t2 of=t3 bs=32k
15548+1 records in
15548+1 records out

real    0m3.258s
user    0m0.010s
sys     0m3.230s


[broom@mash compress]$ time gzip -c t2 >t2.gz

real    0m19.355s
user    0m18.070s
sys     0m1.230s

-rw-rw-r--+   1 broom    prod     56286710 Sep 25 18:42 t2.gz



Used 100% of a single CPU.


485/19
25 MB per second compression

[broom@mash compress]$ time gzip -1 -c t2 >t2.gz

real    0m10.504s
user    0m9.460s
sys     0m0.990s



Just a little bigger, twice as fast:

-rw-rw-r--+   1 broom    prod     68465283 Sep 25 18:47 t2.gz



About 46 MB per second (485 MB / 10.5 s).

Now let's see how pure network compares:


[broom@mash compress]$ time dd if=t2  ibs=32k obs=4k | rsh mix.cc3.com dd ibs=4k obs=32k of=/dev/null
15548+1 records in
124390+1 records out
532+352237 records in
15548+1 records out

real    0m9.304s
user    0m0.310s
sys     0m1.170s




Note on block sizes. This mix of IBS and OBS is best for my Linux<->Linux
transfers. The same command without block size juggling works at about
the same speed but drives up CPU consumption.


It beats the fastest gzip time (minimal compression) while using almost NO
CPU and not requiring additional steps on the receiving side.
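
The throughput figures can be recomputed from the file size and the wall-clock ("real") times in the transcripts above. A small sketch, taking 1 MB as 1024*1024 bytes to match the 485 MB figure; the mib_per_sec helper name is made up here.

```shell
#!/bin/sh
set -eu

# mib_per_sec BYTES SECONDS: throughput in MB/s, with 1 MB = 1024*1024 bytes.
mib_per_sec() {
    LC_ALL=C awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / (1024 * 1024) / secs }'
}

# Sizes and times are copied from the transcripts in this post.
gzip_default=$(mib_per_sec 509505504 19.355)
gzip_fast=$(mib_per_sec 509505504 10.504)
network_raw=$(mib_per_sec 509505504 9.304)

echo "gzip (default level): $gzip_default MB/s"
echo "gzip -1:              $gzip_fast MB/s"
echo "dd | rsh, no gzip:    $network_raw MB/s"
```

That works out to roughly 25, 46 and 52 MB/s, so on this link the uncompressed transfer is ahead of even the fastest gzip setting.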

Plus, this is using small frame size ethernet. I can DOUBLE my throughput
and cut CPU consumption in half by going to jumbo frames, which will
happen as soon as the Sun boxes are off my network.

OK Ben, this is RARE. You are wrong!
New An advantage of gzip is that it's easy to detect errors.
Presumably you'd like to know whether your 485 MB file is the same on both ends. With gzip I assume it's easy to check a CRC-32 or some such thing that's in the file. Presumably you'd want to do something similar with a raw network transfer. Wouldn't that add time on both ends, or do you assume that TCP/IP or dd takes care of that for you?
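
A sketch of the gzip integrity check: gzip -t re-reads the archive and verifies it against the CRC-32 and length stored in the trailer. The test data and the 16 overwritten bytes are made up to simulate damage in transit.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical payload, compressed once and then copied.
awk 'BEGIN { for (i = 0; i < 50000; i++) print "record", i }' > "$work/data"
gzip -c "$work/data" > "$work/good.gz"
cp "$work/good.gz" "$work/bad.gz"

# Simulate corruption: overwrite 16 bytes in the middle of the copy.
printf 'XXXXXXXXXXXXXXXX' | dd of="$work/bad.gz" bs=1 seek=1000 conv=notrunc 2> /dev/null

# gzip -t exits non-zero if the stream is malformed or the CRC does not match.
if gzip -t "$work/good.gz" 2> /dev/null; then good_result=ok; else good_result=corrupt; fi
if gzip -t "$work/bad.gz" 2> /dev/null; then bad_result=ok; else bad_result=corrupt; fi

echo "good.gz: $good_result"
echo "bad.gz:  $bad_result"
```

For an uncompressed transfer, the equivalent is running the same checksum tool (sum or cksum) on both ends and comparing the output.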

Sure, if your pipe is big enough, you lose time going through a compression step. That's easy to see. However, what would you do if you needed to transfer your 70 TB to another machine? >:-)

Cheers,
Scott.
New If you MUST...

[broom@mash compress]$ time sum t2
47595 497564

real    0m2.850s
user    0m2.240s
sys     0m0.610s



Note: I DO transfer TB, too damn often.
We usually use a (tar | rsh system tar) command.

Actually, tomorrow, I'll be going in to the office to move about 4TB.
The last time we did this we were actually moving data from pieces
of striped LUNs, so we used the Veritas ability to evacuate a LUN.

Way too slow. Tomorrow I'll be using a file system level command
such as the one above, except it will be a local system move.
Hell, it might be a "cp -rvp"

Also, note that I will be doing it via a Sun box, which is about
1/10th the speed of the Opteron. Compression is WAY TOO SLOW at
that point as well.
New Thanks for the figures
I'll note for the future that, with gigabit networking and current CPUs, compression wastes time.

However when I need to stuff data through a DSL, I'll continue to use compression.
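
A back-of-envelope model of that trade-off, as a sketch rather than a benchmark. It assumes a compress-then-send pipeline moves original data at roughly min(compressor rate, link rate x compression ratio), and ignores decompression cost and CPU contention. The 46 and 52 MB/s rates and the ratio of about 7.4 come from the gzip -1 and rsh runs earlier in the thread; the 0.1 MB/s DSL uplink is a made-up figure, and best_choice is a hypothetical helper name.

```shell
#!/bin/sh
set -eu

# best_choice COMPRESS_MBPS LINK_MBPS RATIO
# Prints "gzip" if the compressed pipeline beats sending raw data, else "raw".
best_choice() {
    LC_ALL=C awk -v c="$1" -v l="$2" -v r="$3" 'BEGIN {
        # Rate at which original data moves when compressing first.
        compressed = (c < l * r) ? c : l * r
        choice = (compressed > l) ? "gzip" : "raw"
        print choice
    }'
}

gigabit=$(best_choice 46 52 7.4)    # figures measured in this thread
dsl=$(best_choice 46 0.1 7.4)       # 0.1 MB/s uplink is an assumption

echo "gigabit LAN: $gigabit"
echo "DSL uplink:  $dsl"
```

Under these assumptions the compressor is the bottleneck on the gigabit link, while on the slow link the wire is, which matches the conclusion reached here.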

Cheers,
Ben
New Thanks, and questions.
1. What's the fastest compression program? I'm willing to bet that gzip isn't it. [link|http://www.oberhumer.com/opensource/lzo/|this] may warrant investigation - I'm going to have a play!
2. Is your gzip appropriately compiled for AMD64, or is it a generic i386 binary?
3. Your Quad Opteron would be much happier in my house. Ship it to me right away!


Peter
New lzop vs. gzip
peter@cordelia:~/Build/Compression $ ls -lh
total 809M
-rw-r--r--  1 peter peter 809M Sep 26 07:19 test.tar
peter@cordelia:~/Build/Compression $ time gzip -1 -c test.tar > test.tar.gz

real    0m48.076s
user    0m31.255s
sys     0m2.541s
peter@cordelia:~/Build/Compression $ time lzop -1 test.tar

real    0m39.514s
user    0m11.438s
sys     0m2.715s
peter@cordelia:~/Build/Compression $ ls -lh
total 1.5G
-rw-r--r--  1 peter peter 809M Sep 26 07:19 test.tar
-rw-r--r--  1 peter peter 309M Sep 26 07:23 test.tar.gz
-rw-r--r--  1 peter peter 361M Sep 26 07:19 test.tar.lzo
peter@cordelia:~/Build/Compression $ uname -a
Linux cordelia 2.6.8 #1 SMP Sun Sep 12 08:46:39 BST 2004 i686 GNU/Linux
peter@cordelia:~/Build/Compression $ free
             total       used       free     shared    buffers     cached
Mem:       1034960    1025564       9396          0       6344     853788
-/+ buffers/cache:     165432     869528
Swap:       746976        160     746816

So, on my system, I get 16.8MB/sec compression with gzip, and 20.5MB/sec with lzop; this makes lzop some 20% or so faster. Does this change things enough for it to be worthwhile over your uberpipe?

[Note 1: this is on my home PC, which is a 3GHz P4 with HT switched on.]

[Note 2: input data set is the result of tar cf test.tar /usr/src, so it's a mixup of binary and text data. And yes, I need to clean up.]



Peter
Edited by pwhysall Sept. 26, 2004, 02:34:36 AM EDT
New Interesting
but it seems the increase in size will bite you more when moving things over a T1.

I'll let you know later. Gotta go set up a new array and move some data.

Note: I disagree. I think the quad Opteron is much happier in my safe, cool, raised floor, generator backup power, 24 x 7 attended, Gbit backbone, 31TB SAN attached, LTO2 backed up, halon workalike protected, 10mbit ethernet internet connected computer room than it could ever possibly be in your house.

And it would get lonely without all the other quad opterons to keep it company.
Oh, you thought it was the only one? nonononono.
That was the 1st box to prove it out. We are standardizing on them for our high-end, high-I/O compute needs. They will replace the Sun 450s.
New Seeing you tease Peter makes me aware that...
you put the B in BoFH.

:-)

Cheers,
Ben
New In person
At your service.

As I watch this data slowly transfer and wait for the DBA to call me back.....
New What is happening to the 450s?
I'd like to have a few.

Or more...

all depends on the price now. Doesn't it?
--
[link|mailto:greg@gregfolkert.net|greg],
New Dunno. Ask Dave.
We have an E350 that I just ripped all the GBICs out of. If you want that, you could probably make a deal.
New I'm sorry Dave, but Dave's friend Dave says Dave's at...
Dave's house.

Cheers,
Ben
New Thanks.
--
[link|mailto:greg@gregfolkert.net|greg],
New Ah, but in MY house...
...it wouldn't be processing spam :-p


Peter
New Not spam
Dead trees.
New Still spam; just spam on dead trees.
What -- you didn't think what you do for a living is in any way admirable or honourable, did you?


   [link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad]
(I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Your lies are of Microsoftian Scale and boring to boot. Your 'depression' may be the closest you ever come to recognizing truth: you have no 'inferiority complex', you are inferior - and something inside you recognizes this. - [link|http://z.iwethey.org/forums/render/content/show?contentid=71575|Ashton Brown]
New "admirable or honourable" - Yup
Both actually. Depends on the project of the moment.

[link|http://dictionary.reference.com/search?q=honourable|http://dictionary.re...arch?q=honourable]

This should be all projects I work on. Don't screw the client, provide a product or service that they need, use renewable resources, employ a variety of people (and yes, I'm pretty sure they all have the legal right to work in this country, if not being full citizens, we have a VERY strict HR department), provide full health benefits for all employees with a modest co-pay for family coverage.

Yup, this one I'm sure of.

[link|http://dictionary.reference.com/search?q=admirable|http://dictionary.re...earch?q=admirable]

This one as well, but not all the time. Think of ways to do things that are better / faster / cheaper / easier / safer / less error prone / higher quality than before. Implement them in a way that is the most cost effective, trying to use the best resource for the project. Fight others, even when they are scary and have the possibility to hurt me, when I think they are doing things the wrong way that would end up hurting the company, employees, or our long term ability to be in business. Put my ego aside and let others shine and have credit for their ideas or work effort.

Note: About 1/3 of our business is fullfillment. We maintain stock for companies and send it out on a daily basis, typically triggerred by a customer / prospective client calling and asking for it. Would you be trying to find something wrong with that as well?
New The fulfillment (sic) third, maybe. The rest? Nope.
Barry demonstrates that sometimes, not even an on-line dictionary helps to understand terms like "admirable" or "honourable":
Don't screw the client, provide a product or service that they need, blah blah yadda yadda.
and
...better / faster / cheaper / easier / safer / less error prone / higher quality ... cost effective, blah blah yadda yadda.
Oh, how wonderful.

"Yes, Don Pepe, I have conscientiously and safely loaded the shipment that our American clients ordered on tonight's convoy across the Rio Grande." Or, "Jawohl, Sturmbannführer, I have come up with a faster, cheaper, and less error-prone way of processing the inmates for the final camp."

That, too, is satisfying the direct "client" -- it all depends on how you define that term (and this is why I added the emphasis to the quotes above).

The ultimate "client", though, is more properly called "the victim", at least in my two examples... And quite arguably, IMO, in your case too. YTF would spam be any better just because it comes on "dead trees" (i.e, paper)?!? (On the contrary, that means you have to kill not only innocent electrons like e-spammers, but entire living breathing trees, too!)


Note: About 1/3 of our business is fullfillment. We maintain stock for companies and send it out on a daily basis, typically triggerred by a customer / prospective client calling and asking for it. Would you be trying to find something wrong with that as well?
No, that part seems like it quite possibly could be admirable and honourable. (Can't swear to it, of course -- if this is the "fulfillment" part of an MLM scheme, or something like that, then I'd guess it isn't, after all.)


   [link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad]
New That was my intuitive suspicion
-drl
New And as usual
intuition is wrong. Intuition is nothing more than a fancy name for prejudice.
Get some data and THEN judge.
New Well sure
I'll still try both in my situation. As soon as my uid is moved below 64k. Believe it or not, cpio and tar on HP-UX are broken if your uid is >64k. The file ownership is changed to that of the invoking process, so sudoing the move changes the owner of all files to root. I do not understand this. gnutar works.
-drl
New Beware spaces in filenames
simple "find - split" may not work - you may need -print0 in find
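
A sketch of what actually breaks, using three made-up filenames. A list read one path per line copes with spaces; it is shell word-splitting and embedded newlines that cause trouble, and -print0 (a GNU/BSD find feature) only helps if whatever consumes the list can read NUL-delimited names.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Three hypothetical files: plain, one with a space, one with a newline.
: > plain
: > "with space"
: > "$(printf 'new\nline')"

# Naive word-splitting breaks names at spaces and newlines.
naive_count=0
for f in $(find . -type f); do
    naive_count=$((naive_count + 1))
done

# One path per line survives spaces but not an embedded newline.
line_count=$(( $(find . -type f | wc -l) ))

# NUL-delimited output gives exactly one terminator per file.
nul_count=$(( $(find . -type f -print0 | tr -cd '\000' | wc -c) ))

echo "files: 3, word-split: $naive_count, lines: $line_count, NUL-delimited: $nul_count"
```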
--

... a reference to Presidente Arbusto.
-- [link|http://itre.cis.upenn.edu/~myl/languagelog/archives/001417.html|Geoffrey K. Pullum]
New Already checked :)
-drl
     tar vs. cpio - (deSitter) - (43)
         I'd use tar. - (static) - (2)
             Re: I'd use tar. - (deSitter) - (1)
                 I've done it on Linux. - (pwhysall)
         dd - (pwhysall)
         What I'd do. - (folkert) - (2)
             Re: What I'd do. - (deSitter) - (1)
                 Nope - (ben_tilly)
         What I'd be inclined to do - (ben_tilly) - (35)
             What he said. -NT - (static)
             Re: What I'd be inclined to do - (deSitter) - (33)
                 The CPU overhead is likely less than the network overhead. - (pwhysall) - (30)
                     Especially text... -NT - (Another Scott) - (4)
                         Not a given - (deSitter) - (3)
                             Even PKZip on a 286 running DOS could fly through text files - (Another Scott) - (1)
                                 Re: Even PKZip on a 286 running DOS could fly - (deSitter)
                             Doesn't matter - (ben_tilly)
                     In fact... - (ben_tilly) - (1)
                         Re: In fact... - (deSitter)
                     Your pipe is too small then - (broomberg) - (22)
                         Try different compression quality? - (ben_tilly) - (18)
                             Try? We do not try. We do. - (broomberg) - (17)
                                 An advantage of gzip is that it's easy to detect errors. - (Another Scott) - (1)
                                     If you MUST... - (broomberg)
                                 Thanks for the figures - (ben_tilly)
                                 Thanks, and questions. - (pwhysall)
                                 lzop vs. gzip - (pwhysall) - (12)
                                     Interesting - (broomberg) - (11)
                                         Seeing you tease Peter makes me aware that... - (ben_tilly) - (1)
                                             In person - (broomberg)
                                         What is happening to the 450s? - (folkert) - (3)
                                             Dunno. Ask Dave. - (broomberg) - (2)
                                                 I'm sorry Dave, but Dave's friend Dave says Dave's at... - (ben_tilly)
                                                 Thanks. -NT - (folkert)
                                         Ah, but in MY house... - (pwhysall) - (4)
                                             Not spam - (broomberg) - (3)
                                                 Still spam; just spam on dead trees. - (CRConrad) - (2)
                                                     "admirable or honourable" - Yup - (broomberg) - (1)
                                                         The fulfillment (sic) third, maybe. The rest? Nope. - (CRConrad)
                         That was my intuitive suspicion -NT - (deSitter) - (2)
                             And as usual - (broomberg) - (1)
                                 Well sure - (deSitter)
                 Beware spaces in filenames - (Arkadiy) - (1)
                     Already checked :) -NT - (deSitter)
