Post #235,655
11/23/05 10:38:26 AM
|
That's a Perl problem if ever I saw one.
Hey, Baz/Ben:
Is File::Find fast enough for this to be worth coding up?
Peter [link|http://www.no2id.net/|Don't Let The Terrorists Win] [link|http://www.kuro5hin.org|There is no K5 Cabal] [link|http://guildenstern.dyndns.org|Home] Use P2P for legitimate purposes!
|
Post #235,661
11/23/05 10:57:11 AM
|
ICLRPD: That's a Perl problem if ever I saw one. (new thread)
Created as new thread #235660 titled [link|/forums/render/content/show?contentid=235660|ICLRPD: That's a Perl problem if ever I saw one.]
-- Steve [link|http://www.ubuntulinux.org|Ubuntu]
|
Post #235,687
11/23/05 12:17:48 PM
|
Baz? What's a Baz?
Simple enough problem. Walk the file system and compute a CRC for every file found. You might want to salt it with the file size if you're scared of a CRC collision. Push each filename into an array ref stored in a hash indexed by CRC. When done, loop through the list of CRCs and print out the list of files wherever there's more than one.
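A minimal sketch of that approach, assuming the CPAN module String::CRC32 for the checksum (the variable names are mine):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use String::CRC32;   # CPAN module

    my %files_by_key;

    find(sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;
        binmode $fh;
        my $data = do { local $/; <$fh> };   # slurps the file; fine for a sketch
        close $fh;
        # salt the CRC with the file size, per the collision worry above
        my $key = crc32($data) . ':' . (-s $_);
        push @{ $files_by_key{$key} }, $File::Find::name;
    }, @ARGV ? @ARGV : '.');

    for my $files (values %files_by_key) {
        print join("\n\t", 'Possible duplicates:', @$files), "\n" if @$files > 1;
    }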
A couple of minutes of code.
Have fun.
|
Post #235,704
11/23/05 1:35:25 PM
|
Alternatively...
File::Find::Duplicates
Peter [link|http://www.no2id.net/|Don't Let The Terrorists Win] [link|http://www.kuro5hin.org|There is no K5 Cabal] [link|http://guildenstern.dyndns.org|Home] Use P2P for legitimate purposes!
|
Post #235,795
11/23/05 10:45:23 PM
|
Any gotchas?
I've never run a Perl script before, at least not of my own volition. :-)
I found [link|http://aspn.activestate.com/ASPN/CodeDoc/File-Find-Duplicates/Duplicates.html|File::Find::Duplicates] on CPAN. Any gotchas? Its documentation isn't terribly descriptive, but I assume the description given for [link|http://search.cpan.org/~jhi/perl-5.8.0/lib/File/Find.pm|File::Find] will help clarify things. I also assume I'll be able to figure out how to direct the output to a file.
I'll be downloading ActivePerl in a few minutes to try it out tomorrow, after my download of SimplyMEPIS_3.4-1.rc1 completes.
Thanks a bunch.
Cheers, Scott.
|
Post #235,799
11/23/05 11:01:27 PM
|
The sample program in the documentation...
is close to what you want. Change the call to the function to:
my %dupes = find_duplicate_files('c:\\');
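Put together, the whole program might look something like this - a sketch assuming the interface in the module's docs, where the returned hash is keyed by file size with array refs of matching files as values:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find::Duplicates;

    # scan the whole C: drive, per the change above
    my %dupes = find_duplicate_files('c:\\');

    foreach my $size (keys %dupes) {
        print "Files of size $size:\n";
        print "\t$_\n" for @{ $dupes{$size} };
    }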
Write it to a file and from the command line you can write something like this:
perl myprogram > output_file
Cheers, Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
|
Post #235,800
11/23/05 11:02:48 PM
|
Thanks, Ben. I appreciate it.
|
Post #235,705
11/23/05 1:35:55 PM
|
You're a Baz.
Peter [link|http://www.no2id.net/|Don't Let The Terrorists Win] [link|http://www.kuro5hin.org|There is no K5 Cabal] [link|http://guildenstern.dyndns.org|Home] Use P2P for legitimate purposes!
|
Post #235,784
11/23/05 9:52:06 PM
|
If you want to be really cautious...
When you find possible duplicates, go scan the files again and verify that they really are duplicates.
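A quick sketch of that second pass using the core File::Compare module (@candidates is a hypothetical list of files that hashed to the same key):

    use strict;
    use warnings;
    use File::Compare;

    my @candidates = @ARGV;   # hypothetical: fill in from the first pass
    my $first = shift @candidates;
    for my $other (@candidates) {
        # compare() returns 0 when the two files are byte-for-byte identical
        print "$other really is a duplicate of $first\n"
            if compare($first, $other) == 0;
    }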
Cheers, Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
|
Post #235,783
11/23/05 9:51:11 PM
|
Yes
It sounds like his initial application did this in the stupid naive way that C programmers love - a double loop comparing every file against every other.
With 600,000 files that means n(n-1)/2 = 600,000 × 599,999 / 2, or about 180,000,000,000 comparisons of one file against another. Sure, C is fast, but that will take a while.
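For contrast, the two shapes side by side (@files, same_contents() and checksum() are hypothetical stand-ins):

    # the naive double loop: n(n-1)/2 pairs, about 1.8e11 for n = 600,000
    for my $i (0 .. $#files) {
        for my $j ($i + 1 .. $#files) {
            print "$files[$i] == $files[$j]\n"
                if same_contents($files[$i], $files[$j]);
        }
    }

    # checksum-and-hash instead: one pass, one checksum per file
    my %by_crc;
    push @{ $by_crc{ checksum($_) } }, $_ for @files;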
Cheers, Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
|