New That's a Perl problem if ever I saw one.
Hey, Baz/Ben:

Is File::Find fast enough for this to be worth coding up?


Peter
[link|http://www.no2id.net/|Don't Let The Terrorists Win]
[link|http://www.kuro5hin.org|There is no K5 Cabal]
[link|http://guildenstern.dyndns.org|Home]
Use P2P for legitimate purposes!
New ICLRPD: That's a Perl problem if ever I saw one. (new thread)
Created as new thread #235660 titled [link|/forums/render/content/show?contentid=235660|ICLRPD: That's a Perl problem if ever I saw one.]
--
Steve
[link|http://www.ubuntulinux.org|Ubuntu]
New Baz? What's a Baz?
Simple enough problem.
Walk the file system, compute a crc for every file found.
Might want to salt it with the filesize if scared of a crc collision.
Push the filename into an array ref, stored in a hash indexed by crc.
When done, loop through the list of crcs, popping out a list of files whenever there is more than one.

A couple of minutes of code.

Have fun.
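
For anyone who wants to see the steps above spelled out, here is a rough sketch. It is in Python rather than Perl, and it is only one way to do it, not the poster's actual code. The names `file_crc` and `find_candidate_duplicates` are made up for illustration. It uses CRC-32 from the standard `zlib` module, and the file size is part of the grouping key as the "salt".

```python
import os
import zlib
from collections import defaultdict


def file_crc(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, read in chunks to keep memory use flat."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def find_candidate_duplicates(root):
    """Group files under root by (size, CRC-32); keep groups with 2+ files."""
    groups = defaultdict(list)  # (size, crc) -> list of file paths
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                key = (os.path.getsize(path), file_crc(path))
            except OSError:
                continue  # unreadable file: skip it rather than abort the walk
            groups[key].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_candidate_duplicates("c:\\")` would return a list of lists, one per group of files sharing the same size and CRC. A matching CRC only makes the files likely duplicates, not certain ones.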
New Alternatively...
File::Find::Duplicates


Peter
New Any gotchas?
I've never run a Perl script before, at least not of my own volition. :-)

I found [link|http://aspn.activestate.com/ASPN/CodeDoc/File-Find-Duplicates/Duplicates.html|File::Find::Duplicates] on CPAN. Any gotchas? It's not terribly descriptive, but I assume that the description of things given for [link|http://search.cpan.org/~jhi/perl-5.8.0/lib/File/Find.pm|File::Find] will help clarify things. I also assume I'll be able to figure out how to direct the output to a file.

I'll be downloading ActivePerl in a few minutes to try it out tomorrow, after my download of SimplyMEPIS_3.4-1.rc1 completes.

Thanks a bunch.

Cheers,
Scott.
New The sample program in the documentation...
is close to what you want. Change the call to the function to:

my %dupes = find_duplicate_files('c:\\');

Save the program to a file, and from the command line you can run something like this:

perl myprogram > output_file

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
New Thanks Ben. I appreciate it.
New You're a Baz.


Peter
New If you want to be really cautious....
When you find possible duplicates, go scan the files again and verify that they really are duplicate.
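
A minimal sketch of that second pass, in Python and assuming you already have a list of paths that share a checksum. The function name `confirm_duplicates` is hypothetical. It relies on the standard `filecmp.cmp` with `shallow=False`, which compares file contents rather than just the stat information.

```python
import filecmp


def confirm_duplicates(paths):
    """Split one candidate group into sets of files with identical contents."""
    confirmed = []  # each entry is a list of paths known to match each other
    for path in paths:
        for group in confirmed:
            # shallow=False forces a comparison of the actual file contents
            if filecmp.cmp(group[0], path, shallow=False):
                group.append(path)
                break
        else:
            # no existing set matched, so this file starts a new one
            confirmed.append([path])
    return [group for group in confirmed if len(group) > 1]
```

Since this only runs on the few files that already share a checksum, the extra read costs little next to the first scan.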

Cheers,
Ben
New Yes
It sounds like his initial application was written in the stupid naive way that C programmers love - as a double loop that compares every file against every other file.

With 600,000 files that means that it wants to do about 180,000,000,000 comparisons of one file against another. Sure, C is fast, but that will take a while.
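
To check the arithmetic, here is a quick Python sketch. The helper name `pairwise_comparisons` is made up for illustration, and it assumes the double loop compares each unordered pair of files once.

```python
def pairwise_comparisons(n):
    """Number of distinct file pairs a double loop has to compare."""
    return n * (n - 1) // 2


n = 600_000
print(f"{pairwise_comparisons(n):,}")  # 179,999,700,000
```

That is the "about 180,000,000,000" figure. The checksum approach reads each of the 600,000 files once instead.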

Cheers,
Ben
     Best duplicate finder/deleter for Win32? - (Another Scott) - (19)
         That's a Perl problem if ever I saw one. - (pwhysall) - (9)
             ICLRPD: That's a Perl problem if ever I saw one. (new thread) - (Steve Lowe)
             Baz? What's a Baz? - (broomberg) - (6)
                 Alternatively... - (pwhysall) - (3)
                     Any gotchas? - (Another Scott) - (2)
                         The sample program in the documentation... - (ben_tilly) - (1)
                             Thanks Ben. I appreciate it. -NT - (Another Scott)
                 You're a Baz. -NT - (pwhysall)
                 If you want to be really cautious.... - (ben_tilly)
             Yes - (ben_tilly)
         I'm trying a Python solution now. - (Another Scott) - (2)
             Any reason you can't use gnu/sort ? -NT - (drewk) - (1)
                 Dunno. I'll check into it. Thanks. -NT - (Another Scott)
         I'm trying DUFF now. - (Another Scott) - (3)
             <drool> Mm, Duff... </homer> -NT - (CRConrad) - (1)
                 Ha! :-) -NT - (Another Scott)
             It's called an alpha for a reason. - (Another Scott)
         Edupe seems pretty good. (More details.) - (Another Scott) - (1)
             Edupe crashes on some of my machines. Trying another... - (Another Scott)
