
Welcome to IWETHEY!

I'm trying a Python solution now.
I wasn't able to figure out the Perl solutions right away, so I went and looked at what was available for Python. I'm experimenting with [link|http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/364953|FindDuplicateFileNames] and it seems to work pretty well. It doesn't hog the CPU, and the system stays very responsive while the script is running from the command line. (It also works OK inside the PythonWin IDE.)
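
For anyone following along, here's a rough sketch of the general idea behind a name-based duplicate finder. It is not the code from the ASPN recipe, just a minimal illustration; the function name find_duplicate_names and the choice to compare names case-insensitively (as Windows does) are my own assumptions.

```python
import os
from collections import defaultdict


def find_duplicate_names(roots):
    """Return {lowercased file name: [full paths]} for names seen more than once.

    Hypothetical helper for illustration; only file names are compared,
    so no file contents are read.
    """
    paths_by_name = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Assumption: treat names case-insensitively, as Win32 does.
                paths_by_name[name.lower()].append(os.path.join(dirpath, name))
    # Keep only the names that occur in more than one place.
    return {name: paths for name, paths in paths_by_name.items() if len(paths) > 1}
```

Calling find_duplicate_names with a list of drive or folder paths and printing the result in sorted order gives output that is already grouped by name. Because nothing is opened or read, this kind of scan is limited mostly by directory traversal speed.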

At the moment I'm just redirecting the output to a .txt file. When it's done I'll see how easy it is to sort that (probably with OO.org's spreadsheet, as I think it'll choke Excel) or experiment with some other Python scripts.

Thanks for the tips.

[edit:]

I'm not sure when the Python script finished, but it was done in less than 90 minutes, searching through 4 partitions of ~ 600,000 files (~ 500 GB) and writing the matches to a ~ 157 MB text file. (Note that this version does not calculate CRC-32 or MD5 checksums.)

I opened the text file in OpenOffice.org 2.0 and it came up in Writer. It took a minute or so on this Athlon64 3000+ with 1 GB of RAM, but it didn't choke. It's a 43085 page document with 10 pt Courier New. ;-)

I'll mess around some more, checking matches based on some sort of checksum, or at least the file size, before actually deleting any duplicates. (Note that there's a Python script called [link|http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/362459|Dupinator] that isn't very flexible but does MD5 checksums: it first quickly checks the first 1024 bytes, then checks the entire file if it's a potential duplicate. There's also something called [link|http://www.pixelbeat.org/fslint/|FSLint] that needs GTK2 and thus seems to be designed for Linux systems.)
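
The staged check described above (file size first, then an MD5 of the first 1024 bytes, then an MD5 of the whole file) could look roughly like the sketch below. This is not Dupinator's actual code; find_duplicates, _md5 and _group are hypothetical names, and skipping unreadable files is my own assumption about reasonable behavior.

```python
import hashlib
import os
from collections import defaultdict


def _md5(path, limit=None, chunk_size=1 << 20):
    """MD5 of the first `limit` bytes, or of the whole file if limit is None."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        if limit is not None:
            digest.update(fh.read(limit))
        else:
            # Read in chunks so large files never need to fit in memory.
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()


def _group(paths, key):
    """Bucket paths by key(path); return only buckets with 2+ members."""
    groups = defaultdict(list)
    for path in paths:
        try:
            groups[key(path)].append(path)
        except OSError:
            continue  # Assumption: silently skip files we cannot stat or read.
    return [group for group in groups.values() if len(group) > 1]


def find_duplicates(roots):
    """Return lists of paths whose contents have the same size and MD5."""
    all_paths = [
        os.path.join(dirpath, name)
        for root in roots
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    ]
    duplicates = []
    for same_size in _group(all_paths, os.path.getsize):       # cheap: no reads
        for same_head in _group(same_size, lambda p: _md5(p, limit=1024)):
            duplicates.extend(_group(same_head, _md5))          # full-file hash
    return duplicates
```

The point of the staging is that most files are ruled out by size alone, and most of the rest by the 1024-byte hash, so only a small fraction ever gets read in full. Anything reported should still be eyeballed before deleting, since this only lists candidates and removes nothing.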

Cheers,
Scott.
Edited by Another Scott Dec. 7, 2005, 02:36:50 PM EST
Any reason you can't use gnu/sort ?
===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
Dunno. I'll check into it. Thanks.
     Best duplicate finder/deleter for Win32? - (Another Scott) - (19)
         That's a Perl problem if ever I saw one. - (pwhysall) - (9)
             ICLRPD: That's a Perl problem if ever I saw one. (new thread) - (Steve Lowe)
             Baz? What's a Baz? - (broomberg) - (6)
                 Alternatively... - (pwhysall) - (3)
                     Any gotchas? - (Another Scott) - (2)
                         The sample program in the documentation... - (ben_tilly) - (1)
                             Thanks Ben. I appreciate it. -NT - (Another Scott)
                 You're a Baz. -NT - (pwhysall)
                 If you want to be really cautious.... - (ben_tilly)
             Yes - (ben_tilly)
         I'm trying a Python solution now. - (Another Scott) - (2)
             Any reason you can't use gnu/sort ? -NT - (drewk) - (1)
                 Dunno. I'll check into it. Thanks. -NT - (Another Scott)
         I'm trying DUFF now. - (Another Scott) - (3)
             <drool> Mm, Duff... </homer> -NT - (CRConrad) - (1)
                 Ha! :-) -NT - (Another Scott)
             It's called an alpha for a reason. - (Another Scott)
         Edupe seems pretty good. (More details.) - (Another Scott) - (1)
             Edupe crashes on some of my machines. Trying another... - (Another Scott)
