Welcome to IWETHEY!

Sheet-feed scanner? OCR to searchable PDF?
Hi,

I think I'm going to get into digitized documents in a pretty big way because I'm being consumed by paperwork at work and home. I'm considering a couple of sheet-feed scanners:

[link|http://www.newegg.com/Product/Product.aspx?Item=N82E16838111004|Canon DR-2580C] for $640, [link|http://www.newegg.com/Product/Product.aspx?Item=N82E16838115009|Fujitsu FI-5110C] for $650, and [link|http://www.newegg.com/Product/Product.aspx?Item=N82E16838115015|Fujitsu FI-5120C] for $850.

They have various software bundles, and the most expensive one has a SCSI interface in addition to the USB 2.0 interface common to all three.

I seem to have very poor luck with USB 2.0. If I use more than a couple of USB devices in a powered hub (D-Link, IOGear, other brands), I can't get reliable 2.0 operation (if I get the PC to recognize the device at all). Devices generally work better when plugged straight into the PC, but even on a machine with an add-on PCI card providing more sockets I often have poor luck (on Win2k, Kubuntu, and others, with AMD ASUS motherboards and on a T41 laptop). So I'm strongly considering a SCSI interface even though that means buying a PCI card and cables. :-/

I would be scanning bills, tax and medical information, and the like, along with hand-written notes and documentation. I have tens of thousands of pages of stuff, but can't imagine doing more than a few hundred pages a day. I won't need to construct electronic forms or spreadsheets from the documents, but I do want to make them as easily searchable as possible. PDF output is a must, and if that's fast, accurate, and batch-able, that's a big plus. (I don't want to have to sit at the desk hitting buttons all day to get this done.)
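To put rough numbers on that workload, here is a back-of-the-envelope sketch. The specific figures (10,000 to 50,000 pages of backlog, 200 to 500 pages a day) are assumptions chosen only to bracket "tens of thousands" and "a few hundred"; the function name is hypothetical.

```python
import math


def days_to_clear(backlog_pages: int, pages_per_day: int) -> int:
    """Return the number of scanning days needed to work through a backlog."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    # Round up: a partial day of scanning still counts as a day.
    return math.ceil(backlog_pages / pages_per_day)


if __name__ == "__main__":
    # Assumed ranges, not measured values.
    for backlog in (10_000, 30_000, 50_000):
        for rate in (200, 300, 500):
            days = days_to_clear(backlog, rate)
            print(f"{backlog:>6} pages at {rate} pages/day: {days} days")
```

Under those assumptions the backlog is a matter of weeks to months of steady scanning, which is why unattended batch conversion matters more than raw scanner speed.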

I have no experience with OCR. I know full versions of Acrobat will do it. Is that good enough, or is some other OCR utility needed for good performance and accuracy?

Any thoughts on the boxes above? Is Acrobat Standard (that comes bundled with some of the machines) good enough, or should I plan on upgrading to Professional or Conglomerate or Behemoth or some other version?

I'd probably be doing this under Winders, but will consider Mac versions. Is TWAIN on Linux good enough to run scanners like these? Is Linux OCR up to snuff yet?

Thanks for any pointers. I appreciate it.

Cheers,
Scott.
SCSI on scanners can be fun
I have three SCSI scanners now (Microtek flatbed, Nikon film scanner, Fuji duplex sheetfed which I haven't used yet). I've found that using the standard Adaptec SCSI interface works best; others can be flaky.

I've had good experience with add-on PCI USB boards. They're pretty cheap (e.g. 5 ports for <$20), the cables are much easier to deal with than SCSI (more flexible, cheaper, and far fewer versions), and there are no termination headaches.

Consider looking at eBay and elsewhere for refurbished scanners (that's where I got my Fuji). Also, I've seen new duplex sheet feeder scanners at Fry's for much less (IIRC Xerox for <$400).

I haven't yet done what you're looking at doing, so I can't help more there. A typical scanner will come with OCR software; Nuance/ScanSoft also sells a lot of OCR (newer versions of the Caere products) and PDF software.

--Tony
Thanks, Tony. I see Fujitsu has an eBay store... :-)
Got a refurbished Fujitsu fi-5120c on eBay.
I ended up getting a refurbished Fujitsu fi-5120c on eBay. I set it up this evening. It works fine through a USB 2.0 hub and seems to be bundled with everything one could need (when using the USB interface). The scanner itself looks new, and the Windows Control Panel indicates it has only been used for 210 pages.

It does come with rather old versions of the bundled software (Acrobat Standard 6.0, Kofax VRS 4.10, ScandAll21 4.3), but it handled my first test very easily: 6 pages torn from a magazine, printed duplex on thin newsprint-like paper, with 4-column lists of names in ~10 point Helvetica. Scanned duplex at 300 dpi, they converted to an OCRed PDF with no manual intervention and no errors in the OCR. I'll probably use higher resolutions for things that I might want to print again. I don't know yet whether upgrading to later versions of the software is worth it for my needs/wants.
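For a sense of what the scan resolution means in raw data, here is a small sketch of the arithmetic. It assumes a US Letter page (8.5 x 11 inches), which may not match the magazine pages above, and it computes uncompressed sizes only; real PDFs are compressed and will be much smaller. The function names are hypothetical.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bits_per_pixel: int) -> int:
    """Raw image size in bytes for one side of a page, before any compression."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    # Integer ceiling division so a partial byte is counted as a whole one.
    return (w * h * bits_per_pixel + 7) // 8


if __name__ == "__main__":
    # Assumed page size: US Letter. 1 bit = black and white, 24 bits = color.
    for dpi in (300, 600):
        w, h = pixel_dimensions(8.5, 11, dpi)
        for bpp in (1, 24):
            size_mib = uncompressed_bytes(8.5, 11, dpi, bpp) / 2**20
            print(f"{dpi} dpi, {bpp:>2}-bit: {w} x {h} px, {size_mib:.1f} MiB raw")
```

Doubling the resolution quadruples the pixel count, so reserving higher settings for pages that may be reprinted keeps the rest of the archive small.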

I also got a 2 meter Adaptec HD50M to HD50M cable and a Tekram DC-390U2WE PCI SCSI card to go with it, but so far it doesn't look like I'll need them. There might be a speed advantage to using SCSI, but I don't want to mess with it right now.

Cheers,
Scott.
