New Re: Just splitting pages, or pulling content out?
The files are PDFs that were originally created from scans. However, some of the original papers can't be found, so scanning again is only a partial solution. I want to pull pages out of the existing files and break them up into smaller PDFs, grouped by vendor.




Satan (impatiently) to Newcomer: The trouble with you Chicago people is, that you think you are the best people down here; whereas you are merely the most numerous.
- - - Mark Twain, "Pudd'nhead Wilson's New Calendar" 1897
New Re: Just splitting pages, or pulling content out?
https://pdfsam.org/pdfsam-basic/ may do what you want. It's free. I use v5 Pro ($) and it works very well.

Good luck.

Cheers,
Scott.
New Virtual printer will probably be the fastest to extract
Load the large file in the PDF reader of choice, then use the Windows printer dialog to print the desired page range(s) to a new PDF file.

If that doesn't cut it, there are tools (pdfimages, part of XpdfReader) that can dump all the embedded images. It's a PITA, but the scans can then be reassembled to suit in MS Word/OOo Writer/... and the PDFs regenerated.
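
For what it's worth, the pdfimages route can be scripted. Here is a rough sketch in Python, assuming pdfimages is installed and on the PATH. The function names and the file names in the usage comment are made up for illustration. The -j, -f and -l flags are the standard pdfimages options for JPEG output and first/last page.

```python
import shutil
import subprocess


def pdfimages_command(pdf_path, out_prefix, first=None, last=None):
    """Build the pdfimages command line (hypothetical helper, not part of any tool)."""
    cmd = ["pdfimages", "-j"]  # -j: write JPEG-encoded scans out as .jpg files
    if first is not None:
        cmd += ["-f", str(first)]  # first page to extract from (1-based)
    if last is not None:
        cmd += ["-l", str(last)]  # last page to extract from
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def extract_images(pdf_path, out_prefix, first=None, last=None):
    """Run pdfimages; raises if the tool is missing or exits with an error."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found on PATH")
    subprocess.run(pdfimages_command(pdf_path, out_prefix, first, last), check=True)


# Example (assumed file names):
# extract_images("big_scan.pdf", "out/page", first=2, last=5)
```

Images that aren't stored as JPEG come out in pdfimages' other formats, so expect a mix of file types in the output folder.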
New Is the vendor data embedded in a readable format?
And if not directly, you can usually convert the PDF files to PostScript and pull the data out of that. Then it's a matter of batching some logic into an automated script: break the pages out individually, extract the information you need to rename them, and concatenate them back together into your final target output.

I used to do this kind of stuff all the time for large-scale print runs in the print shop, while populating the web server.
New don't need the vendor data, just the images of the documents
The smaller files need the PDF pages from the big ones, where each small file is for a specific vendor. And sadly, a lot of the original documents can't be found.
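
If a scripted route is acceptable, something like the following might do it. This is only a sketch. It assumes the third-party pypdf package is installed and that somebody writes down by hand which page ranges belong to which vendor. The file names, vendor names and function names are all hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def parse_ranges(spec):
    """Turn a 1-based spec like "1-3,7" into 0-based page indexes [0, 1, 2, 6]."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start < 1 or end < start:
            raise ValueError(f"bad page range: {part!r}")
        pages.extend(range(start - 1, end))
    return pages


def plan_split(assignments):
    """Group (source file, vendor, page spec) rows into vendor -> [(source, page index)]."""
    plan = defaultdict(list)
    for source, vendor, spec in assignments:
        for index in parse_ranges(spec):
            plan[vendor].append((source, index))
    return dict(plan)


def write_vendor_pdfs(plan, out_dir):
    """Write one PDF per vendor, copying pages as-is from the big scans."""
    # Imported here so the planning helpers above work without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    readers = {}  # open each big file only once
    for vendor, pages in plan.items():
        writer = PdfWriter()
        for source, index in pages:
            if source not in readers:
                readers[source] = PdfReader(source)
            writer.add_page(readers[source].pages[index])
        writer.write(str(out_dir / f"{vendor}.pdf"))


# Example (assumed file and vendor names):
# rows = [("scans_2019.pdf", "acme", "1-3,7"), ("scans_2020.pdf", "acme", "12")]
# write_vendor_pdfs(plan_split(rows), "by_vendor")
```

The pages are copied over rather than re-rendered, so the scanned images should come through untouched, which matters when the paper originals are gone.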




New I don't know if this is appropriate or not.
But I seem to recall you're a .Net developer. I've used this in a few apps I've written and can highly recommend it. I'm not sure this fits your use case, but thought I'd mention it.

Good luck!
bcnu,
Mikem

It's mourning in America again.
     looking for free software to edit PDF files - (lincoln) - (13)
         Just splitting pages, or pulling content out? -NT - (drook) - (8)
             This - (scoenye) - (1)
                 What he said - (drook)
             Re: Just splitting pages, or pulling content out? - (lincoln) - (5)
                 Re: Just splitting pages, or pulling content out? - (Another Scott)
                 Virtual printer will probably be the fastest to extract - (scoenye)
                 Is the vendor data embedded in a readable format? - (crazy) - (1)
                     don't need the vendor data, just the images of the documents - (lincoln)
                 I don't know if this is appropriate or not. - (mmoffitt)
         Libre Office can open them... - (static)
         Got Office? - (pwhysall) - (1)
             Tried using Word - (lincoln)
         I found this - (lincoln)
