New Just splitting pages, or pulling content out?
--

Drew
New This
If all you need is to split the files, then any print-to-PDF driver will do. Win 10 has it built-in. For Win 7 and before, CutePDF works well but the installer was weaponized to drop adware and finding a clean older version is getting harder by the day. Possible alternatives with virtual printers: Foxit Reader, Nitro PDF reader.

Beyond splitting files, only Adobe Acrobat is able to deal with the mess. Unless the house is on fire, the only sane method is to fix the source and regenerate the PDF files as desired. (The reason is that PDF files have no internal document structure: the contents are just a bunch of objects at fixed positions on the page. It is up to you to repair, e.g., word wrap if a change pushes a line beyond the margin.)
New What he said
Although if you can't get the source, copy-and-paste out of Reader will probably get you blocks of text more easily than trying to pull them from the file.

Oh, and PS: There are some GNU tools that have Windows versions that split and join pages.
--

Drew
Edited by drook May 27, 2019, 01:38:50 PM EDT
New Re: Just splitting pages, or pulling content out?
Files are PDFs originally created by scans. However, some of the original papers can't be found, so scanning again is only a partial solution. I want to pull pages out of the existing files and break them up into smaller PDFs, grouped by vendor.
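
For anyone who would rather script this than click through a GUI, here is a minimal sketch of the "pull page ranges out and group them by vendor" job. It assumes the third-party pypdf package is installed; the function names, the vendor names, and the range syntax ("1-3,7", 1-based) are made up for illustration and are not from any tool mentioned in this thread.

```python
import os


def parse_page_ranges(spec):
    """Turn a 1-based spec like "1-3,7" into 0-based page indexes [0, 1, 2, 6]."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start < 1 or end < start:
            raise ValueError(f"bad page range: {part!r}")
        pages.extend(range(start - 1, end))
    return pages


def split_by_vendor(source_pdf, vendor_ranges, out_dir="."):
    """Write one PDF per vendor containing the listed pages of source_pdf.

    vendor_ranges is a hypothetical mapping such as {"acme": "1-3,7"}.
    """
    # Imported here so the range parsing above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source_pdf)
    written = []
    for vendor, spec in vendor_ranges.items():
        writer = PdfWriter()
        for index in parse_page_ranges(spec):
            # Pages are copied as-is, so the scanned images are not re-encoded.
            writer.add_page(reader.pages[index])
        target = os.path.join(out_dir, f"{vendor}.pdf")
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

A call might look like split_by_vendor("scans.pdf", {"acme": "1-3,7", "globex": "4-6"}), where the file name and vendors are placeholders. Working out which pages belong to which vendor is still a manual step with scanned documents.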




Satan (impatiently) to Newcomer: The trouble with you Chicago people is, that you think you are the best people down here; whereas you are merely the most numerous.
- - - Mark Twain, "Pudd'nhead Wilson's New Calendar" 1897
New Re: Just splitting pages, or pulling content out?
https://pdfsam.org/pdfsam-basic/ may do what you want. It's free. I use v5 Pro ($) and it works very well.

Good luck.

Cheers,
Scott.
New Virtual printer will probably be the fastest to extract
Load the large file in the PDF reader of choice, then use the Windows printer dialog to print the desired page range(s) to a new PDF file.

If that doesn't cut it, there are tools (pdfimages, part of the Xpdf command-line tools) that can dump all the embedded images. PITA, but then the scans can be reassembled to suit in MS Word/OOo Writer/... and the PDFs regenerated.
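
If the pdfimages route has to be repeated over many files, it can be wrapped in a few lines of Python. This is only a sketch: it assumes pdfimages is on the PATH and uses the -f (first page), -l (last page) and -j (keep JPEG data as .jpg) options; check the man page of the build you have before relying on them. The function names and file paths are hypothetical.

```python
import subprocess


def pdfimages_command(pdf_path, image_root, first_page=None, last_page=None,
                      keep_jpeg=True):
    """Build the argument list for one pdfimages run (page numbers are 1-based)."""
    command = ["pdfimages"]
    if first_page is not None:
        command += ["-f", str(first_page)]
    if last_page is not None:
        command += ["-l", str(last_page)]
    if keep_jpeg:
        # Save DCT (JPEG) images as .jpg files instead of converting them.
        command.append("-j")
    # image_root is the file name prefix pdfimages uses for its output files.
    command += [pdf_path, image_root]
    return command


def extract_images(pdf_path, image_root, **options):
    """Run pdfimages and raise CalledProcessError if it exits with an error."""
    subprocess.run(pdfimages_command(pdf_path, image_root, **options), check=True)
```

For example, extract_images("big.pdf", "out/page", first_page=4, last_page=6) would dump the images from pages 4 to 6 with names starting "out/page".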
New Is the vendor data embedded in a readable format?
If not directly, you can usually convert the PDF files to PostScript and pull the data out of that. Then it is a matter of scripting the logic: break the pages out individually, extract the information you need to rename them, and concatenate them back together into your final target output.

I used to do this kind of stuff all the time for large-scale print runs in the print shop, while populating the web server.
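
The grouping part of that batch logic might look roughly like the sketch below. It assumes you already have one label per page (read out of the PostScript text or typed in by hand; that step is not shown), and it only works out which pages go into which output file. All names here are hypothetical.

```python
def group_pages_by_label(page_labels):
    """Map each label to the 1-based page numbers that carry it, in page order."""
    groups = {}
    for page_number, label in enumerate(page_labels, start=1):
        groups.setdefault(label, []).append(page_number)
    return groups


def _format_run(start, end):
    """Format one run of consecutive pages as "4" or "4-6"."""
    return str(start) if start == end else f"{start}-{end}"


def as_range_spec(page_numbers):
    """Collapse sorted page numbers like [1, 2, 3, 7] into the string "1-3,7"."""
    parts = []
    run_start = previous = None
    for number in page_numbers:
        if previous is not None and number == previous + 1:
            # Still inside the current run of consecutive pages.
            previous = number
            continue
        if run_start is not None:
            parts.append(_format_run(run_start, previous))
        run_start = previous = number
    if run_start is not None:
        parts.append(_format_run(run_start, previous))
    return ",".join(parts)
```

The resulting range strings can then be handed to whatever does the actual splitting and concatenating.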
New don't need the vendor data, just the images of the documents
The smaller files need the PDF pages from the big ones, where each small file is for a specific vendor. And sadly, a lot of the original documents can't be found.




New I don't know if this is appropriate or not.
But I seem to recall you're a .Net developer. I've used this in a few apps I've written and can highly recommend it. I'm not sure this fits your use case, but thought I'd mention it.

Good luck!
bcnu,
Mikem

It's mourning in America again.
