Not sure that this is the proper forum, but since this is a piece of my larger program, I thought I'd ask it here.
Got a bunch of documents being uploaded. They get stored in my database in binary format. I have a name and the binary document itself and a pointer reference to the task with which it is associated. Documents are in whatever format the end user used - which for this population usually means the big three of Word, Excel and PDF.
What I'd like is to somehow index the document contents for search purposes. It's not necessary to understand every document format that comes down the pike - every little bit helps. What I need is some software that accepts the document info and contents through a pipe (I don't really want these files to be in the file system, but I will write them temporarily to a file if that's what's needed). And I cannot allow the documents to be opened by the native program - macros, viruses and other nonsense would have to be dealt with.
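To make the calling convention concrete, here is a rough sketch of how I imagine handing a document to such a tool, in Python purely for illustration. Everything in it is an assumption: `extract_text` is a hypothetical helper, and the two stand-in commands are just small Python one-liners that upper-case their input, standing in for whatever real extractor turns out to exist.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def extract_text(blob: bytes, command: List[str], use_temp_file: bool = False) -> str:
    """Run an external extractor on a document that is held in memory.

    `command` is the extractor's command line (hypothetical here). By default
    the bytes go to the tool on stdin; with use_temp_file=True they are written
    to a temporary file whose path is appended to the command line.
    """
    if not use_temp_file:
        # Preferred route: the document never touches the file system.
        result = subprocess.run(command, input=blob,
                                stdout=subprocess.PIPE, check=True)
        return result.stdout.decode("utf-8", errors="replace")

    # Fallback route: write a short-lived temp file and always delete it.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(blob)
        result = subprocess.run(command + [path],
                                stdout=subprocess.PIPE, check=True)
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        os.remove(path)


# Stand-in "extractors" (NOT real tools): they only upper-case the input.
STAND_IN_PIPE = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read().upper())"]
STAND_IN_FILE = [sys.executable, "-c",
                 "import sys; sys.stdout.write(open(sys.argv[1]).read().upper())"]
```

The text that comes back would then be stored next to the document name and task reference and fed to whatever search index I end up using.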
Are there any generic open source utilities for performing document indexing for various document types?
Thanks.