Re: searching non plain text files




Hey, when I worked in insurance and had to deal with PDFs a lot, we used PDFlib: https://www.pdflib.com/


On Sat, Dec 15, 2018 at 4:15 AM Tim-Hinnerk Heuer <th.heuer@xxxxxxxxx> wrote:
Way back when, I used pdftotext and various other doc-to-text converters to copy the text into a second text file, which I then indexed with Sphinx search. Nowadays you can also index quite well with PostgreSQL, or maybe even MySQL.

good luck!
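A minimal sketch of the sidecar-file approach Tim describes, assuming the poppler `pdftotext` command-line tool is installed and on the PATH. The helper names (`sidecar_path`, `pdftotext_command`, `extract_to_sidecar`) and the `report.pdf` -> `report.pdf.txt` naming scheme are hypothetical; handing the resulting files to Sphinx or a database is left out.

```python
import shutil
import subprocess
from pathlib import Path


def sidecar_path(pdf: Path) -> Path:
    """Hypothetical naming scheme: report.pdf -> report.pdf.txt, next to the original."""
    return pdf.with_name(pdf.name + ".txt")


def pdftotext_command(pdf: Path, out: Path) -> list:
    """Build the pdftotext invocation; '-enc UTF-8' requests UTF-8 output."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf), str(out)]


def extract_to_sidecar(pdf: Path) -> Path:
    """Convert one PDF into a plain-text sidecar file an indexer can read."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    out = sidecar_path(pdf)
    # check=True raises CalledProcessError if pdftotext exits with an error
    subprocess.run(pdftotext_command(pdf, out), check=True)
    return out
```

Keeping the extracted text in separate files means the expensive conversion runs once per document, and the search side only ever deals with plain text.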

On Sat, 15 Dec 2018 17:20 Jeffry Killen <jekillen@xxxxxxxxxxx> wrote:
Hello,

Can anyone point me to instructions or advice on
opening and reading files that are not plain text:

word processing documents, PDF, PostScript, image files,
even compiled code.

I am writing a search function for file systems
and don't know much about how non-plain-text
files are formatted.

The immediate concern is line breaks in word
processing documents, PDF, and PostScript files.
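Once a converter such as pdftotext has produced plain text, line breaks mostly become a normalization problem: a phrase you search for may be split across lines, pages, or a hyphenated word. A minimal sketch, assuming the text has already been extracted; `normalize_for_search` is a hypothetical helper, not part of any library.

```python
import re

# Windows (\r\n) and old Mac (\r) newlines, the form feed (\f) that pdftotext
# writes between pages, and the Unicode line/paragraph separators.
NEWLINES = re.compile(r"\r\n|\r|\f|\u2028|\u2029")
# A word hyphenated across a line break, e.g. "exam-\nple" -> "example".
HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\n[ \t]*(\w)")
WHITESPACE = re.compile(r"\s+")


def normalize_for_search(text: str) -> str:
    """Collapse all line and page breaks so phrases match across them."""
    text = NEWLINES.sub("\n", text)
    text = HYPHEN_BREAK.sub(r"\1\2", text)
    return WHITESPACE.sub(" ", text).strip()
```

Rejoining hyphenated words is a trade-off: a genuine compound split at its hyphen ("well-" / "known") also becomes one word, so you may prefer to index both forms.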

Next is detecting compiled code files so I can
leave them alone. This type of file might not
have a suffix (file extension) to check.
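Compiled code can usually be recognized without a suffix by its leading "magic" bytes. A minimal sketch, assuming you only need the common executable formats listed in the table (it is not exhaustive); the NUL-byte check is a widely used heuristic for "binary data", not a guarantee. The names `sniff_executable`, `looks_binary` and `should_skip` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Leading bytes of common compiled/executable formats (not exhaustive).
EXECUTABLE_MAGIC = {
    b"\x7fELF": "ELF (Linux/BSD executable, shared object, or object file)",
    b"MZ": "DOS/PE executable (Windows .exe/.dll)",
    b"\xfe\xed\xfa\xce": "Mach-O 32-bit (big-endian)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (little-endian)",
    b"\xfe\xed\xfa\xcf": "Mach-O 64-bit (big-endian)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (little-endian)",
    b"\xca\xfe\xba\xbe": "Java class file or Mach-O universal binary",
}


def sniff_executable(head: bytes) -> Optional[str]:
    """Return a label if the first bytes match a known executable format."""
    for magic, label in EXECUTABLE_MAGIC.items():
        if head.startswith(magic):
            return label
    return None


def looks_binary(head: bytes) -> bool:
    """Heuristic: ASCII/UTF-8 text almost never contains NUL bytes.

    UTF-16 text does, so treat this as a hint rather than proof.
    """
    return b"\x00" in head


def should_skip(path: Path, sample_size: int = 8192) -> bool:
    """Skip files that look like compiled code or other binary data."""
    with path.open("rb") as f:
        head = f.read(sample_size)
    return sniff_executable(head) is not None or looks_binary(head)
```

`looks_binary` will also flag many PDFs and images, so a search tool that wants to handle those should identify them first and only fall back to this check for everything else.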

Then there are the various image files that might be
encountered.

Even suffixes aren't a guarantee of the content.
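Since suffixes can lie, the same magic-byte idea can identify documents and images by content. A minimal sketch, assuming the small, non-exhaustive signature table below is enough for a first pass; `sniff_type` and `sniff_file` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional

# (label, [(byte offset, expected bytes), ...]); every part must match.
SIGNATURES = [
    ("pdf", [(0, b"%PDF-")]),
    ("postscript", [(0, b"%!PS")]),
    ("png", [(0, b"\x89PNG\r\n\x1a\n")]),
    ("jpeg", [(0, b"\xff\xd8\xff")]),
    ("gif", [(0, b"GIF87a")]),
    ("gif", [(0, b"GIF89a")]),
    ("tiff", [(0, b"II*\x00")]),
    ("tiff", [(0, b"MM\x00*")]),
    ("bmp", [(0, b"BM")]),
    ("webp", [(0, b"RIFF"), (8, b"WEBP")]),
    # .docx/.xlsx/.odt are ZIP archives; the text lives in XML files inside.
    ("zip", [(0, b"PK\x03\x04")]),
    # Legacy .doc/.xls use the OLE2 compound file format.
    ("ole2", [(0, b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1")]),
]


def sniff_type(head: bytes) -> Optional[str]:
    """Identify a file from its first bytes, ignoring its name."""
    for label, parts in SIGNATURES:
        if all(head[off:off + len(magic)] == magic for off, magic in parts):
            return label
    return None


def sniff_file(path: Path, sample_size: int = 64) -> Optional[str]:
    """Read just enough of the file to check the signatures above."""
    with path.open("rb") as f:
        return sniff_type(f.read(sample_size))
```

Short signatures such as `BM` can match ordinary text that happens to start with those letters, so treat a match as a strong hint. For anything beyond a sketch, the `file` command and its libmagic database do this far more thoroughly, and PHP's fileinfo extension (`finfo_file()`) exposes libmagic directly.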

Thanks

Jeff K.


--
Thanks!



Anthony Allen | Software Engineer 
Mobile  469.279.8662 | Email anthony.webit@xxxxxxxxx  |  Email anthony@xxxxxxxxxxxxx  | Dallas, TX | Website theuiarch.com



