Read Through PHP Files

Kevin <Kevin@xxxxxxxxxxxxxxxxx> · Fri, 10 Nov 2006 10:55:23 +0000

Hi,

I am using fopen to open a Word document, loading the contents into a 
variable, and then using substr_count to count how many times a certain 
string appears. This lets me search through the file and report how many 
times the word occurs, and I can even use str_replace to highlight 
certain words. However, Microsoft Word seems to put a lot of rubbish in 
the header and footer. Is it possible to filter this rubbish out and get 
just the document text?
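
For reference, a minimal sketch of the approach described above, with one 
possible filter added. It keeps only runs of printable ASCII characters, 
on the assumption that the rubbish is binary data and the document text 
is stored as 8-bit characters (text stored as UTF-16 would be missed). 
The function names and the file name report.doc are made up for 
illustration.

```php
<?php
// Keep only runs of printable ASCII at least $minLength bytes long.
// Assumption: anything shorter, or outside 0x20-0x7E, is binary rubbish.
function extractPrintableRuns(string $bytes, int $minLength = 4): string
{
    preg_match_all('/[\x20-\x7E]{' . $minLength . ',}/', $bytes, $matches);
    return implode(' ', $matches[0]);
}

// Count occurrences of $needle and wrap each one in <b> tags.
function countAndHighlight(string $text, string $needle): array
{
    $count = substr_count($text, $needle);
    $highlighted = str_replace($needle, '<b>' . $needle . '</b>', $text);
    return [$count, $highlighted];
}

// Hypothetical usage:
// $text = extractPrintableRuns(file_get_contents('report.doc'));
// list($count, $html) = countAndHighlight($text, 'PHP');
```

Note that substr_count is case-sensitive, so "PHP" and "php" are counted 
separately unless both strings are lowercased first.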

I also tried using fopen to open a PDF file, but as PDF is stored 
differently the output contained no readable words at all, just 
rubbish. Is there any way I can get the text using a simple fopen?
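
One likely reason for the rubbish is that PDF content streams are 
usually compressed (FlateDecode, i.e. zlib). The rough sketch below 
inflates each stream with gzuncompress and pulls out strings shown with 
the Tj operator. It assumes the zlib extension is loaded and that the 
PDF is simple: it ignores TJ arrays, font encodings, escape sequences, 
uncompressed streams and encryption, so it will miss text in many real 
files. The function name is made up for illustration.

```php
<?php
// Rough sketch: inflate Flate-compressed streams and collect "(...) Tj" strings.
function pdfExtractTextSketch(string $pdf): string
{
    $pieces = [];
    preg_match_all('/stream\r?\n(.*?)endstream/s', $pdf, $streams);
    foreach ($streams[1] as $raw) {
        // The line ending before "endstream" may be "\n" or "\r\n",
        // so try the data as captured and with either ending removed.
        $candidates = [$raw];
        if (substr($raw, -1) === "\n") {
            $candidates[] = substr($raw, 0, -1);
        }
        if (substr($raw, -2) === "\r\n") {
            $candidates[] = substr($raw, 0, -2);
        }
        $content = false;
        foreach ($candidates as $candidate) {
            $content = @gzuncompress($candidate);
            if ($content !== false) {
                break;
            }
        }
        if ($content === false) {
            continue; // not zlib data (image, font, or uncompressed stream)
        }
        // Literal strings followed by the Tj (show text) operator.
        if (preg_match_all('/\(((?:[^()\\\\]|\\\\.)*)\)\s*Tj/s', $content, $shown)) {
            foreach ($shown[1] as $string) {
                $pieces[] = $string;
            }
        }
    }
    return implode(' ', $pieces);
}

// Hypothetical usage:
// echo pdfExtractTextSketch(file_get_contents('manual.pdf'));
```

For anything beyond simple files, a dedicated converter is probably a 
safer route than parsing the format by hand.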

I am basically trying to create a search engine that can read inside 
files, similar to Google. The only problem I would have after all this 
is weighting the search results. I would probably have to collect the 
results first and then go through them to weight them.
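
As an illustration of one simple weighting, the sketch below scores each 
document by how often the term occurs relative to the document's length 
(plain term frequency) and sorts the best match first. This is only one 
possible scheme, not how any particular search engine ranks results. The 
function name and the array of already-extracted text are assumptions.

```php
<?php
// $documents maps a file name to its already-extracted plain text.
// Returns file name => score, highest score first; non-matches are dropped.
function rankDocuments(array $documents, string $term): array
{
    $scores = [];
    if ($term === '') {
        return $scores; // substr_count rejects an empty needle
    }
    $needle = strtolower($term);
    foreach ($documents as $name => $text) {
        $wordCount = str_word_count($text);
        if ($wordCount === 0) {
            continue;
        }
        $hits = substr_count(strtolower($text), $needle);
        if ($hits === 0) {
            continue;
        }
        // Term frequency: occurrences divided by document length in words.
        $scores[$name] = $hits / $wordCount;
    }
    arsort($scores); // sort by score, descending, keeping the file names
    return $scores;
}
```

Dividing by the word count keeps a long document from outranking a short 
one just because it has more room for matches.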

Does anyone have any experience with this, or could anyone help me with 
any of these problems?

Thanks

Kevin

--
PHP General Mailing List (http://www.php.net/)