Hi,
I am using fopen to open a Word document, loading the contents into a
variable, and then using substr_count to count the number of times a
certain string is found. This lets me search through the file and say
how many times a word appears, and I can even use str_replace to
highlight certain words. However, Microsoft Word seems to put a lot of
rubbish in the header and footer. Is it possible to filter this rubbish
out and get just the text of the document?
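Roughly, what I am doing looks like the sketch below. It is simplified:
the function names, the file name, the search word and the <b> highlight
markup are placeholders for illustration, not my real code.

```php
<?php
// Sketch only: all names below are illustrative placeholders.

// Read the whole file into a string using fopen.
function load_contents(string $path): string
{
    $handle = fopen($path, 'rb'); // 'b' so binary files such as .doc are read unchanged
    if ($handle === false) {
        return '';
    }
    $contents = stream_get_contents($handle);
    fclose($handle);
    return $contents === false ? '' : $contents;
}

// Count how many times $word occurs in $text (case-sensitive).
function count_word(string $text, string $word): int
{
    if ($word === '') {
        return 0; // substr_count rejects an empty search string
    }
    return substr_count($text, $word);
}

// Wrap every occurrence of $word in <b> tags.
function highlight_word(string $text, string $word): string
{
    if ($word === '') {
        return $text;
    }
    return str_replace($word, '<b>' . $word . '</b>', $text);
}

$path = 'document.doc'; // placeholder file name
if (is_readable($path)) {
    $contents = load_contents($path);
    echo count_word($contents, 'invoice'), "\n";
}
```

This works on the raw bytes of the file, which is why the extra data
Word stores alongside the text ends up in the output as well.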
I also tried using fopen to open a PDF file, but as PDF is handled
differently the result was completely different: no readable words at
all, just rubbish. Is there any way I can get this information using a
simple fopen?
I am basically trying to create a search engine which can search inside
files, similar to Google. The only problem left after all this would be
weighting the search results. I would probably have to build the list of
results first and then go through it to weight them.
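To make the two-step idea concrete, here is a rough sketch. It assumes
the weight is nothing more than the raw match count, which is only a
first guess; proper weighting is exactly the part I am unsure about. The
function name and the shape of the input array are made up for
illustration.

```php
<?php
// Hypothetical helper: $counts maps a file name to the number of matches
// found in it during the first pass.
function rank_results(array $counts): array
{
    // Drop files with no matches.
    $hits = array_filter($counts, function ($n) {
        return $n > 0;
    });
    // Sort by match count, highest first, keeping the file names as keys.
    arsort($hits);
    return $hits;
}
```

Calling rank_results(['a.doc' => 2, 'b.pdf' => 0, 'c.doc' => 5]) would
list c.doc first, then a.doc, and leave b.pdf out.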
Does anyone else have any experience in this or could help me out with
any of the problems I am having?
Thanks
Kevin
--
PHP General Mailing List (http://www.php.net/)