Re: How to fetch .DOC or .DOCX file in php



Boyd, Todd M. wrote:
>> -----Original Message-----
>> From: Jagdeep Singh [mailto:jagsaini1982@xxxxxxxxx]
>> Sent: Thursday, December 04, 2008 8:39 AM
>> To: php-general@xxxxxxxxxxxxx
>> Subject:  How to fetch .DOC or .DOCX file in php
>> Importance: Low
>> Hi !
>> I want to fetch text from a .doc / .docx file and save it into a
>> database.
>> But when I tried to fetch the text with fopen/fgets etc., it gave me
>> special characters along with the text.
>> (With .txt files everything is fine.)
>> The only problem is with doc/docx files.
>> I don't know how to remove the "SPECIAL CHARACTERS" from this text ...
> A.) This has been handled on this list several times. Please search the
> archives before posting a question.
> B.) Did you even TRY to Google for this? In the first 5 matches for "php
> open ms word" I found this:
> uments-via-php-and-com-81/
> You will need an MS Windows machine for this solution to work. If you're
> using *nix... well... good luck.
> // Todd

Ah, not true about the MS requirement.  If all you want is the clean text
(without any formatting), then it can be done with PHP on any platform.

If this is what is needed, here is the code to do it.


<?php
$filename = './12345.doc';
if ( file_exists($filename) ) {

	# Open in binary mode so the bytes are read untranslated on every platform
	if ( ($fh = fopen($filename, 'rb')) !== false ) {

		# The first 0xA00 bytes are header; the text is read from right after them
		$headers = fread($fh, 0xA00);

		if ( $headers !== false && strlen($headers) === 0xA00 ) {

			# byte 0x21C, weight 1 ; Document has from 0 to 255 characters
			$n1 =     ( ord($headers[0x21C]) - 1 );

			# byte 0x21D, weight 256 ; Document has from 256 to 63743 characters
			$n2 =   ( ( ord($headers[0x21D]) - 8 ) * 256 );

			# byte 0x21E, weight 256*256 ; Document has from 63744 to 16775423 characters
			$n3 =   ( ( ord($headers[0x21E]) * 256 ) * 256 );

			# byte 0x21F, weight 256*256*256 ; Document has from 16775424 to 4294965504 characters
			$n4 = ( ( ( ord($headers[0x21F]) * 256 ) * 256 ) * 256 );

			# Total length of text in the document
			$textLength = ($n1 + $n2 + $n3 + $n4);

			# fread() needs a positive length, so skip empty or unexpected headers
			if ( $textLength > 0 ) {
				$extracted_plaintext = fread($fh, $textLength);
				# if you want the plain text with no formatting, do this
				echo $extracted_plaintext;
				# if you want to see your paragraphs in a web page, do this instead
				# echo nl2br($extracted_plaintext);
			}
		}

		fclose($fh);
	}
}
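
The four ord() lines above add up to one number: the bytes at 0x21C to
0x21F read as a little-endian 32-bit integer, minus 0x801 (the "- 1" on the
first byte plus the "- 8" on the second, which is 8 * 256). Here is a
minimal sketch of that arithmetic. The helper name docTextLength() and the
in-memory header are made up for illustration; the header is synthetic, not
a real Word file, so this only checks the arithmetic, not the file format.

```php
<?php
// Hypothetical helper: the same arithmetic as the four ord() lines, in one step.
function docTextLength(string $headers): int
{
    // 'V' unpacks an unsigned 32-bit little-endian integer.
    $raw = unpack('V', substr($headers, 0x21C, 4))[1];
    return $raw - 0x801;
}

// Synthetic 0xA00-byte header (assumption: all zeros except the length field).
$text    = "Hello from a fake document.\rSecond paragraph.";
$headers = str_repeat("\0", 0xA00);
$headers = substr_replace($headers, pack('V', strlen($text) + 0x801), 0x21C, 4);

// Byte-by-byte version, as in the original snippet.
$byByte = (ord($headers[0x21C]) - 1)
        + (ord($headers[0x21D]) - 8) * 256
        + ord($headers[0x21E]) * 256 * 256
        + ord($headers[0x21F]) * 256 * 256 * 256;

// Both comparisons should come out true for this synthetic header.
var_dump(docTextLength($headers) === $byByte);
var_dump(docTextLength($headers) === strlen($text));
```

Either form gives the number of bytes to fread() after the header; the
unpack() version is just shorter to read.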




Hope this helps.

I am working on a set of PHP classes that will be able to read the text with
the formatting included and convert it to a standard document format.
The standard format that it will end up in has yet to be decided.

Jim Lucas

   "Some men are born to greatness, some achieve greatness,
       and some have greatness thrust upon them."

Twelfth Night, Act II, Scene V
    by William Shakespeare
