Re: extract Occurrences AFTER ... and before "-30-"

This is usually a first-year CS programming problem (word frequency
counts), complicated a little by needing to extract the text first.
You've started off fine by stripping tags and converting to lower case.
You'll also want to convert or strip HTML entities, decide what to do
with plurals and words like "you're", "Charlie's" and "it's", and decide
whether something with mixed letters and numbers, like RFC822, counts as
a word.
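The clean-up steps above might look something like this. It is a minimal sketch, not the only way to do it: the function name `normalize_text` is hypothetical, and it assumes UTF-8 input and that you want to keep apostrophes inside words ("it's") and keep mixed tokens like "rfc822" as single words.

```php
<?php
// Hypothetical helper: turn raw HTML into lower-case plain text ready for splitting.
function normalize_text($html)
{
    // Drop markup. Note strip_tags() keeps the text inside <script>/<style> blocks.
    $text = strip_tags($html);
    // Turn entities such as &amp; or &#039; back into characters (assumes UTF-8).
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // mb_strtolower() (mbstring extension) handles accented capitals; strtolower() is the fallback.
    $text = function_exists('mb_strtolower') ? mb_strtolower($text, 'UTF-8') : strtolower($text);
    // Collapse every run of characters that is not a letter, digit or apostrophe
    // (punctuation, whitespace, non-breaking spaces) into a single space.
    $text = preg_replace('/[^\p{L}\p{N}\']+/u', ' ', $text);
    return trim($text);
}
```

For example, `normalize_text('<p>It&#039;s RFC822 &amp; Charlie&#39;s</p>')` gives `it's rfc822 charlie's`. Plurals are left alone here; merging "release" and "releases" would need a stemming step on top of this.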

When you've arranged all that, splitting on white space is trivial:

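// PREG_SPLIT_NO_EMPTY drops the empty strings that leading or trailing whitespace would produce
$words = preg_split('/[[:space:]]+/', $text, -1, PREG_SPLIT_NO_EMPTY);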
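If you'd rather let PHP decide what counts as a word, `str_word_count()` with format 1 returns the words as an array. Its notion of a word is letters plus "'" and "-", so digits break words apart, which matters for the RFC822 question above. A small comparison on a made-up string:

```php
<?php
$text = "  it's rfc822 day  ";
// Whitespace split: tokens kept whole; PREG_SPLIT_NO_EMPTY drops the "" at each end.
$bySpace = preg_split('/[[:space:]]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
// str_word_count() format 1: an array of words made of letters, "'" and "-" only.
$byWord = str_word_count($text, 1);
echo implode(',', $bySpace), "\n"; // it's,rfc822,day
echo implode(',', $byWord), "\n";  // it's,rfc,day
```

Which one you want depends on how you decided to treat mixed letters and numbers.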

and then you just run through the words, building an associative array
that uses each word as a key and increments its count:

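$freq = array();
foreach ($words as $word) {
     // first sighting sets the count to 1; avoids an "undefined index" notice
     $freq[$word] = isset($freq[$word]) ? $freq[$word] + 1 : 1;
}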
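PHP also has a built-in that does this counting loop in one call: `array_count_values()` returns an array mapping each value to the number of times it occurs. A minimal sketch with made-up sample words:

```php
<?php
$words = array('the', 'cat', 'saw', 'the', 'dog');
// Same shape as the foreach result: word => number of occurrences,
// keys in order of first appearance.
$freq = array_count_values($words);
echo $freq['the'], "\n"; // 2
echo $freq['cat'], "\n"; // 1
```

One caveat: a numeric "word" such as "2004" ends up as an integer key, which only matters if you later compare keys strictly.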

For output, you may want to sort the array:

ksort($freq);
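For printing, a sketch like this one (with made-up counts) lists the words alphabetically and then picks out the most frequent ones. `ksort()` orders by key (the word); `arsort()` orders by value, highest first, while keeping the word keys; and passing `true` as the fourth argument of `array_slice()` preserves those keys.

```php
<?php
$freq = array('cat' => 1, 'the' => 3, 'dog' => 2);

// Alphabetical listing, one word per line with its count.
ksort($freq);
foreach ($freq as $word => $count) {
    printf("%-10s %d\n", $word, $count);
}

// Most frequent first, then keep only the top two entries.
arsort($freq);
$top = array_slice($freq, 0, 2, true);
```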

That's awesome. Thanks!
Let me start with my first problem:

I want to extract all occurrences of text AFTER "News Releases" and before "-30-".

http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html

How do I do that?
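One way to do this is a non-greedy regular expression with `preg_match_all()`. This is a minimal sketch under a few assumptions: the helper name `extract_between` is hypothetical, both markers appear literally in the page source (no tag in the middle of "News Releases"), and each chunk you want is introduced by its own "News Releases" marker.

```php
<?php
// Hypothetical helper: return every chunk of text found between $start and $end.
function extract_between($haystack, $start, $end)
{
    // preg_quote() escapes regex metacharacters in the markers (e.g. "-").
    // (.*?) is non-greedy, so each match stops at the first following $end;
    // the /s modifier lets "." match newlines as well.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    preg_match_all($pattern, $haystack, $matches);
    // $matches[0] holds the full matches; $matches[1] only the text between the markers.
    return $matches[1];
}

// Usage against the test page (fetching a URL this way needs allow_url_fopen):
// $html = file_get_contents('http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html');
// foreach (extract_between($html, 'News Releases', '-30-') as $chunk) {
//     echo trim(strip_tags($chunk)), "\n";
// }
```

If instead "News Releases" appears only once and several releases each end in "-30-", a simpler route is to take everything after the first "News Releases" (with `strpos()` and `substr()`) and `explode('-30-', ...)` that remainder.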

Yeah, I am still asking first-year questions :)) Every project brings new challenges.

John


--
PHP General Mailing List (http://www.php.net/)


