Ashley Sheridan wrote:
> I've been thinking about this problem for a little while, and the thing
> is, I can think of ways of doing it, but they're not very nice, and I
> don't think they're going to be fast.
>
> Basically, I have a load of HTML formatted content in a database that
> get displayed onto the site. It's part of a rudimentary CMS.
>
> Currently, the titles for each article are displayed on a page, and each
> title links to the full article. However, that leaves me with a page
> which is essentially a list of links, and that's not ideal for SEO. What
> I wanted to do to enhance the page is to have a short excerpt of x
> number of words/characters beneath each article title. The idea being
> that search engines will find the page as more than a link farm, and
> visitors won't have to just rely on the title alone for the content.
>
> Here's the rub though. As the content is in HTML form, I can't just grab
> the first 100 characters and display them as that could leave an open
> tag without a closing one, potentially breaking the page. I could use
> strip_tags on the 100-character excerpt, but what if the excerpt itself
> broke a tag in half (i.e. <acronym title="something"> could become
> <acron )
>
> The only solutions I can see are:
>
>
>       * retrieve the entire article, perform a strip_tags and then take
>         the excerpt
>       * use a regex inside of mysql to pull out only the text
>
>
> The thing is, neither of these seems particularly pretty, and I am sure
> there's a better way, but it's too early in the week for my brain to be
> fully functional I think!
>
> Does anyone have any ideas about what I could do, or do you think I'm
> seeing problems where there are none?
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>

/**
 * creates an abstract from any string, a nice one that stops at a full
 * stop (once past 120 chars) or at the end of a word (once past 140
 * chars), and never runs past 180 chars.
 *
 */
function createAbstract( $string )
{
    // if the first line is long enough on its own, work from that alone
    $lines = explode( "\n" , $string );
    if( count($lines) > 1 && strlen($lines[0]) > 140 ) {
        $string = $lines[0];
    }
    // short enough already, nothing to cut
    if( strlen($string) < 180 ) {
        return $string;
    }
    $string = substr( $string , 0 , 180 );

    // first choice: stop at the first full stop past 120 chars
    $pos = strpos( $string , '.' , 120 );
    if( $pos !== false ) {
        return substr( $string , 0 , $pos + 1 );
    }

    // second choice: stop at the first space past 140 chars
    $pos = strpos( $string , ' ' , 140 );
    if( $pos !== false ) {
        return trim( substr( $string , 0 , $pos + 1 ) ) . '...';
    }

    // no sensible break point, return the hard 180 char cut
    return $string;
}

/**
 * given an html document (or fragment), tidy it into usable html
 * and strip it back to text, new lines intact
 *
 * the result is plain text, so escape it (htmlspecialchars) before
 * echoing it back into a page
 */
function htmlToText( $html )
{
    $config = array(
        'clean'                       => true,
        'drop-proprietary-attributes' => true,
        'output-xhtml'                => true,
        'show-body-only'              => true,
        'word-2000'                   => true,
        'wrap'                        => 0
    );
    $tidy = new tidy();
    $tidy->parseString( $html , $config , 'utf8' );
    $tidy->cleanRepair();
    $text = tidy_get_output( $tidy );
    // drop the tags, then decode entities such as &amp; back to characters
    return html_entity_decode( strip_tags( $text ) , ENT_QUOTES , 'UTF-8' );
}

Using those two together, i.e. createAbstract( htmlToText( $html ) ),
should do it; they're pretty basic and could do with a tidy, but they get
the job done (you'll probably want to change the 140 chars to something
different). Both need the tidy extension, and the lengths are counted in
bytes (strlen/substr), so swap in the mb_* functions if your content has
a lot of multibyte characters.

Best,

Nathan

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php