Re: Filter out MS Word 'quotes' for RSS

On Apr 26, 2006, at 5:45 AM, Kevin Davies wrote:

> Obviously I need to convert these on entry, or on output into RSS. Does anyone know of an easy way to do this, or is it a case of identifying each unusual character individually?

These "high-ASCII" characters (strictly speaking, bytes outside 7-bit ASCII) have ord() values greater than 126. If you're rendering to HTML, you can go through your string converting each of them into '&#ord_value;', where 'ord_value' is the return value of ord() (so your result looks like "&#210;"). That fixes the primary problem (things breaking) and should at least limit the damage from the secondary problem (loss of information). In my experience, however, this mangles some characters pretty badly: a numeric reference names a Unicode code point rather than a byte in the Windows or Mac character set, so a Mac curly quote (byte 210) comes out as "Ò". Alternatively, you can just zap them (into "*" or "~" or some other printable character), which works better for text rendering.
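A minimal sketch of those two approaches, assuming PHP 7+ and byte-oriented (non-UTF-8) input. The function names bytes_to_numeric_refs() and zap_high_bytes() are hypothetical, not from any library:

```php
<?php
// Hypothetical helpers illustrating the two approaches:
// numeric character references, or zapping to a printable placeholder.

/**
 * Replace every byte with ord() > 126 by a numeric character
 * reference "&#N;", where N is the byte value.
 */
function bytes_to_numeric_refs(string $text): string
{
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ord = ord($text[$i]);
        $out .= ($ord > 126) ? '&#' . $ord . ';' : $text[$i];
    }
    return $out;
}

/**
 * Replace every byte with ord() > 126 by a printable placeholder.
 */
function zap_high_bytes(string $text, string $placeholder = '*'): string
{
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $out .= (ord($text[$i]) > 126) ? $placeholder : $text[$i];
    }
    return $out;
}

// Bytes 147 and 148 are Word's curly double quotes in Windows-1252.
$word = chr(147) . 'Hello' . chr(148);
echo bytes_to_numeric_refs($word), "\n"; // &#147;Hello&#148;
echo zap_high_bytes($word), "\n";        // *Hello*
```

Note that "&#147;" names U+0093, a C1 control character, not the left double quote U+201C (&#8220;). HTML5 parsers quietly remap references in the 128 to 159 range to their Windows-1252 meanings, but XML parsers, which is what RSS readers generally use, do not.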

You can also mix the two approaches: convert individually the characters you want to preserve and zap the rest, e.g.:

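<?php

/**
 * Validate a string as being gremlin-free text. Known Windows (Word) and
 * Mac "smart" punctuation bytes are converted to HTML entities; any other
 * byte with an ordinal value greater than 126 is replaced by '*'.
 *
 * Note: the Mac codes overlap ordinary Latin-1/Windows-1252 letters
 * (e.g. 210 is 'Ò' in Latin-1), so genuine accented text will be turned
 * into quotes and dashes. Drop whichever table does not match your input.
 *
 * @param mixed $text Something which might be a string; modified in place.
 *
 * @return array|bool True (valid), false (not valid), or an array mapping
 *  byte offsets in the converted text to the ordinal values that were
 *  zapped (valid but dirty).
 */
function validate_text( &$text ) {

    static $conversions = array(
        // Windows & Word (Windows-1252)
         133        => '&hellip;'
        ,145        => '&lsquo;'
        ,146        => '&rsquo;'
        ,147        => '&ldquo;'
        ,148        => '&rdquo;'
        ,149        => '&bull;'
        ,150        => '&ndash;'
        ,151        => '&mdash;'

        // Mac (Mac Roman)
        ,165        => '&bull;'
        ,208        => '&ndash;'
        ,209        => '&mdash;'
        ,210        => '&ldquo;'
        ,211        => '&rdquo;'
        ,212        => '&lsquo;'
        ,213        => '&rsquo;'
        );

    if( is_scalar( $text ) || is_null( $text ) ) {

        // Cast first: passing null to str_replace() is deprecated as of PHP 8.1.
        $corpus = str_replace(
             array_map( 'chr', array_keys( $conversions ) )
            ,array_values( $conversions )
            ,(string) $text
            );

        $gremlins = array( );
        $length   = strlen( $corpus );

        // Zap whatever is still above 126, remembering where it was.
        for( $ii = 0; $ii < $length; $ii++ ) {
            if( ($ordv = ord( $corpus[ $ii ] )) > 126 ) {
                $gremlins[ $ii ] = $ordv;
                $corpus[ $ii ] = '*';
                }
            }

        $text = $corpus;

        if( count( $gremlins ) ) {
            return $gremlins;
            }

        return true;
        }

    return false;
    }

?>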
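If you know your input is Windows-1252, a variant of the same convert-then-zap idea can leave the Mac table out, so that letters such as chr(210) ('Ò') are not turned into quotes. This is a minimal sketch assuming PHP 7.1+; clean_cp1252() is a hypothetical name, and it swaps in strtr() with an array, which replaces all keys in a single pass, instead of str_replace():

```php
<?php
// Hypothetical variant: convert the known Windows-1252 punctuation bytes
// to HTML entities and zap any other byte above 126 to '*'.
// Assumes the input is Windows-1252, so no Mac Roman codes are mapped.

/**
 * @param string     $text     Windows-1252 input.
 * @param array|null $gremlins Receives offsets (in the returned string)
 *                             mapped to the ordinals that were zapped.
 * @return string              Cleaned text.
 */
function clean_cp1252(string $text, ?array &$gremlins = null): string
{
    static $conversions = [
        "\x85" => '&hellip;', // 133
        "\x91" => '&lsquo;',  // 145
        "\x92" => '&rsquo;',  // 146
        "\x93" => '&ldquo;',  // 147
        "\x94" => '&rdquo;',  // 148
        "\x95" => '&bull;',   // 149
        "\x96" => '&ndash;',  // 150
        "\x97" => '&mdash;',  // 151
    ];

    // strtr() with an array replaces each key by its value in one pass.
    $corpus = strtr($text, $conversions);

    $gremlins = [];
    $len = strlen($corpus);
    for ($i = 0; $i < $len; $i++) {
        $ord = ord($corpus[$i]);
        if ($ord > 126) {
            $gremlins[$i] = $ord;
            $corpus[$i] = '*';
        }
    }
    return $corpus;
}

$clean = clean_cp1252("It\x92s \x93fine\x94 \x96 caf\xE9", $gremlins);
echo $clean, "\n"; // It&rsquo;s &ldquo;fine&rdquo; &ndash; caf*
// $gremlins is now [41 => 233]: the Latin-1 'é' byte was zapped.
```

Handing back the zapped offsets through an optional reference parameter, rather than through the return value, is just one design choice; the true/false/array return used above works equally well.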

--
PHP General Mailing List (http://www.php.net/)