On Mon, Mar 19, 2012 at 10:43 AM, Arno Kuhl <arno@xxxxxxxxxxxxxx> wrote:
> -----Original Message-----
> From: tamouse mailing lists [mailto:tamouse.lists@xxxxxxxxx]
> Sent: 19 March 2012 10:28 AM
> To: php-general@xxxxxxxxxxxxx
> Subject: Re: Getting knotted with quotes encoding - (one possible solution)
>
> On Sun, Mar 18, 2012 at 10:19 PM, Tamara Temple <tamouse.lists@xxxxxxxxxxxxxxxx> wrote:
>> On Tue, 13 Mar 2012 16:35:44 +0200, Arno Kuhl <arno@xxxxxxxxxxxxxx> sent:
>>
>>> From: Ashley Sheridan [mailto:ash@xxxxxxxxxxxxxxxxxxxx]
>>> Sent: 13 March 2012 03:25 PM
>>> To: arno@xxxxxxxxxxxxxx; php-general@xxxxxxxxxxxxx
>>> Subject: Re: Getting knotted with quotes encoding
>>>
>>>
>>> Arno Kuhl <arno@xxxxxxxxxxxxxx> wrote:
>>>
>>>> I've been battling with quotes encoding when outputting javascript
>>>> with php.
>>>> It can't be unique, so I'm hoping someone has a working solution
>>>> they're willing to share.
>>>>
>>>> The following works perfectly as long as there aren't any single
>>>> quotes in the link text:
>>>> echo "<span onclick=\"insertLink('$sUrl','$sTitle')\"
>>>> class='linkSel'>$sTitle</span>";
>>>>
>>>> if $sTitle has the value What's new it outputs:
>>>> <span
>>>> onclick="insertLink('article/whats-new.html','What's
>>>> new')" class='linkSel'>What's new</span>
>>>>
>>>> It displays fine, but javascript complains with:
>>>> Expected ')' linkmanager.php Line:525 Char:63
>>>>
>>>>
>>>> So I fix this by swapping the double and single quotes around:
>>>> echo "<span onclick='insertLink(\"$sUrl\",\"$sTitle\")'
>>>> class='linkSel'>$sTitle</span>";
>>>>
>>>> Now for that specific link it outputs:
>>>> <span
>>>> onclick='insertLink("article/whats-new.html","What's
>>>> new")' class='linkSel'>What's new</span>
>>>> And javascript is happy.
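A short sketch of the mechanism behind these errors, assuming (as Arno explains further down the thread) that $sTitle had already been passed through htmlentities() with ENT_QUOTES. The browser decodes character references such as &#039; and &quot; in an attribute value before the JavaScript inside it is parsed, so an encoded quote comes back as a bare quote and ends the JS string early. The helper scriptSeenByBrowser() is hypothetical and only approximates browser behaviour using html_entity_decode().

```php
<?php
// Hypothetical helper (not from the thread): roughly what the JavaScript
// engine receives from an onclick attribute. The HTML parser decodes
// character references such as &#039; and &quot; in the attribute value
// BEFORE the script is parsed, so they do not protect a JS string literal.
function scriptSeenByBrowser(string $attrValue): string
{
    return html_entity_decode($attrValue, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// First version: JS strings in single quotes, title encoded with ENT_QUOTES.
$sTitle = htmlentities("What's new", ENT_QUOTES, 'UTF-8');        // What&#039;s new
$attr1  = "insertLink('article/whats-new.html','$sTitle')";
echo scriptSeenByBrowser($attr1), "\n";
// The decoded ' after "What" now ends the JS string early.

// Swapped version: JS strings in double quotes; a " in the title breaks it instead.
$sTitle = htmlentities('Fred "Buster" Cox', ENT_QUOTES, 'UTF-8'); // Fred &quot;Buster&quot; Cox
$attr2  = "insertLink(\"article/fred-buster-cox.html\",\"$sTitle\")";
echo scriptSeenByBrowser($attr2), "\n";
```

Swapping the quote style therefore only moves the failure from titles containing ' to titles containing ".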
>>>>
>>>> But elsewhere there's a link Fred "Buster" Cox and it outputs:
>>>> <span
>>>> onclick='insertLink("article/fred-buster-cox.html","Fred
>>>> "Buster" Cox")' class='linkSel'>Fred "Buster"
>>>> Cox</span>
>>>>
>>>> Again it displays fine, but javascript complains with:
>>>> Expected ')' linkmanager.php Line:743 Char:77
>>>>
>>>>
>>>> So it looks like I can't have links that include single quotes and
>>>> double quotes, only one or the other.
>>>>
>>>> One work-around I thought of was to convert any link texts that
>>>> included double quotes into single quotes when the content is
>>>> posted, and it would then be displayed with single quotes even
>>>> though the user entered double quotes. It's far from ideal but it
>>>> would work, though I can think of a few situations where it would be
>>>> quite confusing to the reader. Are there any other solutions that
>>>> would allow both types of quotes without any conversions?
>>>>
>>>> Cheers
>>>> Arno
>>>>
>>>>
>>>> --
>>>
>>>
>>> You aren't escaping the quotes correctly when they go into your output.
>>> You're escaping them for html not javascript. Javascript (like php)
>>> escapes single quotes inside a single quote string with a back slash.
>>>
>>>
>>> Thanks,
>>> Ash
>>> http://ashleysheridan.co.uk
>>> ---------
>>>
>>> Thanks for that Ashley.
>>> You're right about the encoding.
>>> I had a line prior to that:
>>> $sTitle = htmlentities($title, ENT_QUOTES, 'ISO-8859-1', FALSE);
>>> Which encoded the quotes.
>>>
>>>
>>> I couldn't find anything so made a function, which might be useful
>>> for others.
>>> It’s a first shot, I'm sure there are ways to improve performance.
>>> I also changed the encoding to exclude single quotes.
>>> (I'm sure the indenting will get screwed up in the mail)
>>>
>>>
>>> $sTitle = fixSingleQuotes(htmlentities($title, ENT_COMPAT, 'ISO-8859-1', FALSE));
>>>
>>> .....
>>>
>>>
>>> ////////////////////////////////////////////////////////////////////////////////
>>> // convert single quotes to curly quotes, xml compliant
>>> // assumes apostrophes must be between 2 alpha chars
>>> // and any other ' is a single quote
>>> // ‘ = left single quote
>>> // ’ = right single quote and apostrophe
>>> function fixSingleQuotes($sText) {
>>>     if (strpos($sText, "'") !== FALSE) {
>>>         // there are quotes to convert
>>>         $bOpenQuote = FALSE;
>>>         $arrAlpha = explode(' ', "a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z");
>>>         $arrText = str_split($sText);
>>>         while (($pos = strpos($sText, "'")) !== FALSE) {
>>>             if ($pos == 0) {
>>>                 // must be an open quote in first pos
>>>                 $sText = "‘".substr($sText, 1);
>>>                 $bOpenQuote = TRUE;
>>>             } else {
>>>                 if (in_array($arrText[$pos-1], $arrAlpha)
>>>                         AND in_array($arrText[$pos+1], $arrAlpha)) {
>>>                     // apostrophe
>>>                     $quote = "’";
>>>                 } else {
>>>                     // quote
>>>                     if (!$bOpenQuote) {
>>>                         $quote = "‘";
>>>                         $bOpenQuote = TRUE;
>>>                     } else {
>>>                         $quote = "’";
>>>                         $bOpenQuote = FALSE;
>>>                     }
>>>                 }
>>>                 $sText = substr($sText, 0, $pos).$quote.substr($sText, $pos+1);
>>>             }
>>>         }
>>>     }
>>>     return ($sText);
>>>
>>> } //fixSingleQuotes()
>>>
>>>
>>>
>>> Cheers
>>> Arno
>>>
>>>
>>> --
>>> PHP General Mailing List (http://www.php.net/)
>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>
>>>
>>
>>
>> This is interesting. I wasn't aware that javascript reencoded html
>> entities in this way.
>>
>> I'm wondering, though, wouldn't a call to addslashes work in this case?
>
> Wow, no it wouldn't. At least not the way it needs to, to be more universal. I spent quite a bit of time pondering this and testing things, and finally came up with this writeup:
> http://wiki.tamaratemple.com/Technology/HandlingQuotesInJavascript
>
> Given the final outcome of needing to write a base64 codec for javascript, perhaps Arno's solution above is better?
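For comparison, a commonly used approach that nobody in this thread proposes: escape for JavaScript first with json_encode(), then for the HTML attribute with htmlspecialchars(). This also shows why addslashes() alone falls short: it escapes only for the JavaScript layer, and the backslash in \" means nothing to the HTML parser, so the quote still ends a double-quoted onclick attribute. The sketch assumes UTF-8 strings (json_encode() rejects invalid UTF-8, so ISO-8859-1 data like Arno's would need converting first, e.g. with mb_convert_encoding()) and PHP 7.3+ for JSON_THROW_ON_ERROR. The names jsString() and linkSpan() are hypothetical.

```php
<?php
// Sketch: escape for each layer, innermost first.
//  1. JavaScript: json_encode() turns a UTF-8 PHP string into a valid JS
//     string literal ("..." with \" and \\ escaped, control chars as \uXXXX).
//  2. HTML: htmlspecialchars(..., ENT_QUOTES) encodes & < > " ' so the
//     literal cannot terminate the onclick="..." attribute.
// The browser undoes step 2 while parsing the attribute, then runs step 1.
function jsString(string $s): string
{
    // JSON_THROW_ON_ERROR makes invalid UTF-8 fail loudly instead of
    // silently returning false.
    return json_encode($s, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
}

function linkSpan(string $sUrl, string $sTitle): string
{
    $js   = 'insertLink(' . jsString($sUrl) . ',' . jsString($sTitle) . ')';
    $attr = htmlspecialchars($js, ENT_QUOTES, 'UTF-8');     // JS source, HTML-escaped
    $text = htmlspecialchars($sTitle, ENT_QUOTES, 'UTF-8'); // visible link text
    return "<span onclick=\"$attr\" class=\"linkSel\">$text</span>";
}

echo linkSpan('article/whats-new.html', "What's new"), "\n";
echo linkSpan('article/fred-buster-cox.html', 'Fred "Buster" Cox'), "\n";
```

Both quote types survive without converting the text the user entered; the trade-off is that the generated markup is harder to read by eye.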
> ----
>
> Hi tamouse
>
> Interesting article. I referenced http://www.dwheeler.com/essays/quotes-in-html.html which you also might find useful for this topic. David Wheeler deals generally with converting double and single quotes to improve style while remaining compliant across as many areas as possible, whereas I was specifically focussed on replacing single quotes with something else to overcome the javascript encoding problem (as was your article). It's worth reading why he suggests avoiding things like "
>
>
> Hi Tamara
>
> Be careful of using that code, which was a first attempt: (1) it has a bug, (2) the rules are too simplistic (they cover most cases, but covering the last bit requires a LOT more code, plus there are 2 conditions that just can't be handled as far as I can see), and (3) it only works for English.
>
> Firstly, the pos pointer goes out of sync between the string and the array as soon as the string is expanded by the first conversion, so the array needs to be recreated on each pass, by moving the line $arrText = str_split($sText); inside the while loop.
>
> Secondly, the rules of that function don't handle texts like: "rock 'n roll", "Cass' ball", " 'twas the...", "1's and 2's", 3"6'.
>
> I rewrote the function to fix the bug and handle some of the extra cases, so it now works for "Cass' ball" (as long as it's not inside a single quote set, as in " he said 'this is Cass' ball' ") and it works for "rock 'n roll" (with a special test for " 'n ") and for "1's and 2's" and 3"6'. In fact it does work for " he said 'this is Cass' ball' " and " 'twas the..." as long as there aren't any following single quote sets, but that required an extra pass, so the performance drops (I define single quote sets as open/close single quotes excluding apostrophes).
> I don't think you can come up with a function to handle an unrealistic string such as " he said 'these are Cass' balls' and she said 'these are Jess' cats' and 'twas the night before he said 'boo' to her 'cause Jess' cat is 'mad' red ".
>
> This was quite an interesting exercise, and could make quite an interesting competition to see who could come up with the most efficient function to handle the most cases correctly. I think the main rule to follow is that in almost every case where the text is impossible to convert correctly, it's also probably very difficult for users to read, so maybe you can assume that the most difficult cases won't occur in the real world.
>
> Cheers
> Arno
>

Thanks for that link, Arno. I'll check into it and possibly modify my article. The article was an experiment to see how these things worked. I would definitely agree that the first attempt was a failure, not just because of possible bugs, but more because it just did not do the job AT ALL. (I guess I should emphasize that in the article.)

The base64 version seems like it might be the most robust in terms of getting data where it needs to go, as long as you can also deal with the various character encoding issues (which is not trivial!!). I don't think there is really a good general solution to this at hand.
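A minimal sketch of the bug fix Arno describes, written as a single pass rather than re-running str_split() inside the loop: reading neighbours from the unchanged input and appending to a separate output string means the position can never drift out of sync, and it also avoids reading past the end of the string when a quote is the last character. It deliberately keeps the same simplistic rules, so 'twas, Cass' ball and rock 'n roll are still handled wrongly. Two assumptions that are not from the thread: the curly quotes are emitted as the numeric references &#8216; and &#8217; (ISO-8859-1, the charset in Arno's htmlentities() call, has no curly single quotes), and isAsciiLetter() is a hypothetical helper standing in for the $arrAlpha lookup.

```php
<?php
// Hypothetical helper replacing the $arrAlpha lookup: true for A-Z and a-z.
function isAsciiLetter(string $c): bool
{
    $o = ord($c);
    return ($o >= 65 && $o <= 90) || ($o >= 97 && $o <= 122);
}

// Single-pass variant of the first-attempt fixSingleQuotes(), same rules:
//   - a ' with a letter on both sides is an apostrophe      -> &#8217;
//   - any other ' alternates: opening -> &#8216;, closing   -> &#8217;
function fixSingleQuotes(string $sText): string
{
    $out = '';
    $bOpenQuote = false;
    $len = strlen($sText);
    for ($i = 0; $i < $len; $i++) {
        $c = $sText[$i];
        if ($c !== "'") {
            $out .= $c;
            continue;
        }
        // Neighbours come from the unchanged input; guard both ends.
        $prevIsLetter = $i > 0 && isAsciiLetter($sText[$i - 1]);
        $nextIsLetter = $i + 1 < $len && isAsciiLetter($sText[$i + 1]);
        if ($prevIsLetter && $nextIsLetter) {
            $out .= '&#8217;';            // apostrophe, e.g. What's
        } elseif (!$bOpenQuote) {
            $out .= '&#8216;';            // opening single quote
            $bOpenQuote = true;
        } else {
            $out .= '&#8217;';            // closing single quote
            $bOpenQuote = false;
        }
    }
    return $out;
}

echo fixSingleQuotes("he said 'it's new'"), "\n";
```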