On 25/04/07, Justin Frim <jfrim@xxxxxxxxxxx> wrote:
Dotan Cohen wrote:
> On 25/04/07, Justin Frim <jfrim@xxxxxxxxxxx> wrote:
>
>> I'm assuming then you want the data to be able to contain _some_ mark-up
>> considered to be safe?
>>
>
> Not at this stage, no. Maybe if the users ask for it, but not now in
> the beginning. The universe's best engineer, Scotty, once advised us to
> tell them that it's impossible, and only then to implement what they
> want.

You should decide now, before going any further: do you want the future capability to add mark-up codes? And if so, are they going to be similar to HTML, using the < and > characters, or like BBcode, using the [ and ] characters? This decision will determine whether filters to guard against XSS attacks really are the best solution.
It would be BBcode, if anything. It may be the lazy choice, but I feel more secure parsing it than [x]HTML.
See, you should only use filters to prevent XSS attacks if you plan on using the < and > characters for mark-up codes (now or in the future). Otherwise, use htmlspecialchars() or htmlentities(). If you use a filter that strips < and > characters, users who don't know that < and > are reserved for mark-up will be angry, frustrated, or confused when they find they can't type them as literals.

Consider: suppose a bunch of mathematicians are having a discussion on the message board, and one of them wants to state that "variable x is greater than 3". They might type "x > 3", but your filter will garble it. Not good! If you use htmlspecialchars(), then anything they type will appear exactly as typed.
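A minimal sketch of the difference described above. The filter here, strip_angle_brackets(), is a hypothetical stand-in for "a filter that strips < and > characters" (not any particular library's filter); htmlspecialchars() is PHP's built-in function.

```php
<?php
// Hypothetical naive filter: deletes every angle bracket from the input.
function strip_angle_brackets(string $text): string
{
    return str_replace(['<', '>'], '', $text);
}

$post = 'x > 3';

// The filter garbles the mathematician's post into "x  3".
$filtered = strip_angle_brackets($post);

// htmlspecialchars() encodes the characters instead, so the browser
// renders the post exactly as typed, while markup cannot be injected.
$escaped = htmlspecialchars($post, ENT_QUOTES, 'UTF-8');
$attack  = htmlspecialchars('<script>alert(1)</script>', ENT_QUOTES, 'UTF-8');

echo $filtered, "\n"; // x  3
echo $escaped, "\n";  // x &gt; 3
echo $attack, "\n";   // &lt;script&gt;alert(1)&lt;/script&gt;
```

Both approaches stop the script tag from running, but only the escaping approach leaves "x > 3" intact.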
I am currently using htmlencode, so < and > show as expected. I do expect the math faculty to use those symbols :).
If you want the future capability for mark-up, you should tell the users which characters are reserved and how they can represent them as literals. Basically, you're telling the users whether they should "speak HTML", "speak BBcode", or "speak the natural language" when they post on the site.
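One way to keep the BBcode option open without giving up htmlspecialchars() is to escape everything first and then translate a small whitelist of BBcode tags. This is only a sketch under assumptions: render_post() is a hypothetical helper, and the supported tags ([b] and [i]) are chosen for illustration. Because htmlspecialchars() never touches [ or ], the BBcode survives escaping, and any < or > the user typed still displays literally.

```php
<?php
// Hypothetical helper: escape first, then convert whitelisted BBcode.
function render_post(string $text): string
{
    // Step 1: neutralise all HTML the user typed.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: convert only complete [b]...[/b] and [i]...[/i] pairs.
    // The lazy (.+?) keeps each match short, and /s lets a tag span
    // line breaks. Any other bracketed text is left as typed.
    $rules = [
        '/\[b\](.+?)\[\/b\]/s' => '<strong>$1</strong>',
        '/\[i\](.+?)\[\/i\]/s' => '<em>$1</em>',
    ];

    return preg_replace(array_keys($rules), array_values($rules), $html);
}

echo render_post('[b]Note:[/b] x > 3 and a[3] is an array'), "\n";
// <strong>Note:</strong> x &gt; 3 and a[3] is an array
```

The order matters in this sketch: escaping after the BBcode conversion would also escape the generated <strong> and <em> tags. Since the captured text has already been escaped, the only HTML in the output is what the whitelist itself produces.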
Right now it's "speak the natural language", but I do not want to close off the possibility of change. Thanks for the insight.

Dotan Cohen
http://dotancohen.com/eng/army_pictures.php
http://iphanatics.com