On Thu, 2010-05-20 at 12:40 -0400, Al wrote:

>
> On 5/20/2010 12:02 PM, Jim Lucas wrote:
> > Al wrote:
> >>
> >>
> >> On 5/20/2010 11:23 AM, David Otton wrote:
> >>> On 20 May 2010 15:52, Al <news@xxxxxxxxxxxxx> wrote:
> >>>
> >>>> I agree blacklisting is a flawed approach in general. My approach is to
> >>>> strictly confine entry text to a whitelist of benign, acceptable
> >>>> tags. The
> >>>
> >>> But that's not what you've done. You've blacklisted the following
> >>> patterns:
> >>>
> >>> "\<script\x20",
> >>> "\<embed\x20",
> >>> "\<object\x20",
> >>> 'language="javascript"',
> >>> 'type="text/javascript"',
> >>> 'language="vbscript\"',
> >>> 'type="text/vbscript"',
> >>> 'language="vbscript"',
> >>> 'type="text/tcl"',
> >>> "error_reporting\(0\)",//Most hacks I've seen make certain they turn
> >>> off error reporting
> >>> "\<?php",//Here for the heck of it.
> >>>
> >>> and allowed everything else. A couple of examples:
> >>>
> >>> You haven't blacklisted <iframe>
> >>>
> >>> <IMG SRC="javascript:alert('XSS');"> would sail straight through that
> >>> list.
> >>>
> >>> I can't tell from that list alone, but are your checks
> >>> case-insensitive? Because <ScRipT> would pass through a case-sensitive
> >>> check.
> >>>
> >>> We can go on like this all day, and at the end of it you still won't
> >>> be sure you've blacklisted everything.
> >>>
> >>> The first answer at
> >>> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
> >>>
> >>> is related, also.
> >>
> >> I'm not being clear. First pass is thru the blacklist, which effectually
> >> tells hacker to not bother and totally deletes the entry.
> >>
> >> If the raw entry gets past the blacklist, it must then only contain my
> >> whitelist tags. e.g., the two examples you cited were caught by the
> >> whitelist parser.
> >
> > What exactly does your whitelist parser do?
>
> It posts an error message that shows the user what the error is [e.g.,
> "<iframe> is an invalid tag. Your text cannot be posted until all errors are
> corrected."]
>
> Only when the submitted raw text passes the blacklist and whitelist, will the
> raw text be saved and be available for on-the-fly conversion to html.
>
>
> >
> >>
> >> And yes, I'm using preg_match() with the "i" arg.
> >>
> >> Note, my blacklist is not looking for tags per se, just the start of a
> >> bad tag. My users are only supposed to be entering plain text with some
> >> nice highlighting and lists, etc. The editor will not post anything else.
> >
> > But who says I have to use your editor?
>
> No one says you must by my editor.
>
> >
> >>
> >> Al...
> >>
> >
> >
>
> I'm methodically going thru ha.ckers tests and so far my filters have caught
> everything.
>
> I greatly appreciate everyone's help.
>

I think Jim was asking how your whitelist operates, not what it shows the
user. Posting a message saying that <iframe> tags are not allowed sounds more
like blacklist behaviour. A whitelist should treat everything the user sends
as bad by default, and only let it through if it meets specific criteria. If
you check specifically for an <iframe> tag so that you can warn the user about
it by name, you're just using a blacklist, not a whitelist.

Thanks,
Ash
http://www.ashleysheridan.co.uk
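
To make the blacklist/whitelist distinction concrete, here is a minimal sketch. It is written in Python rather than PHP purely for illustration, and it is not Al's actual filter: the tag set in ALLOWED_TAGS and the function names are assumptions, since the thread never says which tags are permitted or how the parser is built. The blacklist check accepts anything that does not match a known-bad pattern; the whitelist check rejects anything that is not explicitly allowed, including any tag that carries attributes.

```python
import re
from html.parser import HTMLParser

# Hypothetical allowlist: the thread never lists the permitted tags.
ALLOWED_TAGS = {"b", "i", "em", "strong", "ul", "ol", "li", "p", "br"}

# Blacklist style: three of the patterns quoted earlier in the thread.
BLACKLIST = [r"<script\x20", r"<embed\x20", r"<object\x20"]


def passes_blacklist(text):
    """Allow by default: accept unless a known-bad pattern is found."""
    return not any(re.search(p, text, re.IGNORECASE) for p in BLACKLIST)


class _TagCollector(HTMLParser):
    """Records every tag that is not allowed or that carries attributes."""

    def __init__(self):
        super().__init__()
        self.rejected = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <ScRipT> arrives as "script".
        if tag not in ALLOWED_TAGS or attrs:
            self.rejected.append(tag)

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.rejected.append(tag)


def passes_whitelist(text):
    """Deny by default: accept only if every tag is a bare allowed tag."""
    collector = _TagCollector()
    collector.feed(text)
    collector.close()
    return not collector.rejected
```

With these definitions, '<iframe src="http://example.com"></iframe>' gets past passes_blacklist() but is refused by passes_whitelist(), without the code ever naming <iframe>. This is only a sketch of the idea, not a vetted sanitizer: malformed markup and output escaping would need separate handling.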