On Thu, 2010-05-20 at 11:51 -0400, Al wrote:

>
> On 5/20/2010 11:23 AM, David Otton wrote:
> > On 20 May 2010 15:52, Al <news@xxxxxxxxxxxxx> wrote:
> >
> >> I agree blacklisting is a flawed approach in general. My approach is to
> >> strictly confine entry text to a whitelist of benign, acceptable tags. The
> >
> > But that's not what you've done. You've blacklisted the following patterns:
> >
> > "\<script\x20",
> > "\<embed\x20",
> > "\<object\x20",
> > 'language="javascript"',
> > 'type="text/javascript"',
> > 'language="vbscript\"',
> > 'type="text/vbscript"',
> > 'language="vbscript"',
> > 'type="text/tcl"',
> > "error_reporting\(0\)",//Most hacks I've seen make certain they turn
> > off error reporting
> > "\<?php",//Here for the heck of it.
> >
> > and allowed everything else. A couple of examples:
> >
> > You haven't blacklisted <iframe>
> >
> > <IMG SRC="javascript:alert('XSS');"> would sail straight through that list.
> >
> > I can't tell from that list alone, but are your checks
> > case-insensitive? Because <ScRipT> would pass through a case-sensitive
> > check.
> >
> > We can go on like this all day, and at the end of it you still won't
> > be sure you've blacklisted everything.
> >
> > The first answer at
> > http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
> > is related, also.
>
> I'm not being clear. First pass is thru the blacklist, which effectually tells
> hacker to not bother and totally deletes the entry.
>
> If the raw entry gets past the blacklist, it must then only contain my whitelist
> tags. e.g., the two examples you cited were caught by the whitelist parser.
>
> And yes, I'm using preg_match() with the "i" arg.
>
> Note, my blacklist is not looking for tags per se, just the start of a bad tag.
> My users are only supposed to be entering plain text with some nice highlighting
> and lists, etc. The editor will not post anything else.
>
> Al...
>

How are you matching against your whitelist?
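
To frame the question, here is a minimal sketch of the kind of strict check I would expect. It is not a guess at your actual parser. The function name only_whitelisted_tags() and the allowed-tag list are made up for illustration. It rejects any entry in which a "<" is not part of a bare, attribute-free tag from the list:

```php
<?php
// Hypothetical helper for illustration only, not Al's actual whitelist parser.
// Returns true only if every "<" in $text belongs to a bare opening or
// closing tag (no attributes) whose name is in $allowed.
function only_whitelisted_tags(string $text, array $allowed): bool
{
    // Build an alternation such as "b|i|ul|li", escaping each name for the regex.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $allowed));

    // Strip every allowed tag, e.g. <b>, </li>, <UL >. The "i" flag makes the
    // match case-insensitive.
    $rest = preg_replace('~</?(?:' . $names . ')\s*>~i', '', $text);

    // Any "<" still present was an unknown tag, a tag carrying attributes,
    // or a stray bracket, so the entry is rejected rather than repaired.
    return $rest !== null && strpos($rest, '<') === false;
}

$allowed = ['b', 'i', 'ul', 'li']; // assumed list of benign tags

var_dump(only_whitelisted_tags('<b>bold</b> <ul><li>item</li></ul>', $allowed));   // bool(true)
var_dump(only_whitelisted_tags('<IMG SRC="javascript:alert(\'XSS\');">', $allowed)); // bool(false)
var_dump(only_whitelisted_tags('<ScRipT>alert(1)</ScRipT>', $allowed));             // bool(false)
var_dump(only_whitelisted_tags('<b onclick="x()">hi</b>', $allowed));               // bool(false)
```

Because this rejects the whole entry instead of trying to clean it up, tricks like <scr<b>ipt> cannot reassemble into a live tag. The trade-off is that a literal "<" in plain text (say "3 < 4") is refused unless it arrives already encoded as &lt;.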

Thanks,
Ash
http://www.ashleysheridan.co.uk