On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:

> On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
> > On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
> >
> >> I am running into a problem using the REGEXP option with filter_var().
> >>
> >> The string I am using: 09VolunteerApplication.doc
> >> The PCRE regex I am using:
> >> /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
> >>
> >> The function in its entirety:
> >> return (!filter_var('09VolunteerApplication.doc',
> >> FILTER_VALIDATE_REGEXP,
> >> array('options'=>array('regexp'=>'/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'))))
> >> ? false : true;
> >>
> >> Anyone have any insight into this?
> >>
> >
> >
> > You missed a + in your regex. At the moment you're only checking to see
> > if a file starts with a single a-z or number and then is followed by the
> > period. Then, oddly, you're checking for one to four extensions in the
> > list; are you sure you want to do that? And the square brackets are used
> > to match characters, not strings. Use the standard brackets
> > (parentheses) to allow a choice of strings.
> >
> > Try this:
> >
> > '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
> >
> > One other thing you should maybe be aware of: filenames won't always
> > consist of just the letters a-z and numbers 0-9. They may contain
> > accented or foreign letters, hyphens, spaces and a number of other
> > characters depending on the client machine's OS. Windows allows very few
> > characters, for example, compared to the Unix-like OSes like MacOS and
> > Linux.
> >
>
> Both are valid PCRE regexes. However, the rules regarding the use of
> parentheses for an XOR string do not explain a similar regex being
> used with filter_var() like so:
>
> return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
> array('options'=>array('regexp'=>'/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di'))))
> ? true : false;
>
> The above returns string(4) "kc-1"
>
> Another test using the following works similarly:
>
> return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))))
> ? true : false;
>
> The above returns string(8) "u0368839"
>
> And
> return (filter_var('gp123456', FILTER_VALIDATE_REGEXP,
> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))))
> ? true : false;
>
> returns string(8) "gp123456"
>
> As you can see, these three examples use the start [] as XOR conditionals
> for multiple strings as prefixes.
>
>

Not quite. You think they match correctly because you're only testing
strings you expect to match, and you're not looking for anything that
might disprove that. Using your last example, it will also match these
strings:

gu0368839
xx0368839
p0368839

I tested your first regex with '09VolunteerApplication.doc' and it
doesn't work at all until you add in that plus after the basename match
part of the regex:

^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$

However, your regex (with the plus) also matches these strings:

09VolunteerApplication.docp
09VolunteerApplication.docj
09VolunteerApplication.doc| <-- note it's matching the literal bar character

Making the changes I suggested
(^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$) means the regex
works as you expect.

Square brackets in a regex define a character class (a set or range of
characters), not a literal string, and without a quantifier they match
only a single character from that set. So in your example, you're
matching an extension of one to four characters drawn from
'|cdefgjlnopstvx', and a basename containing only 1 character that is
either an a-z or a number.

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk
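
The difference between the character class and the grouped alternation discussed in this thread can be checked with a short script. This is only a minimal sketch: the helper name matches() is made up for illustration and is not from the thread; the two patterns and the test filenames are the ones quoted above.

```php
<?php
// Hypothetical helper (not part of the thread): returns true when $name
// passes FILTER_VALIDATE_REGEXP for $pattern. filter_var() returns the
// input string on a match and false otherwise.
function matches($pattern, $name)
{
    return filter_var($name, FILTER_VALIDATE_REGEXP,
        array('options' => array('regexp' => $pattern))) !== false;
}

// Character class (with the + added): one to four of the listed characters,
// including the literal bar.
$class = '/^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di';

// Grouped alternation: exactly one of the listed extensions.
$group = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

$names = array(
    '09VolunteerApplication.doc',
    '09VolunteerApplication.docp',
    '09VolunteerApplication.doc|',
);

foreach ($names as $name) {
    // Print whether each pattern accepts the filename.
    printf("%-30s class=%s group=%s\n", $name,
        var_export(matches($class, $name), true),
        var_export(matches($group, $name), true));
}
```

Run from the command line, this should show the character class version accepting all three names, while the grouped version accepts only the plain .doc name.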