On Thu, 2011-05-05 at 13:39 -0600, Jason Gerfen wrote:
> On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
> > On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
> >
> >> On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
> >>> On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
> >>>
> >>>> I am running into a problem using the REGEXP option with filter_var().
> >>>>
> >>>> The string I am using: 09VolunteerApplication.doc
> >>>> The PCRE regex I am using:
> >>>> /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
> >>>>
> >>>> The function in its entirety:
> >>>> return (!filter_var('09VolunteerApplication.doc',
> >>>> FILTER_VALIDATE_REGEXP,
> >>>> array('options'=>array('regexp'=>'/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'))))
> >>>> ? false : true;
> >>>>
> >>>> Anyone have any insight into this?
> >>>>
> >>>
> >>>
> >>> You missed a + in your regex, at the moment you're only checking to see
> >>> if a file starts with a single a-z or number and then is followed by the
> >>> period. Then you're oddly checking for one to four extensions in the
> >>> list, are you sure you want to do that? And the square brackets are used
> >>> to match characters, not strings, use the standard brackets to allow
> >>> from a choice of strings.
> >>>
> >>> Try this:
> >>>
> >>> '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
> >>>
> >>> One other thing you should be aware of maybe, filenames won't always
> >>> consist of just the letters a-z and numbers 0-9, they may contain
> >>> accented or foreign letters, hyphens, spaces and a number of other
> >>> characters depending on the client machine's OS. Windows allows very few
> >>> characters for example compared to the Unix-like OSes like MacOS and
> >>> Linux.
> >>>
> >>
> >> Both are valid PCRE regexes.
> >> However the rules regarding usage of
> >> parentheses for an XOR string do not explain a similar regex being
> >> used with the filter_var() like so:
> >>
> >> return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
> >> array('options'=>array('regexp'=>'/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di'))))
> >> ? true : false;
> >>
> >> The above returns string(4) "kc-1"
> >>
> >> Another test using the following works similarly:
> >>
> >> return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
> >> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ?
> >> true : false;
> >>
> >> The above returns string(8) "u0368839"
> >>
> >> And
> >> return (filter_var('gp123456', FILTER_VALIDATE_REGEXP,
> >> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ?
> >> true : false;
> >>
> >> returns string(8) "gp123456"
> >>
> >> As you can see these three examples use the start [] as XOR conditionals
> >> for multiple strings as prefixes.
> >>
> >>
> >
> >
> > Not quite, you think they match correctly because that's all you're
> > testing for, and you're not looking for anything that might disprove
> > that. Using your last example, it will also match these strings:
> >
> > gu0368839
> > xx0368839
> > p0368839
> >
> >
> > I tested your first regex with '09VolunteerApplication.doc' and it
> > doesn't work at all until you add in that plus after the basename match
> > part of the regex:
> >
> > ^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$
> >
> > However, your regex (with the plus) also matches these strings:
> >
> > 09VolunteerApplication.docp
> > 09VolunteerApplication.docj
> > 09VolunteerApplication.doc| <-- note it's matching the literal bar
> > character
> >
> > Making the changes I suggested
> > (^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$)
> > means the regex works as you expect.
> > Square brackets in
> > a regex match a character class, not a literal string, and without any
> > sort of quantifier, match only a single character from that class. So in
> > your example, you're matching a 1 to 4 character extension containing any
> > of the following characters '|cdefgjlnopstvx', and a basename containing
> > only 1 character that is either an a-z or a number.
> >
>
> You are right, after a few other tests I stand corrected. My apologies.
> However according to the documentation for filter_var() and the PCRE
> regexp option, if it returns false, which it does, this indicates an
> error with the regex.
>
> In addition to this I would like to point out that the same regex using
> the older preg_match() function works as it should, while the character
> class followed by the quantifier (+) fails the validation portion of the
> regex.
>
> print_r(var_dump(filter_var('09VolunteerApplication.doc',
> FILTER_VALIDATE_REGEXP,
> array('options'=>array('regexp'=>'/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di')))));
>
> returns false (invalid regex) when using the character matching class
> [a-z0-9]+ with the filter_var() function with the FILTER_VALIDATE_REGEXP
> option
>
> print_r(var_dump(preg_match('/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/i',
> '09VolunteerApplication.doc')));
>
> returns int(1) indicating a valid regex as well as a valid match.
>
> I believe this should be reported as a bug but I appreciate your
> assistance and insights.
>

Remove the {1,4} bit, as with it you're allowing one to four extensions in
a row. It's a valid regex sure, but not the regex to match what you're
looking for.

Out of interest, why are you using a regex here? Is this filename coming
from a form upload element?

--
Thanks,
Ash
http://www.ashleysheridan.co.uk
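
The difference between the bracketed and the parenthesised patterns discussed above can be checked with a short script. This is only a minimal sketch under a few assumptions: it calls preg_match() rather than filter_var() so that the pattern alone is being tested, the matches() helper and the variable names are made up for illustration, and $idGroup is an assumed grouped rewrite of the prefix pattern that nobody in the thread actually proposed.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
// preg_match() is used so only the pattern itself is under test.
function matches($pattern, $subject)
{
    return preg_match($pattern, $subject) === 1;
}

// Extension patterns from the thread: character class vs. alternation.
$extClass = '/^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di';
$extGroup = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

// Prefix pattern from the thread, plus an assumed grouped equivalent.
$idClass = '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di';
$idGroup = '/^(gp|u|gx)\d{6,15}$/Di'; // assumption, not from the thread

// Each case: class-based pattern, grouped pattern, test string.
$cases = array(
    array($extClass, $extGroup, '09VolunteerApplication.doc'),
    array($extClass, $extGroup, '09VolunteerApplication.docp'),
    array($extClass, $extGroup, '09VolunteerApplication.docj'),
    array($extClass, $extGroup, '09VolunteerApplication.doc|'),
    array($idClass,  $idGroup,  'u0368839'),
    array($idClass,  $idGroup,  'gp123456'),
    array($idClass,  $idGroup,  'gu0368839'),
    array($idClass,  $idGroup,  'xx0368839'),
    array($idClass,  $idGroup,  'p0368839'),
);

// Print one line per test string showing how each pattern behaves.
foreach ($cases as $case) {
    list($class, $group, $subject) = $case;
    printf(
        "%-28s brackets=%s parentheses=%s\n",
        $subject,
        var_export(matches($class, $subject), true),
        var_export(matches($group, $subject), true)
    );
}
```

The bracketed patterns accept every string in the list, including the .docp, .docj and .doc| names and the gu/xx/p prefixes, while the parenthesised patterns accept only 09VolunteerApplication.doc, u0368839 and gp123456.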