On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
> On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
>
>> On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
>>> On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
>>>
>>>> I am running into a problem using the REGEXP option with filter_var().
>>>>
>>>> The string I am using: 09VolunteerApplication.doc
>>>> The PCRE regex I am using:
>>>> /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
>>>>
>>>> The function in its entirety:
>>>> return (!filter_var('09VolunteerApplication.doc',
>>>> FILTER_VALIDATE_REGEXP,
>>>> array('options'=>array('regexp'=>'/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'))))
>>>> ? false : true;
>>>>
>>>> Anyone have any insight into this?
>>>>
>>>
>>> You missed a + in your regex. At the moment you're only checking that
>>> the filename consists of a single a-z character or digit followed by
>>> the period. Then, oddly, you're checking for one to four characters
>>> from the extension list; are you sure you want to do that? Also, square
>>> brackets match characters, not strings; use parentheses to allow a
>>> choice of strings.
>>>
>>> Try this:
>>>
>>> '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
>>>
>>> One other thing you should be aware of: filenames won't always
>>> consist of just the letters a-z and the digits 0-9. They may contain
>>> accented or foreign letters, hyphens, spaces and a number of other
>>> characters depending on the client machine's OS. Windows, for example,
>>> allows very few characters compared to Unix-like OSes such as MacOS
>>> and Linux.
>>>
>>
>> Both are valid PCRE regexes.
>> However, the rules regarding the use of
>> parentheses for an XOR string do not explain a similar regex being
>> used with filter_var() like so:
>>
>> return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
>> array('options'=>array('regexp'=>'/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di')))
>> ? true : false;
>>
>> The above returns string(4) "kc-1"
>>
>> Another test using the following works similarly:
>>
>> return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
>> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
>> true : false;
>>
>> The above returns string(8) "u0368839"
>>
>> And
>> return (filter_var('gp123456', FILTER_VALIDATE_REGEXP,
>> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
>> true : false;
>>
>> returns string(8) "gp123456"
>>
>> As you can see, these three examples use the leading [] as XOR
>> conditionals for multiple strings as prefixes.
>>
>
> Not quite. You think they match correctly because that's all you're
> testing for, and you're not looking for anything that might disprove
> it. Using your last example, it will also match these strings:
>
> gu0368839
> xx0368839
> p0368839
>
> I tested your first regex with '09VolunteerApplication.doc' and it
> doesn't work at all until you add that plus after the basename part
> of the regex:
>
> ^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$
>
> However, your regex (with the plus) also matches these strings:
>
> 09VolunteerApplication.docp
> 09VolunteerApplication.docj
> 09VolunteerApplication.doc|   <-- note it's matching the literal bar
>                                   character
>
> Making the changes I suggested
> (^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$) means the regex
> works as you expect. Square brackets in a regex define a character
> class (a set of single characters), not a literal string, and without
> a quantifier they match only a single character from that set.
> So in your example, you're matching an extension of one to four
> characters drawn from the set '|cdefgjlnopstvx', and a basename
> containing only one character that is either a-z or a digit.
>

You are right; after a few other tests I stand corrected. My apologies.

However, according to the documentation for filter_var() and the PCRE
regexp option, if it returns false, which it does here, this indicates
an error with the regex. In addition, I would like to point out that
the same regex works as it should with the older preg_match() function,
whereas with filter_var() the character class followed by the
quantifier (+) fails validation.

var_dump(filter_var('09VolunteerApplication.doc', FILTER_VALIDATE_REGEXP,
array('options'=>array('regexp'=>'/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di'))));

returns false (invalid regex) when using the character class
[a-z0-9]+ with filter_var() and the FILTER_VALIDATE_REGEXP option.

var_dump(preg_match('/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/i',
'09VolunteerApplication.doc'));

returns int(1), indicating a valid regex as well as a valid match.

I believe this should be reported as a bug, but I appreciate your
assistance and insights.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
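A minimal sketch, for readers of the archive, of the two points discussed in this thread. First, the character-class pattern is compared with the alternation pattern Ashley suggested. Second, it separates "no match" from "pattern failed to compile". filter_var() returns false whenever validation fails, so a false result is also what an ordinary non-match produces. preg_match() keeps the cases apart by returning 0 for no match and false on error. The names CLASS_PATTERN, ALT_PATTERN, matchesWithFilter() and classify() are hypothetical helpers made up for this illustration, and the sample filenames beyond those quoted above are assumptions.

```php
<?php
// Hypothetical illustration only; not code from the thread.

// [...] is a character class: {1,4} means "one to four characters drawn
// from the set d,o,c,|,p,f,t,x,j,g,e,n,s,v,l", not "one of these extensions".
const CLASS_PATTERN = '/^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di';

// (a|b|c) is alternation: exactly one of the listed extensions.
const ALT_PATTERN = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

// True when filter_var() accepts $value. FILTER_VALIDATE_REGEXP returns the
// value itself on a match and false when validation fails.
function matchesWithFilter(string $value, string $pattern): bool
{
    $result = filter_var($value, FILTER_VALIDATE_REGEXP,
        ['options' => ['regexp' => $pattern]]);
    return $result !== false;
}

// preg_match() returns 1 (match), 0 (no match) or false (error, e.g. a
// pattern that does not compile). The @ silences the compile warning.
function classify(string $pattern, string $subject): string
{
    $r = @preg_match($pattern, $subject);
    if ($r === false) {
        return 'invalid pattern';
    }
    return $r === 1 ? 'match' : 'no match';
}

$samples = [
    '09VolunteerApplication.doc',   // intended match
    '09VolunteerApplication.docp',  // accepted by the class pattern only
    '09VolunteerApplication.doc|',  // literal bar accepted by the class
    '09VolunteerApplication.exe',   // e and x are both in the class set
];

foreach ($samples as $name) {
    printf("%-30s class=%-5s alt=%s\n", $name,
        matchesWithFilter($name, CLASS_PATTERN) ? 'match' : 'no',
        matchesWithFilter($name, ALT_PATTERN) ? 'match' : 'no');
}

// An unterminated [ makes the pattern itself invalid.
echo classify('/^[a-z0-9+\.(doc)$/', 'x.doc'), "\n";
echo classify(ALT_PATTERN, '09VolunteerApplication.exe'), "\n";
```

The .exe sample shows why testing only the intended input is not enough. Every character of "exe" happens to appear in the class, so the class pattern accepts it, while the alternation pattern rejects it.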