On Thu, March 29, 2007 3:45 pm, Tijnema ! wrote:
> You're maybe on the right path, adding images as the background makes
> it really hard to read the code from the image. You could for example
> use random images as background.

Some of the CAPTCHA methods listed as being difficult for the PWNtcha
guys to break involved using a variety of colorful backgrounds and
other strong mutations.  The stronger the mutation and the harder it
was for a machine to read, the harder a time humans tended to have
reading it as well.

The idea is to break the pattern enough to make it difficult for
machines, but easy for people.  But then we fall into the same old
conundrum of better security versus a system that's more
difficult/cumbersome/tedious/annoying for the user.  As always, we have
to find a balance between "good enough" and "easy enough".

> But i have to say that breaking something isn't needed always,
> re-using a human passed protection is a way to break through a lot of
> things.
>
> For example, i would go to the page and save the number that the
> CAPTCHA passed to my session. Then i would write down the code that i
> need to enter. So, next time i need to pass, i set the session value
> to the one i got first time, and i enter same code. Works for most
> CAPTCHA programs :) Didn't test it out on your audio CAPTCHA yet, but
> you really should care about a timeout for the session variable used.

Are you saying that you'd want to make a note of the session ID and the
"filename" for the audio file, listen to the audio, then write down
what it says so that if that combination ever came up again, you'd have
the answer?

Couple of problems with this, if that's what you're proposing:

1. Sessions usually time out.  Ideally, you shouldn't be able to recall
a session ID used a month ago and have it work.  If the server kept
every session ID ever created, it'd become a mess really fast.
And if the programmer stores the session ID in a database and fails to
create and store a new session ID when you re-visit, then that's a
pretty big gaping hole.  If it's part of a security mechanism and the
sessions don't time out or somehow expire and get purged, chances are
you should be looking for another job.

2. The "filename" number is most likely randomly generated and stored
temporarily for use at that moment.  So re-visiting the page and
getting the same audio CAPTCHA sound clip probably won't give you an
audio clip with the same "filename".  Again, ideally.

What you might be able to do is take an MD5 of the file you get; if it
matches a previous audio clip, then that may work.  It all depends on
how the sound file is generated and whether it produces EXACTLY
identical files for the same digits or there's a slight variance.

Much like the visual CAPTCHA devices, audio ones are going to work best
if the pattern of audio is broken up somehow with additional noises
injected into the mix (I'm paraphrasing something I saw on one of those
sites I looked at earlier.. but it's a really good point).  A straight,
plain, measured voice is going to be a lot easier to parse than voices
with mixed pitches, volumes, accents and some background noise.
Something like power tools, a vacuum cleaner, city sounds, etc.  These
are things we, as humans, can consciously filter out fairly easily most
of the time, but a computer would have a really hard time figuring out
"2 5 3" with a jackhammer and car horns going on behind it.  That gets
into some seriously sophisticated audio processing.

Anyway, adding some mutation to the audio file would prevent an MD5
type hash check.

Another potential attack on weak audio mechanisms just occurred to me.
Load the page a number of times, saving the audio files each time,
until you can determine what "set" is being used: all numbers, numbers
+ letters, etc.  Eventually you should have a copy of each sample.
If the spacing between each digit is regular, theoretically you could
create a sound file for each digit and do a brute force compare of the
new sound clip against every possible combination of the files you
saved, stacked together.  Or again, create an MD5 hash for each
combination and just do a lookup against the MD5 of the new audio clip.
Probably wouldn't take long to get the fine-tuning down.

But even a tiny bit of variance would blow that out of the water.
Speech recognition tools are much more of a threat than something like
this.

-TG

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
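
P.S. For what it's worth, the hash-lookup idea above can be sketched in
a few lines of Python.  This is only a rough illustration, not a tested
attack on any real CAPTCHA: SAMPLES, build_clip(), build_lookup() and
crack() are all made-up names, and the "audio" is just placeholder
bytes standing in for per-digit recordings.  It assumes the generator
glues fixed samples together with no variance at all, which is exactly
the weak case.

```python
import hashlib
import itertools
from typing import Dict, Optional

# Hypothetical stand-ins for the per-digit recordings an attacker has
# collected; real samples would be audio bytes, not short strings.
SAMPLES = {str(d): f"<digit-{d}>".encode() * 4 for d in range(10)}


def build_clip(code: str) -> bytes:
    """Assumed weak generator: glue fixed digit samples end to end."""
    return b"".join(SAMPLES[ch] for ch in code)


def build_lookup(length: int) -> Dict[str, str]:
    """Map the MD5 of every possible clip to the code it encodes."""
    table = {}
    for digits in itertools.product(SAMPLES, repeat=length):
        code = "".join(digits)
        table[hashlib.md5(build_clip(code)).hexdigest()] = code
    return table


def crack(clip: bytes, table: Dict[str, str]) -> Optional[str]:
    """Return the code if the clip's MD5 is in the table, else None."""
    return table.get(hashlib.md5(clip).hexdigest())


table = build_lookup(3)            # every 3-digit code: 1000 entries
clean = build_clip("253")
print(crack(clean, table))         # byte-identical clip: lookup hits

# Flip a single bit, standing in for injected noise or mutation.
noisy = bytearray(clean)
noisy[5] ^= 0x01
print(crack(bytes(noisy), table))  # hash no longer matches: None
```

With three digits the table is only 1,000 entries, so building it is
trivial.  Flipping a single bit is enough to make the lookup miss,
which is why even slight per-request mutation kills this approach
(though it does nothing against speech recognition).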