Re: Audio CAPTCHA review request

jochem <jochem@xxxxxxxx> · Thu, 5 Apr 2007 10:20:27 -0700 (PDT)

tedd wrote:
> 
> However, I'm not convinced that all sound methodologies can be 
> resolved as simply as that. For example -- your method looks for 
> pauses/high points and then capsulizes segments for comparison 
> against known sounds. That's OK, but what if there is other meaning 
> in the sound?
> 
> I often wondered why simple CAPTCHA's like "Type the number seven 
> four three", or "What is the sum of two plus three?", or "Spell cat", 
> or "Spell two"  wouldn't work? Certainly, one can create a routine 
> coupled a dB to randomly produce thousands of different combinations 
> of simple questions. Likewise, a sound file could be produced the 
> same way.
> 
That will not help much. I have seen this suggestion a lot; see, for
example, http://www.standards-schmandards.com/2005/captcha/ for a nicely
written one.

But it only makes breaking the captcha a two-step problem: first use speech
to text to get a sentence, then parse the grammar. Both problems have been
studied a lot and have plenty of solutions.

The segmentation I use in devoicecaptcha is very naive, I agree, but it
works! There are, however, better (and more complex) segmentation algorithms
readily available. To break your suggested captcha you just use
devoicecaptcha, but you also train the extra words into the model. So
besides statistics for '1', '2', '3' etc. you also add statistics for the
words '+', 'type' etc. Then you transcribe the voice to text ('add' '2' '3')
and parse that text output, for example with a BNF parser. That solves the
problem and gives the solution '5'.
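
To make that last step concrete, here is a minimal sketch in Python of
parsing such a transcript. The grammar, the token names and the solve()
function are my own assumptions for illustration and are not part of
devoicecaptcha; the speech-to-text step is assumed to have already produced
the list of words.

```python
# Hypothetical grammar for the transcribed token stream (an assumption):
#   question ::= "add" digit digit | digit "+" digit | "type" digit+
#   digit    ::= "0" | "1" | ... | "9"

def solve(tokens):
    """Return the expected captcha answer for a list of transcribed tokens."""
    def is_digit(tok):
        return len(tok) == 1 and tok in "0123456789"

    # "add 2 3" -> sum of the two digits
    if len(tokens) == 3 and tokens[0] == "add" and all(is_digit(t) for t in tokens[1:]):
        return str(int(tokens[1]) + int(tokens[2]))
    # "2 + 3" -> same sum, infix form
    if len(tokens) == 3 and tokens[1] == "+" and is_digit(tokens[0]) and is_digit(tokens[2]):
        return str(int(tokens[0]) + int(tokens[2]))
    # "type 7 4 3" -> the digits typed back as one string
    if len(tokens) >= 2 and tokens[0] == "type" and all(is_digit(t) for t in tokens[1:]):
        return "".join(tokens[1:])
    raise ValueError("transcript does not match the grammar: %r" % (tokens,))
```

Calling solve(["add", "2", "3"]) walks the first rule and returns the string
'5'. A real attack would use a proper parser generator, but for questions
this simple a few hand-written rules are probably enough.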

What can you do to make audio captchas harder? Add more voice! This is
exactly what Google has done in their updated audio captcha. This really
helps: you need a much more fine-grained and larger voice model to transcribe
that. I still think it is doable, but the amount of training work involved
scares at least me away from actually doing it.

The same goes for the latest image captchas: segmenting them is the hard
part (matching the broken-up segments to a character with a statistical
model is relatively easy).
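
To illustrate why the matching step is the easy part: once a segment has been
reduced to a feature vector, classifying it can be as simple as picking the
closest class mean. The Python sketch below shows only that idea, using a
nearest-mean classifier as a stand-in. The function names and the shape of
the feature vectors are made up for illustration, and this is not the model
devoicecaptcha uses.

```python
import math

def train(samples):
    """samples: list of (label, feature_vector). Returns {label: mean_vector}."""
    sums, counts = {}, {}
    for label, vec in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def classify(model, vec):
    """Return the label whose mean vector is closest (Euclidean) to vec."""
    return min(model, key=lambda label: math.dist(model[label], vec))
```

You would call train() once on labelled segments and then classify() on each
new segment. None of this helps until the segments have been cut out
correctly, which is where the real work is.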

tedd wrote:
> 
> Perhaps I'm underestimating the capabilities of bots and 
> overestimating the abilities of humans. I suspect that the 
> distribution of both camps have an overlap and therein lies the 
> problem. The problem may not have a solution.
> 
> But to bring this back to my intent -- my intent here is to provide a 
> simple audio CAPTCHA that could be used by anyone to provide some 
> degree of protection for their personal use THAT would also be 
> accessible to screen readers. It's not foolproof, but it appears to 
> work in that regard.
> 
I think any captcha that differs from a standard library one will help; you
should just know that if someone is really determined to break it, he/she
can. So think of a captcha and implement it quietly (no bragging about how
good it is, that will draw the wrong attention). Standard bots will not be
able to parse it, and only if you have a high-profile site will it be
economically viable for spammers to break it.
-- 
View this message in context: http://www.nabble.com/Audio-CAPTCHA-review-request-tf3487541.html#a9859801
Sent from the PHP - General mailing list archive at Nabble.com.
