Re: Audio CAPTCHA review request

On Thu, March 29, 2007 4:50 pm, tedd wrote:
>>2. What you've created is a relatively simplistic audio CAPTCHA that
>>HAS to be really susceptible to speech recognition.  Spammers have
>>gotten used to visual CAPTCHA so maybe they're not going to focus
>>too much on detecting and breaking audio CAPTCHA, but that still
>>comes down to "security through obscurity" which isn't a good
>>practice.
>
> There isn't any good practice here -- it's all just an attempt to do
> "the best the media will permit".

Understood. It's a good exercise and a good discussion, I think.

>>Once they have the software set up, they just have to fake the
>>"Speak Key" submit and grab the "tmp/access.mp3?##########" out of
>>phone.php (submitting proper cookie/session data) and that's it.
>
> Two things:
>
> 1. There's no cookie data -- how does one access session data? I
> thought outside of the sessionID, you couldn't -- am I wrong?

Sessions, unless you pass the ID through the URL, require cookies to be
enabled on the client's browser.  They're not permanent cookies, but
they're still cookies.  I was just saying that the attacking spam bot
would have to emulate a browser: accept the session cookie and send it
back with every request, so the server can keep track of "this session ID
goes with this MP3 for CAPTCHA authentication".  Otherwise there's no
persistence, and the CAPTCHA mechanism would probably issue a new MP3
when the bot went to log in.

Challenge: sessionid 123 issued, MP3 "ABC" created
Response: "ABC" sent back with no session ID, so CAPTCHA assumes it's a
new visitor and generates sessionid 456 and MP3 "DEF".

Challenge response fails because response was not sessionid 123 and text
"ABC".  It was probably text "ABC" with no session id if the bot doesn't
try to emulate browser behavior for sessions.
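
A rough sketch of that mismatch, in Python rather than PHP purely for
illustration. The CaptchaServer class and its method names are made up
for this example and are not how phone.php actually works; the only
point is that the answer is checked against the session it was issued
for.

```python
import secrets


class CaptchaServer:
    """Toy in-memory stand-in for the server side (hypothetical)."""

    def __init__(self):
        self.expected = {}  # session id -> text spoken in that session's MP3

    def challenge(self, session_id=None):
        # No (or unknown) session id means a new visitor: issue a fresh id.
        if session_id not in self.expected:
            session_id = secrets.token_hex(8)
        self.expected[session_id] = secrets.token_hex(3).upper()
        return session_id, self.expected[session_id]

    def respond(self, session_id, text):
        # The answer only counts against the session it was issued for,
        # and each challenge can be answered once.
        answer = self.expected.pop(session_id, None)
        return answer is not None and secrets.compare_digest(answer, text)


server = CaptchaServer()

sid, spoken = server.challenge()
print(server.respond(None, spoken))  # bot dropped the cookie: False

sid, spoken = server.challenge()
print(server.respond(sid, spoken))   # bot echoed the cookie back: True
```

So a bot that keeps the cookie is treated like any browser; the session
only stops clients that don't bother to carry it.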


> 2. I might be able to generate a sound file that can be accessed only
> once. In other words, once you grab the file it's not there for a
> second look (like the is-light-a-wave-or-a-particle thing). Now, put that
> together with a hidden token in the form that accompanies the key,
> then even typing the correct key wouldn't work unless it was
> submitted via the form and not injected. I have to think about the
> logic here -- but this is just off the top of my head.

Anything stored in the form can be read by a bot, so whatever additional
tokens you may put into the web form could also be read and posted back to
your web server by the bot.
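
To make that concrete, here is a small sketch using Python's standard
html.parser module. The form markup and the "token" field name are
invented for the example; any real form would differ, but the idea is
the same: a hidden field is hidden from the eye, not from a parser.

```python
from html.parser import HTMLParser


class HiddenFieldReader(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> it sees."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            self.fields[attrs.get("name")] = attrs.get("value")


def read_hidden_fields(html):
    reader = HiddenFieldReader()
    reader.feed(html)
    return reader.fields


# Hypothetical form; the field names and token value are made up.
FORM = """
<form method="post" action="phone.php">
  <input type="hidden" name="token" value="9f2c1a">
  <input type="text" name="key">
</form>
"""

print(read_hidden_fields(FORM))  # {'token': '9f2c1a'}
```

Whatever comes back can simply be included in the bot's POST alongside
the key.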

Also, regarding accessing the file only once, a bot is either going to get
it or not get it in one pass.  So it doesn't matter if the audio file you
create can only be accessed once.  That's all a bot needs and it's either
successful or not.  A human, on the other hand, may need to hear it a few
times.
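
A toy illustration of the "accessed only once" idea, with a made-up
OneShotStore class standing in for whatever would serve the file. It
is only meant to show who gets hurt by the second request failing.

```python
class OneShotStore:
    """Toy store where each audio clip can be fetched exactly once."""

    def __init__(self):
        self._clips = {}

    def put(self, name, data):
        self._clips[name] = data

    def fetch(self, name):
        # pop() removes the clip, so a second request finds nothing.
        return self._clips.pop(name, None)


store = OneShotStore()
store.put("access.mp3", b"fake audio bytes")

first = store.fetch("access.mp3")   # a bot's single pass gets the audio
second = store.fetch("access.mp3")  # a human asking to replay gets None
```

The bot is done after the first fetch; only the person who wanted to
listen again runs into the restriction.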

Sounds like you have a concept on the tip of your brain, so maybe this
isn't exactly what you were aiming for.  But those are my thoughts based
on what you said.

And let's not bring quantum mechanics into this mess. Hah. You and your
wacky waves and particles.

"Checking the electron microscope... And the winner is three, in a quantum
finish!"
"No fair! You changed the outcome by observing it!"
- Futurama

>>And because you can't do anything on the internet without bumping
>>into adult material. Don't worry, this is safe... no pics or bad
>>words, just an article about using porn sites to break visual
>>CAPTCHA.  The spambots would take your visual CAPTCHA images and
>>post it to their site which offers users free porn if they pass the
>>CAPTCHA. And there's no lack of people wanting free porn so sounds
>>like it was fairly effective:
>>http://www.boingboing.net/2004/01/27/solving_and_creating.html
>
>
> Now that is clever. However, I am having difficulty seeing just how
> they can obtain and use the information provided. For example, if I
> say the key for a specific CAPTCHA is 123 -- then how can that help a
> spammer because when he returns to the site, the CAPTCHA would have
> changed?
>
> Can you explain how that works?

Because computers are very very fast.

More importantly, they don't have to "return to the site" and so never
trigger a new CAPTCHA sequence.  The bot can access the page it wants
to log into, retrieve the CAPTCHA image, post it to the free porn site for
all the amazingly fast one-handed typists to decode, and respond to the
CAPTCHA challenge within seconds, if even that long.  That's assuming
there are humans on their site at that moment to do the decoding for them.

It's no different than pulling up a message in French, copying and pasting
it into Babelfish in another window, reading the translated version and
going back to the first window to respond.  Hell, toss in an English ->
French translation of your response before switching back to window 1, and
if a computer was doing it all, it could have it done in a fraction of
a second longer than it took to load the pages.  Plenty of time to respond
to the CAPTCHA challenge.  But in this case, the computer just has to read
the image, paste the image, receive the decoded response, and submit the
code.


I'm guessing it's not very speedy, but highly effective: tricking humans
into answering Turing-style tests for the machine.

> Not as hard as you might think. You don't have to identify it as a
> pig but rather as the spectral properties that a pig image displays.
> It's like part recognition on an assembly line.

Are you incinerating pigs and doing spectral analysis on them to see what
they're composed of, again?

Identification relies a lot on shape and color.  Change the colors to
something atypical of the subject and that removes one factor.  Mutate the
shape and it makes it harder.   If you want to get really fancy, you could
probably create a shape out of color variances then have a more distinct
shape as a "red herring" for the bot to identify.

>>http://www.espgame.org/
>
> That's more the brute force method -- but at some point, it would
> probably work.

And the likelihood of the exact image from the CAPTCHA being in that
database could be pretty slim, unless everyone uses the same stock images.
A good bot might still be able to draw correlations between similar images
and use them to narrow the possible responses down to a dozen instead of
millions.  So yeah, brute force to some degree, but that's what they're
doing with those massive MD5 hash databases.  If you can't easily decode
something, try every possible combination until something sticks.
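
The hash-database idea can be sketched in a few lines with Python's
hashlib. The three-digit keyspace is an assumption chosen to keep the
example tiny; real tables cover vastly larger spaces, but the lookup
works the same way.

```python
import hashlib
from itertools import product
from string import digits


def build_md5_table(alphabet, length):
    """Map md5 hex digest -> plaintext for every string of that length."""
    table = {}
    for combo in product(alphabet, repeat=length):
        text = "".join(combo)
        table[hashlib.md5(text.encode()).hexdigest()] = text
    return table


table = build_md5_table(digits, 3)        # every string "000" to "999"
target = hashlib.md5(b"123").hexdigest()  # stands in for a captured hash

print(len(table), table[target])          # 1000 123
```

Nothing is "decoded" here. Every candidate is hashed up front and the
captured hash is just looked up.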

Trivia: FedEx and UPS boxes use Simplex locks.  Typically 5 buttons that
can only be pressed once each.  You can press multiple buttons at once
though, which adds to the number of permutations a little bit.  But in the
end, this still only provides roughly 1000 unique combinations.  The
average child's Master Lock has 64,000 possible combinations on the dial
(40 x 40 x 40).
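
For anyone who wants to check the arithmetic, here is a short count
under one assumed model of the lock: five buttons, each usable at most
once, where a "push" is any group of buttons pressed together and the
order of pushes matters. The function name is mine, and the total
includes the "press nothing" setting.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def sequences(buttons_left):
    """Count push sequences when `buttons_left` buttons are still unused."""
    total = 1  # stopping here (pressing nothing more) is one option
    for size in range(1, buttons_left + 1):
        # choose which `size` buttons go down together, then continue
        total += comb(buttons_left, size) * sequences(buttons_left - size)
    return total


simplex = sequences(5)
dial_lock = 40 ** 3  # three numbers on a 40-position dial

print(simplex, dial_lock)  # 1082 64000
```

Under that model the pushbutton lock comes out at 1082 settings, which
is where the "roughly 1000" figure comes from.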

I drop my packages off at the store. :)

-TG

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

