Re: Mirroring/caching PHP webpages.

Paul M Foster <paulf@xxxxxxxxxxxxxxxxx> · Sat, 17 Jan 2009 23:26:23 -0500

On Sun, Jan 18, 2009 at 01:03:44PM +1100, Clancy wrote:

> On Fri, 16 Jan 2009 00:51:58 -0500, paulf@xxxxxxxxxxxxxxxxx (Paul M Foster)
> wrote:
> 
> >On Fri, Jan 16, 2009 at 11:57:24AM +1100, Clancy wrote:
> .................................
> >> The only explanation I can see is that someone has somehow managed to
> >> cache or mirror the version 1 logic, and is still dutifully stuffing
> >> pornography into it. As it is my understanding that the PHP code which
> >> handles the processing is inaccessible to the user, I cannot understand
> >> how this could have been done. Does anyone have any suggestions?
> >>
> >
> >If Google can spider and read your site, why can't someone else? I've
> >had similar things happen. Any program that uses the HTTP protocol to
> >fetch your site will only get the page as rendered by the server-- sans
> >PHP. But I can imagine someone else programming something to snag the
> >page a different way-- *with* PHP.
> >
> >But actually, they don't even have to be that sophisticated. All they
> >have to do is submit a message to your form the first time, note the
> >variables and their characteristics, and then resubmit that same type of
> >content later using the same variable names and characteristics.
> >
> >Here's something you might do:
> >
> >1) Rename the page in question. That way their submission won't
> >piggyback on your existing PHP code.
> >
> >2) Change all the variable names in the file.
> >
> >Chances are, they're just submitting an HTTP request with the proper
> >POST/GET variables so your page processes it as though it were being
> >accessed "live". But if they try to submit this same content to a form
> >that goes nowhere, Apache will just give them a 404 error.
> >Alternatively, if you change your variable names and they submit to your
> >existing form, your PHP can simply ignore it.
> >
> >Also, you might try CAPTCHA (look it up). It tries to tell human
> >visitors from non-human ones. You've probably got a 'bot submitting to
> >you, so this might help.
> 
> The page has text boxes for the name and e-mail address, a text area for
> the message, and a submit button. When the user hits the submit button,
> the original code evaluates all the inputs, and either reissues the page
> if it doesn't like them, or transmits the message with the title
> "Feedback from XXX website". If the message is transmitted successfully,
> the user is then shown a "Thank you for your feedback" page.
> 
> In version 1 of the modification, if the message passed the initial test
> I then submitted it to a second test. If it failed this, I replaced the
> message with "[Censored]" and sent it to an alternative address, with
> the title "Rubbish from XXX website", but showed the same "Thank you"
> page. I did this just so that if I accidentally rejected something from
> someone I knew, I could email them and ask them to send the message
> again.
> 
> After a few days I decided I didn't need to know anything about these
> bogus messages, so I developed version 2 of the modification. This is
> the same as version 1, except that it uses a different title and
> replacement message (even though they are no longer used), and simply
> discards the message, but again shows the normal "Thank you" page.
> 
> With either modification there is nothing to tell the sender that their
> message has been rejected, and, as I never reply to such messages, no
> way for them to find out whether or not I actually received their
> message. Whenever I try to send myself a bad message nothing happens, so
> version 2 appears to have been implemented. I do not get any uncensored
> messages of this type now, so the rejection algorithm is satisfactory,
> but I'm still getting one or two messages handled by version 1 each day.
> 
> I cannot see how this could happen unless someone has somehow managed to
> trap the version 1 PHP code (or, just conceivably, my provider switches
> to a backup containing old code at some stage in a maintenance cycle).
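For illustration only, the two filtering variants described above can be sketched roughly as follows. The real code is PHP and was not posted, so this is a Python sketch under assumptions: the test functions, the addresses, and the (page, mail) return value are hypothetical placeholders. What it shows is that both versions end on the same "Thank you" page, so the sender cannot tell which branch ran.

```python
from typing import Callable, Optional, Tuple

# Hypothetical result: (page shown to the sender, (recipient, subject, body) or None if nothing is sent)
Mail = Tuple[str, str, str]
Result = Tuple[str, Optional[Mail]]


def handle_feedback(message: str,
                    version: int,
                    first_test: Callable[[str], bool],
                    second_test: Callable[[str], bool]) -> Result:
    """Sketch of the feedback handler; addresses and test functions are placeholders."""
    if not first_test(message):
        # Original behaviour: reissue the form when the inputs are rejected.
        return ("form", None)
    if second_test(message):
        # Normal case: forward the message and thank the sender.
        return ("thank_you", ("owner@example.com", "Feedback from XXX website", message))
    if version == 1:
        # Version 1: censor the body and send it to an alternative address.
        return ("thank_you", ("rubbish@example.com", "Rubbish from XXX website", "[Censored]"))
    # Version 2: silently discard the message, but still show the normal page.
    return ("thank_you", None)
```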

I'm not an expert at the HTTP protocol, but here's what I expect is
happening: Clearly, someone captured at least the form submissions from
the version 1 release of your website. So now they're sending an HTTP
request with the appropriate headers, along with POST content which
conforms to your original form. They don't need any forms or actual code
to do this. They have a bot which just sends the proper request. It's
just as if someone came to your website, filled out the form, and pressed
the "submit" button. In that case, the browser sends a request to the
server based on the user's input. The server, in accepting this request,
knows (virtually) nothing about where it came from or what to do with it.
It's just a client request. It decodes the header info and the POST
fields, and decides from there how to handle it. Normally, this would be
to present a new page acknowledging the input.
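A rough Python sketch of what such a replayed submission amounts to: a POST request whose body is just the URL-encoded form fields. The URL and the field names (`name`, `email`, `message`) are hypothetical, since the real form's variable names weren't posted. The function only builds the request; handing it to `urllib.request.urlopen()` would send it, and nothing in it requires a browser or the original form page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_replayed_post(url: str, fields: dict) -> Request:
    """Build (but do not send) a form POST like the one a browser sends on "submit"."""
    # The body is simply the form fields, URL-encoded the same way a browser encodes them.
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Any User-Agent string can be claimed; the server cannot verify it.
            "User-Agent": "Mozilla/5.0",
        },
    )


# Hypothetical URL and field names, for illustration only.
req = build_replayed_post(
    "http://www.example.com/feedback.php",
    {"name": "Bot", "email": "bot@example.com", "message": "hello world"},
)
print(req.get_method(), req.full_url)
print(req.data)
```

From the server's side this is indistinguishable from a real visitor's submission, which is why renaming the page or its variables (as suggested earlier in the thread) makes an old, replayed request miss.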

The point is that all your server is getting is a header and some POST
variables from some IP address somewhere. You could check your server
logs for the offending IP(s) if you could pinpoint when a form like this
arrived. But it's entirely possible that the IP address changes over time.
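If the server keeps Apache's standard access log, a small script can list the POSTs to the form page that arrived close to a given time, for example the time on one of the version 1 "Rubbish" emails. This Python sketch assumes the common/combined log format and a hypothetical page path `/feedback.php`; both would need adjusting to the real setup, and the sample lines use documentation-only IP addresses.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

# Start of an Apache common/combined log line:
# client IP, identity, user, [timestamp], "METHOD PATH PROTOCOL"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*"')


def posts_near(lines: Iterable[str], when: datetime,
               window: timedelta = timedelta(minutes=5),
               path: str = "/feedback.php") -> List[Tuple[str, datetime]]:
    """Return (client IP, timestamp) for POSTs to `path` within `window` of `when`.

    `when` must be timezone-aware; `path` is a hypothetical page name.
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group(3) != "POST":
            continue
        # Ignore any query string when comparing the requested path.
        if m.group(4).split("?", 1)[0] != path:
            continue
        stamp = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        if abs(stamp - when) <= window:
            hits.append((m.group(1), stamp))
    return hits


sample = [
    '192.0.2.10 - - [17/Jan/2009:10:00:00 -0500] "POST /feedback.php HTTP/1.1" 200 512',
    '198.51.100.7 - - [17/Jan/2009:10:01:00 -0500] "GET /feedback.php HTTP/1.1" 200 1024',
    '203.0.113.5 - - [17/Jan/2009:13:00:00 -0500] "POST /feedback.php HTTP/1.1" 200 512',
]
arrival = datetime(2009, 1, 17, 10, 2, tzinfo=timezone(timedelta(hours=-5)))
for ip, stamp in posts_near(sample, arrival):
    print(ip, stamp.isoformat())
```

Since the sending address may change, lining up several arrival times is likely to say more than a single match.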

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)