Re: Connect to Google

On Wed, 2012-02-15 at 21:56 -0500, John Taylor-Johnston wrote:

> I'm a teacher. I want to use PHP to interface with Google and see if a 
> student has plagiarized.
> 
> I don't see many open-source projects on the subject, so I want to 
> create my own script.
> 
> How can I use PHP to interface with Google and see if this text exists 
> on the internet?
> 
> If this is possible, I need some ideas on how to parse the text and 
> input it into Google.
> 
> Then I might like to get a percentage idea of how this text compares to 
> a site that Google has indexed.
> 
> 
> $SampleText = "Lorem ipsum dolor sit amet, test link adipiscing elit. 
> Nullam dignissim convallis est. Quisque aliquam. Donec faucibus. Nunc 
> iaculis suscipit dui. Nam sit amet sem. Aliquam libero nisi, imperdiet 
> at, tincidunt nec, gravida vehicula, nisl. Praesent mattis, massa quis 
> luctus fermentum, turpis mi volutpat justo, eu volutpat enim diam eget 
> metus. Maecenas ornare tortor. Donec sed tellus eget sapien fringilla 
> nonummy. Mauris a ante. Suspendisse quam sem, consequat at, commodo 
> vitae, feugiat in, nunc. Morbi imperdiet augue quis tellus."
> 
> John
> 
> 


Wow, that's a pretty big project you're chewing there. A quick search
shows that there are some projects out there to detect plagiarism, but I
think the university-calibre ones require a hefty sum of money.

To get a rough idea, you could break a text into sentences, and then
query each one to see whether it occurs verbatim, as an exact quoted
phrase. You can use cURL to grab the search results pages for this sort
of thing; there's no need for a special interface. There are a few
things to bear in mind though:


      * Google's terms and conditions may prohibit using their search
        engine like this, or may limit how much you can do it.
      * Some sentences will be intentionally copied, as quotes. You may
        need some sort of check against the source to see whether a
        match appears in a quotation context.
      * What if only part of a sentence is copied?
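
A minimal sketch of the sentence-by-sentence approach, assuming a naive
regex is good enough to find sentence boundaries (it will mis-split
abbreviations such as "e.g."). The function names are hypothetical, and
the google.com/search URL format is an assumption; given the first
caveat above, check Google's terms before fetching anything this way.

```php
<?php
// Hypothetical helper: naive sentence splitter. Assumes sentences end in
// ., ! or ? followed by whitespace; abbreviations will be mis-split.
function splitSentences(string $text): array
{
    // Collapse runs of whitespace (including newlines) into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    // Split after ., ! or ? when followed by whitespace.
    $parts = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $parts);
}

// Hypothetical helper: wraps the sentence in double quotes so the search
// is for the exact phrase. The URL format is an assumption.
function exactPhraseUrl(string $sentence): string
{
    return 'https://www.google.com/search?q=' . urlencode('"' . $sentence . '"');
}

// Hypothetical helper: fetches a page with cURL and returns the HTML, or
// null on failure. Parsing a match count out of the HTML is left open,
// since results-page markup is not stable.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body as string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 s
    $html = curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return $html === false ? null : $html;
}
```

For example, splitSentences($SampleText) yields one string per sentence,
each of which can be passed to exactPhraseUrl() and then fetchPage().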


Maybe after you've searched for exact matches from the sentences in the
source, you could remove them from the source, then re-check the
remaining sentences against Google's normal fuzzy search. That may
produce many false positives, though.
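
For the "percentage" part of the original question, and for sentences
that were only partly copied, one common heuristic (not something Google
provides) is to compare overlapping three-word sequences, or "shingles",
between the student's text and a page you've fetched. A minimal sketch,
assuming plain ASCII text and a shingle size of 3; the function names
are hypothetical. PHP's built-in similar_text() can also report a
percentage, but it works character by character and is slow on long
texts.

```php
<?php
// Hypothetical helper: returns the set of word n-grams (as array keys).
// Assumes ASCII text; punctuation is stripped and case is ignored.
function shingles(string $text, int $n = 3): array
{
    $clean = preg_replace('/[^a-z0-9\s]+/', '', strtolower($text));
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    $set = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        $set[implode(' ', array_slice($words, $i, $n))] = true;
    }
    return $set;
}

// Hypothetical helper: percentage of the student's shingles that also
// appear in the source text. Returns 0.0 if the student text is shorter
// than one shingle.
function overlapPercent(string $student, string $source, int $n = 3): float
{
    $a = shingles($student, $n);
    if (count($a) === 0) {
        return 0.0;
    }
    $shared = count(array_intersect_key($a, shingles($source, $n)));
    return 100.0 * $shared / count($a);
}
```

Because a partly copied sentence still shares some shingles with its
source, it scores above zero even when an exact-phrase search misses it.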

There are plenty of other factors too: students may copy from books
that don't exist in a search engine's archives, and some subjects
naturally lead different writers to the same wording, particularly
technical subjects, which tend to avoid more creative and flowery
description.

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk