Re: Google Scholar, was How to pay $47 for a copy of RFC 793

Harald Alvestrand <harald@xxxxxxxxxxxxx> · Tue, 10 May 2011 22:53:17 +0200

On 05/10/2011 10:08 PM, John C Klensin wrote:

--On Tuesday, May 10, 2011 20:22 +0200 Harald Alvestrand
<harald@xxxxxxxxxxxxx>  wrote:

If only there was someone who worked at Google on this list
who could send an internal message to get this rectified....
:-)
From what I could tell from the instructions, Scholar is
using some heuristics to figure out that "this is a paper" and
"this is not a paper". The highest one on the list was a
3-slide presentation that really didn't say very much - I
think this is one where heuristics had failed.
I think someone at the site could help them a lot more.

Harald,

I'm not sure what you mean by "someone at the site".  Certainly,
various of us could explain to them why the series should be
more comprehensibly indexed.  But with Maps as a notable
exception, I've found that suggesting that a particular
heuristic is failing, or that something should have been indexed
that isn't, is most likely to get a response whose essence is
the Google folks and their algorithms are ever so much smarter
than us lusers, so what could we possibly know?

The instructions at Scholar were pretty comprehensive and specific:

- Make either your abstracts or your documents into HTML
- Put a very specific selection of tags into your documents
- Report your collection to the Scholar robot
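
To make the second step concrete, here is a rough sketch of what the per-document tags might look like for an RFC. It assumes the Highwire-style "citation_*" meta tag names that the Scholar inclusion guidelines describe (citation_title, citation_author, citation_publication_date, citation_technical_report_number, citation_pdf_url). The helper name, the date format and the example.org URL are hypothetical and only for illustration; this is not what the RFC publisher actually serves.

```python
from html import escape


def scholar_meta_tags(title, authors, date, pdf_url, report_number=None):
    """Build <meta> tags for one document (hypothetical helper).

    The tag names follow the "citation_*" convention; which tags a
    given indexer honors is an assumption to check against its docs.
    """
    tags = [("citation_title", title)]
    # One citation_author tag per author, in author order.
    tags += [("citation_author", author) for author in authors]
    tags.append(("citation_publication_date", date))
    if report_number:
        tags.append(("citation_technical_report_number", report_number))
    tags.append(("citation_pdf_url", pdf_url))
    # Escape attribute values so quotes and ampersands stay valid HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


if __name__ == "__main__":
    print(scholar_meta_tags(
        title="Transmission Control Protocol",
        authors=["Postel, Jon"],
        date="1981/09",
        pdf_url="https://example.org/rfc/rfc793.pdf",  # placeholder URL
        report_number="RFC 793",
    ))
```

The output would go inside the <head> of the HTML abstract or document page, which is why it is something only whoever controls the site can do cleanly.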

We can either ignore this particular set of instructions, and get the 
result that the heuristics generate, or follow this set of instructions, 
and hope for a better result.

My point (if I have any) is that those instructions should be easy to 
follow for the people who control these sites, but are not so easy for 
anyone else (unless they want to act as if they are an "official mirror").

That puts the ball in the RFC publisher's court.

Of course, my personal heuristic, and that of many folks I know
who use Scholar much more intensely than I do, is that if a
Scholar search fails or produces nonsense, I go to the
general-purpose search engine.   For RFCs, it tends to do very
well, both at finding the right stuff and at ranking the RFC
text itself near the top.

So, other than being lazy about not doing the second search,
pedantic about what Scholar should be indexing and how, or
demanding and expecting a more perfect universe, I'm not sure I
see a real problem in this.

     john
