[Hotplug_sig] Re: [Plm-devel] Re: PLM patch spider

On Mon, Jun 27, 2005 at 03:40:49PM -0700, Dave Hansen wrote:
> On Mon, 2005-06-27 at 15:37 +0000, Bryce Harrington wrote:
> > On Thu, Jun 23, 2005 at 02:13:48PM -0700, Judith Lebzelter wrote:
> > > 
> > > 
> > > On Thu, 23 Jun 2005, Dave Hansen wrote:
> > > 
> > > > On Mon, 2005-05-02 at 13:41 -0700, Judith Lebzelter wrote:
> > > > > We have a cron job that will hit the patch directory once every three 
> > > > > hours to check for new patches.  It also pulls a patch if it finds one.  
> > > > > This is the same schedule we use for other kernel patches.
> > > > 
> > > > Is there a chance that you guys could update your PLM fetcher a little
> > > > bit?  It likes to go looking for files that aren't actually present on
> > > > my web server,
> > 
> > Which files are those?  I can probably filter those more tightly in
> > package_retriever.
>   
> URL (426)                   Error Hits   Referers
> /patches/2.6.11/patches/    457          -
> /patches/2.6.10/patches/    457          -
> /patches/2.6.12/patches/    457          -
> /patches/2.6.8/patches/     457          -
> /patches/2.6.9/patches/     456          -

Okay, I've found what is causing this and implemented what I think is a
fix.  I'll test it out tomorrow, but let me know if it reoccurs.

> > I've updated the script that makes the lwp agent string to use curl and
> > to print an agent string like this:
> > 
> >   package_retriever/1.00 <hostname> spider <descriptive-comment>
> > 
> > I'm going to add a few more changes to this script, so it may be a few
> > days before I am able to switch over to it.
> 
> Putting the PLM url into <descriptive-comment> would be exceedingly
> informative :)

It's a good idea.  I think Judith is going to do that for the PLM
script.  For the package_retriever process, I'll put a link to where it
is going to be posting the regression tests.
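
For reference, here is a rough sketch of how an agent string in the format quoted above could be assembled and attached to a request. It is written in Python rather than as the actual package_retriever code, and the function names and the example URL are made up for illustration.

```python
import socket
import urllib.request


def build_agent_string(comment, name="package_retriever", version="1.00",
                       hostname=None):
    """Return '<name>/<version> <hostname> spider <descriptive-comment>'."""
    # Fall back to the local hostname when none is supplied.
    host = hostname or socket.gethostname()
    return f"{name}/{version} {host} spider {comment}"


def make_request(url, agent):
    """Build a request object that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": agent})


if __name__ == "__main__":
    # The comment URL is a placeholder, not the real PLM or test-results link.
    agent = build_agent_string("http://example.org/results")
    request = make_request("http://example.org/patches/", agent)
    print(request.get_header("User-agent"))
```

Putting a URL in the comment field gives whoever reads the server logs somewhere to go for an explanation of the traffic.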

I'm probably also going to throttle package_retriever back to only pull
daily.  That'll delay when the regression tests get invoked, but
that's probably going to be okay.
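
The daily throttle could be as simple as a timestamp check before each pull. This is only a sketch in Python, assuming the time of the last pull is recorded somewhere; the function name and the way that timestamp is stored are hypothetical.

```python
from datetime import datetime, timedelta


def should_fetch(last_fetch, now, min_interval=timedelta(days=1)):
    """Return True when at least min_interval has passed since last_fetch."""
    # A site that has never been pulled is always eligible.
    if last_fetch is None:
        return True
    return now - last_fetch >= min_interval


if __name__ == "__main__":
    last = datetime(2005, 6, 27, 12, 0)
    # Three hours later: skipped under a daily limit.
    print(should_fetch(last, datetime(2005, 6, 27, 15, 0)))
    # A full day later: allowed.
    print(should_fetch(last, datetime(2005, 6, 28, 12, 0)))
```

With a check like this the cron schedule itself could stay at every three hours, and the script would simply skip the runs that come too soon.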

> > Also, are you sure it's reporting lwp-trivial?  I was actually using
> > lwp-simple.
> 
> I haven't tracked it down.  There appear to be hits from both, although
> the lwp-trivial/1.40 references seem almost a month old.

Okay, that's probably something different then.

> > > > Also, it would be kind if they obeyed robots.txt, or at least fetched
> > > > it.  My log analyzer will detect robots based just on fetching
> > > > "robots.txt" when beginning a crawl.
> > > 
> > > We should be obeying robots.txt.
> > > 
> > > We will be able to do these updates but it may take a little while to get 
> > > to them and deploy.
> > 
> > I've implemented robot support for package_retriever.  Doesn't look like
> > I navigate through the directory that robots.txt is blocking, but I
> > guess this'll give you the ability to control what the script spiders.
> 
> Thanks for doing all of this.  

Sure thing.  :-)
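
In case it is useful to anyone else writing a spider, the robots.txt check mentioned above boils down to something like the following. This is a Python sketch using the standard library's urllib.robotparser, not the actual package_retriever change; the rules and URLs are invented examples.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_lines):
    """Parse robots.txt content that is already in memory."""
    # A live spider would instead call set_url() and read() to fetch
    # /robots.txt from the server before starting the crawl.
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def allowed(parser, agent, url):
    """Return True if the rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Example rules only; not taken from any real server.
    rules = load_rules(["User-agent: *", "Disallow: /private/"])
    for url in ("http://example.org/patches/2.6.12/",
                "http://example.org/private/notes.txt"):
        print(url, allowed(rules, "package_retriever", url))
```

Fetching robots.txt at the start of each crawl also lets log analyzers recognize the traffic as a robot, which was the other half of the request.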

Bryce
