On Mon, Jun 27, 2005 at 03:40:49PM -0700, Dave Hansen wrote:
> On Mon, 2005-06-27 at 15:37 +0000, Bryce Harrington wrote:
> > On Thu, Jun 23, 2005 at 02:13:48PM -0700, Judith Lebzelter wrote:
> > >
> > >
> > > On Thu, 23 Jun 2005, Dave Hansen wrote:
> > >
> > > > On Mon, 2005-05-02 at 13:41 -0700, Judith Lebzelter wrote:
> > > > > We have a cron job that will hit the patch directory once every three
> > > > > hours to check for new patches.  It also pulls a patch if it finds one.
> > > > > This is the same schedule we use for other kernel patches.
> > > >
> > > > Is there a chance that you guys could update your PLM fetcher a little
> > > > bit?  It likes to go looking for files that aren't actually present on
> > > > my web server,
> >
> > Which files are those?  I can probably filter those more tightly in
> > package_retriever.
>
> URL (426)                   Error Hits   Referers
> /patches/2.6.11/patches/    457          -
> /patches/2.6.10/patches/    457          -
> /patches/2.6.12/patches/    457          -
> /patches/2.6.8/patches/     457          -
> /patches/2.6.9/patches/     456          -

Okay, I've found what is causing this and implemented what I think is a
fix.  I'll test it out tomorrow, but let me know if it reoccurs.

> > I've updated the script that makes the lwp agent string to use curl and
> > to print an agent string like this:
> >
> >   package_retriever/1.00 <hostname> spider <descriptive-comment>
> >
> > I'm going to add a few more changes to this script, so it may be a few
> > days before I am able to switch over to it.
>
> Putting the PLM url into <descriptive-comment> would be exceedingly
> informative :)

It's a good idea.  I think Judith is going to do that for the PLM
script.  For the package_retriever process, I'll put a link to where it
is going to be posting the regression tests.

I'm probably also going to throttle package_retriever back to only pull
daily.  That'll delay when the regression tests get invoked, but that's
probably going to be okay.
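
For illustration, the agent string format quoted above could be built
roughly like this. This is only a sketch in Python, not the actual
package_retriever code (which is a separate script using curl); the
function names, the hostname, and the URLs are all made-up placeholders.

```python
import socket
from urllib.request import Request


def build_agent_string(hostname, comment, name="package_retriever", version="1.00"):
    """Return '<name>/<version> <hostname> spider <descriptive-comment>'."""
    return f"{name}/{version} {hostname} spider {comment}"


def make_request(url, comment, hostname=None):
    """Build (but do not send) a request that carries the agent string."""
    # Fall back to the local hostname when the caller does not supply one.
    hostname = hostname or socket.gethostname()
    agent = build_agent_string(hostname, comment)
    return Request(url, headers={"User-Agent": agent})


# Hypothetical values; the comment field would hold a URL that tells the
# server admin what the spider is and where its results are posted.
req = make_request(
    "http://www.example.org/patches/",
    comment="http://www.example.org/about-this-spider",
    hostname="host.example.org",
)
print(req.get_header("User-agent"))
```

Putting a URL in the comment field means anyone reading the access logs
can look up who is crawling and why.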

> > Also, are you sure it's reporting lwp-trivial?  I was actually using
> > lwp-simple.
>
> I haven't tracked it down.  There appear to be hits from both, although
> the lwp-trivial/1.40 references seem almost a month old.

Okay, that's probably something different then.

> > > > Also, it would be kind if they obeyed robots.txt, or at least fetched
> > > > it.  My log analyzer will detect robots based just on fetching
> > > > "robots.txt" when beginning a crawl.
> > >
> > > We should be obeying robots.txt.
> > >
> > > We will be able to do these updates but it may take a little while to get
> > > to them and deploy.
> >
> > I've implemented robot support for package_retriever.  Doesn't look like
> > I navigate through the directory that robots.txt is blocking, but I
> > guess this'll give you the ability to control what the script spiders.
>
> Thanks for doing all of this.

Sure thing.  :-)

Bryce
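
P.S. A minimal sketch of the kind of robots.txt check discussed above,
written in Python with the standard urllib.robotparser module. It is not
the package_retriever implementation; the robots.txt rules, the blocked
/private/ directory, and the agent string values are hypothetical, and
the rules are parsed from a string so nothing is fetched over the
network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a server admin might publish to steer the spider.
ROBOTS_TXT = """\
User-agent: package_retriever
Disallow: /private/
"""

# Hypothetical agent string in the format quoted earlier in the thread.
AGENT = "package_retriever/1.00 host.example.org spider regression-test-fetcher"


def load_rules(text):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(parser, path, agent=AGENT):
    """Return True if the rules allow this agent to fetch the path."""
    return parser.can_fetch(agent, path)


rules = load_rules(ROBOTS_TXT)
for path in ("/patches/2.6.12/", "/private/notes.txt"):
    print(path, "allowed" if may_fetch(rules, path) else "blocked")
```

A real spider would fetch /robots.txt once at the start of each crawl
(which also lets log analyzers recognize it as a robot) and run a check
like this before every request.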