On Mon, Jun 7, 2010 at 8:20 AM, Paul W. Frields <stickster@xxxxxxxxx> wrote:
> Including Infrastructure team gurus on this email thread for
> additional expert advice and assistance. :-) Setting reply-to docs@
> list.
>
> On Sat, Jun 05, 2010 at 05:55:05AM -0400, Eric Sparks Christensen wrote:
>> On 06/05/2010 03:14 AM, Ruediger Landmann wrote:
>> > One of the new features of Publican 2.0 that I haven't mentioned yet is
>> > that it creates an XML sitemap for search engine bots to crawl. You can
>> > find d.fp.o.'s sitemap here:
>> >
>> > http://docs.fedoraproject.org/Sitemap
>>
>> Awesome.
>>
>> >
>> > I've fed this to Google, Yahoo, and Bing, and they're all slowly
>> > re-indexing the site. The map now contains a little over 2,000 URLs and
>> > at the time of writing, Google has crawled about 350 of them.
>>
>> I know that Google has some algorithm that figures out how often your
>> site changes and then crawls more or less frequently. Not sure if we
>> could work with Google on scheduling this more or less around the time
>> of a release. Of course I'm guessing that we won't have this big
>> re-structuring next time, either.
>>
>> >
>> > The dilemma we face is the decision of when to turn off the 404
>> > redirect. For the sake of all the existing links scattered around the
>> > net (both on the Fedora Project site and off it), we'd want to postpone
>> > this as far as possible. On the other hand, any bot attempting to verify
>> > that link gets a page served up and probably concludes that the link is
>> > valid; I suspect that if these links 404ed, they'd start to evaporate
>> > from search results.
>>
>> Isn't there a type of redirect (302?) that tells you that you are being
>> redirected so you don't think the URL is valid?
>
> I thought that 301 and 302 both do this, but 302 generally is used for
> temporary redirects, and 301 for permanent ones. So 301 would be the
> one you're thinking of here perhaps?
>

It's 301.
RFC 2616 has the details:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3

>> > Given that existing links around the net are pointing to (at most
>> > recent) the F12 versions of docs, there will be no need to keep the 404
>> > redirect in place past October; however, if we want to start allowing
>> > dead links to 404 out rather than poison search results, maybe we should
>> > bring that date forward? The sooner we do this, the sooner search will
>> > start working properly...
>>
>> Yeah, the sooner the better, IMO.
>
> I wonder how long you would need 301's in place before it's safe to
> remove them? Because I think Infrastructure's not keen on maintaining
> a big list of these.
>

I'm not aware of any specific time frame defined by the RFCs or other
standards. Each site has a different policy for redirect maintenance.
There are 3 basic options:

1. Watch the access logs until a majority of the requests use the new
   URLs. When the pre-determined threshold is met, remove the redirect
   and accept that some users who still haven't updated their links
   will receive a 404 (Not Found).

2. Set a date-based transition period. After the specified date, either
   remove the redirect, or apply the logic from #1 and push the cutoff
   date out accordingly.

3. Keep the redirect indefinitely, or until the maintenance costs become
   too cumbersome. Many organizations use this one, for better or
   worse.

---Brett.
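
P.S. For what it's worth, the log-watching approach in option 1 could be sketched roughly as below. This is only an illustration, not something Infrastructure runs today: it assumes the access logs are in Common Log Format, and the function names, the URL prefixes and the 5% threshold are all made up for the example.

```python
import re

# Pulls the request path out of a Common Log Format line (assumed log format).
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) ')


def old_url_share(log_lines, old_prefix, new_prefix):
    """Return the fraction of counted requests that still hit old_prefix.

    Only requests under one of the two prefixes are counted. The prefixes
    are assumed not to overlap. Returns 0.0 if nothing was counted.
    """
    old = new = 0
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a GET/HEAD request line we recognise
        path = match.group(1)
        if path.startswith(old_prefix):
            old += 1
        elif path.startswith(new_prefix):
            new += 1
    total = old + new
    return old / total if total else 0.0


def safe_to_drop_redirect(log_lines, old_prefix, new_prefix, threshold=0.05):
    """True once old-URL traffic is at or below the (hypothetical) threshold."""
    return old_url_share(log_lines, old_prefix, new_prefix) <= threshold
```

Something like this could be run over each rotated log, with the threshold agreed in advance so the removal date is not argued over after the fact. For option 2, the same check could be made on the cutoff date to decide whether to extend it.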