It appears that Michael Richardson <mcr+ietf@xxxxxxxxxxxx> said:
>It's good to see robots.txt coming to the IETF.

Agreed, and I also second Mnot's question about whether we have reports from other search engines that they follow this spec. Based on my experience reading my web server's log files, tweaking robots.txt files, and looking at the web sites where search engines explain their crawling practices, I think they do, but surely we know people at a few other search engines. I'd be particularly interested to hear who interprets the * and $ pattern metacharacters.

Section 2.2.2 has this example of a path with a Unicode character:

| /foo/bar/U+E38384 | /foo/bar/%E3%83%84 | /foo/bar/%E3%83%84 |

There is no U+E38384 character, but the UTF-8 encoding of the Japanese character U+30C4 is hex E3 83 84, so I'm guessing that's what was meant.

The "Crawl-Delay" line is ignored by Google but honored by many other search engines, such as Bing and Yandex. I would describe it, with a note that only some spiders use it.

Most importantly, the copyright license is broken. At the top the draft carries the "no derivatives" license, which is fine, but it also wraps code sections in <CODE BEGINS>. The TLP specifically says that the code license applies only to RFCs that use the regular license, not any other license. In this case the "code" sections are short snippets of sample robots.txt files with made-up names and paths, so I would take out the code flags.

R's,
John
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
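[For reference on Crawl-Delay: a hypothetical robots.txt showing the usual form of the line — a per-agent delay in seconds. The names and value here are made up for illustration.]

```
User-agent: examplebot
Crawl-delay: 10
Disallow: /private/
```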
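[To illustrate the * and $ question above: this is a minimal sketch of how those two metacharacters are commonly interpreted — * as "any sequence of characters" and a trailing $ as an end-of-path anchor. The function name and the regex translation are my own illustration, not anything from the draft.]

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters,
    and a trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex specials, then restore '*' as '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))

print(bool(robots_pattern_to_regex("/private/*").match("/private/data.html")))  # True
print(bool(robots_pattern_to_regex("/*.gif$").match("/images/cat.gif")))        # True
print(bool(robots_pattern_to_regex("/*.gif$").match("/images/cat.gif?v=2")))    # False
```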
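[A quick check of the Unicode point above: U+30C4 (katakana "tsu") really does encode in UTF-8 as the bytes E3 83 84, which percent-encode to the %E3%83%84 shown in the draft's table.]

```python
from urllib.parse import quote

ch = "\u30c4"                  # KATAKANA LETTER TU
utf8 = ch.encode("utf-8")
print(utf8.hex())              # e38384
print(quote(ch))               # %E3%83%84
```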