--On Friday, November 23, 2018 07:57 -0700 Doug Royer
<douglasroyer@xxxxxxxxx> wrote:

> On 11/23/18 3:29 AM, Stewart Bryant wrote:
>> https://datatracker.ietf.org/stats/document/authors/
>>
>> Why do such a high proportion of our documents (for example
>> 1929 RFCs) have no authors?
>
> Well, RFC-1929 does have an author. So I am guessing the
> automated tools can not (or did not) parse the older text only
> documents.

A different guess would be that whatever tool/algorithm produces
this graph counts a document with "only" an editor as having no
author. If that is the way things are counted, this sort of
statistic would not be surprising. Indeed, if documents that came
out of a WG and that were ultimately compendiums of input from
many WG participants were identified as having an editor and not
an author or handful of authors, I'd expect 16.79% to be somewhat
low and hope 25.21% (of RFCs only) would be low too.

Doug, if the problem were "text only", then one would expect a
much larger number. If you intended "XML available", then that
wasn't defined until RFC 2629 and, IIRC, the RFC Editor didn't
start accepting the XML files, much less archiving them, until
much later. If it were "XML or nroff", I don't know -- it might
depend on whether the documents that were submitted/archived on
paper and then scanned and converted passed through an nroff
page.

More important to this little detective job, if one adds up the
numbers in the right column of the "RFCs" tab, one ends up with
8311, a fair approximation to the largest RFC number as of this
morning (8521), and a closer one if the number "not issued" (79)
is subtracted (8442). Could the difference of about 210 (about
130 once the "not issued" numbers are excluded) be documents that
have been issued numbers but are still in the publication queue?
I don't know, but, given that the highest issued numbers are 8496,
8505, and 8521, it doesn't seem entirely implausible.
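
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It uses only the three figures quoted above and fetches nothing from the datatracker; the variable names are illustrative, not anything the stats page itself uses.

```python
# Figures as quoted in the message; none of this is read from the datatracker.
rfcs_counted = 8311        # sum of the right-hand column on the "RFCs" tab
highest_rfc_number = 8521  # largest RFC number issued at the time of writing
not_issued = 79            # RFC numbers that were never issued

# RFC numbers that could correspond to a real document.
issued = highest_rfc_number - not_issued

# Gap between the page's total and each of the two baselines.
gap_vs_highest = highest_rfc_number - rfcs_counted
gap_vs_issued = issued - rfcs_counted

print(issued, gap_vs_highest, gap_vs_issued)  # prints: 8442 210 131
```

The gap is 210 against the highest number and 131 once the unissued numbers are removed, so either figure is the candidate size of the "numbered but not yet published" set.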

Similar comments would apply to I-Ds: as far as I know, it has
never been possible to post one without an identifiable author or
editor. There are definitely a few pseudonyms, but those are still
authors for the purpose of this type of count. Possibly something
slipped through the cracks, but I'd expect that number to be in
single digits. Moreover, counting an RFC as having "no author"
when it was really "not parsed" would be seriously irresponsible
and I would not expect that of the tools team.

FWIW, I would expect pages like these to show the date compiled
(perhaps there was a lag between the RFC list or I-D list and the
compilation date/time) and exactly what is reported as "0
authors".

--your friendly statistical detective