Re: [Last-Call] Last Call: <draft-koster-rep-06.txt> (Robots Exclusion Protocol) to Informational RFC

Hi Mark,

Some comments in-line.

On Tue, Mar 8, 2022 at 11:20 AM Mark Nottingham <mnot@xxxxxxxx> wrote:
Hi Ted,

Thanks for the response, especially since this is not strictly your problem any more. Responses below.


> On 8 Mar 2022, at 8:18 pm, Ted Hardie <ted.ietf@xxxxxxxxx> wrote:
>
> Hi Mark,
>
> On Mon, Feb 28, 2022 at 10:56 PM Mark Nottingham <mnot@xxxxxxxx> wrote:
>>
>>
>> > On 1 Mar 2022, at 9:29 am, John Levine <johnl@xxxxxxxxx> wrote:
>> >
>> > Most importantly, the copyright license is broken. At the top it has
>> > the "no derivatives" license, which is fine,
>>
>> Ah - I missed that, thanks for pointing it out.
>>
>> I'm uncomfortable leaving change control for a key interoperability mechanism in the search market in the hands of one competitor, yet blessing it as part of the IETF stream. I think the IETF as a whole should be uncomfortable with that too, given current competition enforcement trends.
>
> Having the original author of the spec be the principal author here is a bit of a bulwark against that, as I don't believe he is or would be interested in handing change control over to Google. I also believe Gary and the other authors have reached out to the rest of the relevant community (though my change in employer means I no longer have the relevant e-mails to cite).

Hmm. Because Google employees are co-authors in a joint work, my understanding is that they have the ability to publish derivative works in the future, at least in the US. If that's true, it does give them a form of change control -- or at least significant privilege (regarding the future of the spec) over other members of the community.

(Obviously, one would need to talk to a copyright lawyer -- my understanding here is informed by a similar situation in a different venue.)

I am also not a copyright lawyer, and my understanding of joint authorship is probably poor.  But I am pretty sure that securing the right to create derivative works isn't the aim here.  The aim was, in fact, to get a stable reference for this and to make sure the bar for changing it was significant.  The ISE generally does not publish updates to IETF-issued documents (though they do publish commentary), so going with the IETF stream here is actually likely to make it harder for any later modification to go forward without community attention (short of leaving the RFC publication process entirely, obviously).


> On the more general topic of why this has the "no derivatives" clause,  I understand your reluctance, but I think this is a case where the combination is valid.  First, it's important to note that the specification was brought to the IETF for substantive review, to make sure that the elements it uses (like ABNF) were being used in the right way and to eliminate any possibility of ambiguity.  From my perspective, that's been very useful and it would not have occurred to the same extent had this gone directly to the ISE.

I find this a bit surprising -- surely it's possible to get adequate review for ISE documents without putting them into the IETF stream? Otherwise, the Independent stream would jeopardise the quality of the RFC Series overall... Was the ISE involved in this discussion?


I didn't mean to imply that ISE documents weren't generally adequate, but it is the ISE's choice what review to seek.  Seeking public review was part of the point here, and I continue to believe that's beneficial.
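(As a concrete illustration of the sort of thing that review covered -- a minimal sketch of my own, not text taken from the draft -- the ABNF has to unambiguously describe files as simple as:

~~~
# Applies to every crawler; rules match URI paths by prefix.
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
~~~

as well as the far less tidy files deployed in the wild.)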

 

> However, this spec reflects operations which have been stable/backwards compatible for a very long time.  Given that, it is important to the community which deploys this that it be fairly difficult to amend.  One way to achieve that would have been to make this standards track; that would require standards action to update or obsolete it later.  When we discussed that back at the beginning of this process, though, it was pretty clear that some folks would use the working group discussion around that to try to insert functionality that would result in breaking changes.  While it would have been kind of unlikely for any of those to win out against the need for maintaining interoperability, the result would have been a pretty big increase in the amount of effort needed to get this published.

This is the rub -- depending on your definitions of "the community" and "some folks" in the statement above, the outcome might be completely reasonable and justified, or blatantly illegitimate.

As I said, I don't have the ability to cite previous discussions, but I have asked the authors to reach out to the folks from that community so that they can comment directly.  I hope that their doing so resolves this issue for you.

 
It's also notably out of step with the direction that pretty much all other Internet and Web standards are taking. HTML, DOM, and many other aspects of the platform have considerable requirements for stability and backwards compatibility, and yet they are not locked behind a no-derivatives clause.


Of course, the parallel to the robots.txt spec may or may not be strong, given both when it was developed and the extent to which it was fostered by a single individual. 
 

> Another option for getting an archival spec with a high bar for change was this one:  an IETF informational with a no-derivatives clause. That gave the full benefit of IETF review and made the bar for amendment high enough to allay the concerns of the original author and the relevant community.  It had this clause when Adam agreed to sponsor it and it has had it in every iteration since, so I thought this was well understood.  As shepherd, my apologies if it was not.
>
> There is another option that gets the full set of characteristics needed:  AD sponsored on the standards track.  At the time this went through the first set of discussions, that was something folks had become very reluctant to do.  If it is on the table, I personally believe that a standards track document with the usual clauses would work as well.  Those can't be superseded or amended without serious work and plenty of time for the relevant community to chip in.
>
> But, absent that, I think this kind of document is why BCP78 permits this combination: documents which need and have received significant IETF review but which also have a significant external community for whom the usual clauses result in a risk of inappropriate later amendments. To put this slightly differently, I think you'll see that this falls under the logic in RFC 5378, Section 3, in the penultimate paragraph.

Assuming that you're referring to this paragraph in s 3.3:

~~~
   The IETF has historically encouraged organizations to publish details
   of their technologies, even when the technologies are proprietary,
   because understanding how existing technology is being used helps
   when developing new technology.  But organizations that publish
   information about proprietary technologies are frequently not willing
   to have the IETF produce revisions of the technologies and then
   possibly claim that the IETF version is the "new version" of the
   organization's technology.  Organizations that feel this way can
   specify that a Contribution be published with the other rights
   granted under this document but may withhold the right to produce
   derivative works other than translations.
~~~

... then the question is whether the robots.txt format is really 'proprietary' technology, or whether it's a public good. Given its wide deployment and use as what amounts to an API for search engines to interoperate with Web sites, I struggle to see it as the former.


I think your reading of this is stricter than mine.  The paragraph says "even when the technologies are proprietary", but I do not believe that restricts the scope of the exception to those which are.  In this case, as in the proprietary case, there is a concern that, without a pretty high bar for review, new versions could emerge which are not backwards compatible.  This paragraph makes it possible to use the no-derivatives clause in cases like this. As I said in my previous message, I believe that a standards-track approach with IETF change control would have a sufficiently high bar, so I suspect that it would be fine as well.
 

> If you and the broader community prefer the standards track approach, now would be a good time to let the sponsoring AD know.

To be clear, I think that this document is almost certainly a reasonable record of how the robots.txt format works today, and that its authors are acting in good faith.

That's good to hear, and I certainly agree. I continue to shepherd the document, despite my change in employer, because I am personally convinced that getting this published as an RFC is a good thing.

However, given the circumstances I'm concerned that from the 'outside', publishing the document in this manner won't look legitimate -- and will therefore call into question the legitimacy of the IETF itself, in some eyes.

I think there are a few different ways we could address that while meeting the authors' goals.

0) I still believe that removing the no-derivatives clause is the most straightforward way to do so. TCP, QUIC, HTTP, and many other Internet specifications remain stable and backwards compatible without the benefit of a no-derivatives clause; I don't see how robots.txt is different.

TCP, QUIC, HTTP, and many others are developed within the context of the IETF.  Robots.txt has not historically been developed here, and so I believe the parallels are not as strong as you suggest.

1) Alternatively, statements from at least some other search engines that they are aware of this work and do not object to it being published would change how this action is perceived considerably. Ideally, this would be represented by adding authors from other search engines to the document.

I have asked the authors to make sure their colleagues who have reviewed the work chime in here.  I think adding authors is not necessary, and it would go against what the RFC Series Editor has previously stated about the role of an author:


2) Or, since the document has now been reviewed for ABNF, nothing stops it from being switched to the Independent Stream with a title like "The Google Robots Exclusion Protocol" (to reflect its 'proprietary' nature).


As I noted before, I think you are keying off the word 'proprietary' a bit too much, and it absolutely is not Google's Robots Exclusion Protocol; Martijn's work on this was released two years before Google existed, as I am sure you know.

regards,

Ted Hardie


 
Cheers,


--
Mark Nottingham   https://www.mnot.net/

