Re: Gen-ART LC/Telechat review of draft-freed-sieve-in-xml-05

ned+ietf@xxxxxxxxxxxxxxxxx · Tue, 18 Aug 2009 17:54:57 -0700 (PDT)

On Aug 16, 2009, at 11:01 AM, Ned Freed wrote:

[...]

>> it would be helpful to have a sentence or two somewhere (maybe
>> in the intro) to explicitly say so. My confusion might be around the
>> meaning of the term "client" in this context.
>
> No, I think your confusion is that you read a lot more into the text
> than it
> actually says. There's a pretty big difference between "no semantic
> understanding whatsoever" and "an incomplete semantic understanding".

I think the confusion is that the text says very little one way or the
other. You have assumptions in mind about the semantic knowledge of an
editor that are not explicitly stated.

On the contrary, we have made _no_ assumptions whatsoever about it. And the
draft reflects that. You, OTOH, appear to have approached this with a set of
assumptions in your head that I for one frankly don't comprehend. Perhaps - and this
is just speculation on my part - this is because, as you have stated, you
haven't done much work using XML tools. If so, then you need to understand that
this document assumes considerable familiarity with XML and the tools used to
manipulate it. And given the topic of the document this is a perfectly
reasonable assumption to make IMO.

A reader that was not privy to
the process of creating this draft may come with a different set of
assumptions, and may not draw the inferences you expect them to.

In my case, it seemed counter-intuitive that an implementer would be
willing to implement sieve semantics but unwilling to deal with the
syntax.

And this is a case in point. The purpose of this specification is to provide a
means of representing Sieve using an alternate syntax without changing any of
the language semantics. As such, the audience is *exactly* the group of people
who are "willing to implement sieve semantics but unwilling to deal with the
syntax". (And from all indications - there are now alternative XML
representations for many other application formats - this is a pretty large
group.)

I have to say that approaching such a specification with the idea that its
entire goal is counterintuitive is a pretty good recipe for confusion on your
part. And I don't think any amount of clarifying prose can possibly assist you
in dealing with such a fundamental expectation mismatch.

Your "template" comment below illustrates a case where that
makes more sense.

Again, the extent to which an editor understands and can deal with Sieve
semantics is largely orthogonal to the representation format. There are extant
Sieve editors that don't use the XML representation and which understand
essentially no Sieve semantics at all - they are controlled by embedded comments
in special formats only, and treat the Sieve material between the comments as
opaque. Just think how easy it would be for some other Sieve generation
facility to confuse such an editor.
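
As a rough illustration of the kind of comment-driven editor described above,
here is a minimal Python sketch. The marker format ("# BEGIN-BLOCK label" and
"# END-BLOCK") and the name split_blocks are hypothetical, invented for this
example; real editors use their own conventions. The sketch only shows that
such an editor can find its blocks without parsing any Sieve, and that a
script written by some other tool gives it nothing to work with.

```python
import re

# Hypothetical marker comments; not taken from any real editor or from the draft.
BEGIN = re.compile(r"^#\s*BEGIN-BLOCK\s+(?P<label>\S+)\s*$")
END = re.compile(r"^#\s*END-BLOCK\s*$")


def split_blocks(script):
    """Map each block label to the Sieve text between its markers.

    The Sieve text is never parsed; it is carried around as an opaque string.
    """
    blocks = {}
    label = None
    body = []
    for line in script.splitlines():
        if label is None:
            match = BEGIN.match(line)
            if match:
                label = match.group("label")
                body = []
        elif END.match(line):
            blocks[label] = "\n".join(body)
            label = None
        else:
            body.append(line)
    return blocks


marked = (
    'require "fileinto";\n'
    "# BEGIN-BLOCK lists\n"
    'if header :contains "list-id" "ietf" {\n'
    '    fileinto "ietf";\n'
    "}\n"
    "# END-BLOCK\n"
)
unmarked = 'if header :contains "list-id" "ietf" { keep; }\n'

print(sorted(split_blocks(marked)))   # the editor sees one block to manage
print(split_blocks(unmarked))         # nothing it recognizes in the other script
```

An editor built this way has no means of telling whether Sieve code outside
its markers interacts with the blocks it manages.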

>
>> Is the expectation that
>> an "editor" must be semantically aware of sieve, but a processor does
>> not (beyond the list of "controls")?
>
> The expectation is that the amount of semantic understanding an
> editor is going
> to need will very much depend on the range of operations the editor
> is able to
> perform. Simple template-based systems will only manipulate labelled
> blocks of
> Sieve code without any understanding of what that code does. A more
> sophisticated editor might need to have a detailed knowledge of how
> blocks in
> Sieve work, or how to build conditional expressions, or even the
> detailed
> semantics of various tests and actions.

That paragraph clarifies a lot. I think it would be helpful to include
it in the draft.

I disagree. The above paragraph might make sense to have in some sort of Sieve
usage document. It's unnecessary and distracting here.

>
>> ...
>
>> Instead of round trip "conversion", I should have said round-trip
>> "editing". My concern is, if I create a script using Editor A, then
>> later edit it with Editor B, any metadata created by Editor A is
>> likely to be lost.
>
> And that's a valid concern to have. Again, there are going to be
> cases where
> one editor has no choice but to strip the information added by
> another. This is
> simply how things are; there's nothing this or any other
> representation scheme
> can do to eliminate this possibility.
>
>> Is that the intent?
>
> It's not a matter of intent. It is simply an unavoidable reality.
>
>> If so, it's probably worth
>> mentioning that an editor needs to be able to deal rationally with
>> the
>> loss of its own metadata.
>
> First, while it is certainly desirable for all editors to have this
> characteristic, there are going to be cases where it cannot possibly
> work this way. So this can't be a requirement.

So am I understanding correctly that it's unreasonable to expect an
editor to just leave metadata alone if it doesn't understand it,

It depends on the context. Hopefully the XML format will help make it a little
easier to do this in some cases. But certainly not all.

and
it's also unreasonable to expect an editor to behave in a sane manner
if its metadata gets stripped?

Again, it depends on the context.

It seems like there are three choices here: You can expect editors to
preserve metadata from other editors, you can allow stripping of
metadata and expect editors to deal rationally with its loss, or you
can expect that if a user uses more than one editor over the lifetime
of a script, one or both of the editors is likely to fail in a non-
graceful way.

Did the working group really choose the third option?

It isn't a question of what was chosen. The WG came up with one of the simplest
language syntaxes imaginable - the ABNF for Sieve is *tiny* - but any language
with sufficient flexibility to represent any sort of useful subset of the
scripts people want to write to process email is going to be one that's too
complex for many editors to want to understand fully. And since editors aren't
always going to have full semantic understanding, they cannot be expected in
all cases to be able to manipulate the full set of possible sieves produced by
other systems without screwing up.

Of course the WG could have imposed some requirements on this, saying in effect
"you must fully understand Sieve in order to be a compliant editor". But such a
requirement would either have been roundly ignored, or implementors would have
chosen some other language that doesn't have such requirements. And again, this
document is absolutely not the place for stating such requirements, even if they
made sense to have, which IMO they do not.

Put another way, the language you appear to be seeking here is one that is
trivially shown to be overconstrained by engineering realities into
nonexistence.

>
> Second, even if it were appropriate to make this a requirement, this
> document
> isn't the place for it. All this document does is describe an XML
> representation for Sieve. All of the requirements it imposes are
> directed at
> the representation and the process of converting to or from that
> representation.
>
> But since there is no requirement that a Sieve editor use this XML
> representation at all - and in practice most extant Sieve editors
> operate
> directly on the native Sieve format - imposing requirements on
> editors here
> makes little if any sense.

I fail to understand why it is acceptable to put requirements on
processors but not on editors. Certainly no one would expect an editor
that does not implement this specification to be bound by any
requirements in it.

And that's precisely the problem. Most editors operate directly on the regular
Sieve representation, not the XML representation. If you want to impose a
requirement on Sieve editors, this is not the place to do it because you're
only hitting a fraction of the audience.

For that matter, you already have (admittedly
weak) 2119 language referring to editors

Actually, there is exactly one constraint the document imposes on editors (the
other compliance language explains a couple of things editors are explicitly
allowed to do), which has to do with the contents of displayblock and
displaydata not being allowed to include comment close sequences. This is done
to simplify conversion processing and, unlike the requirements you want to
impose, applies only to Sieve editors operating on the XML representation. So
it is appropriate for this document to state such a requirement.
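
For illustration only, a check of this kind could look like the Python sketch
below. It assumes that the comment close sequence is "*/", the terminator of
Sieve bracketed comments, and that the parts to inspect are attribute values
on both elements plus the text inside displaydata; the draft itself is the
authority on exactly what the constraint covers. The function name and the
namespace URIs in the sample are placeholders, and the code matches elements
by local name only.

```python
import xml.etree.ElementTree as ET

COMMENT_CLOSE = "*/"  # assumption: terminator of a Sieve bracketed comment
DISPLAY_ELEMENTS = {"displayblock", "displaydata"}


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def comment_close_violations(xml_text):
    """List the display elements whose inspected content contains '*/'."""
    violations = []
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name not in DISPLAY_ELEMENTS:
            continue
        values = list(elem.attrib.values())
        if name == "displaydata":
            # Assumption: all text nested inside displaydata is editor data.
            values.extend(elem.itertext())
        if any(COMMENT_CLOSE in value for value in values):
            violations.append(name)
    return violations


# Placeholder namespaces, used only to make the sample well-formed.
sample = (
    '<sieve xmlns="urn:example:sieve-xml">'
    '<displayblock label="ok">'
    '<displaydata><note xmlns="urn:example:editor">oops */ here</note></displaydata>'
    "</displayblock>"
    "</sieve>"
)
print(comment_close_violations(sample))
```

An editor could run such a check before saving, so that the conversion to
native Sieve never has to escape or reject its metadata.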

(That said, properly speaking this should be a Schema and RNG constraint, but
it turns out to be very difficult to do in those languages, so we cheated and
did it as a prose constraint. In other words, this is a kluge to get around a
limitation in the specification language, just like text descriptions attached
to ABNF do similar stuff on a regular basis in many other specifications.)

But if you are unwilling to place normative requirements around this,

It isn't a question of what I'm willing or unwilling to do, but rather what I,
as an individual author working on a WG document, am able or unable to do. The
stuff you appear to be after clearly doesn't belong in this document or AFAICT
in any other document the WG plans to produce. If you want to see various
general requirements on Sieve editors written down somewhere you're going to
have to convince the WG that such an effort is worth it.

it would still help quite a bit to have some non-normative guidance to
the effect that, since there is no requirement for an editor to
preserve metadata from another editor, an editor implementation can
expect to have its metadata removed from any given script. If it does
not handle this gracefully, bad user experiences are likely to result.

Again, while such discussion might arguably be useful, this is not the place
for it and I'm not the one you need to convince to do it.

>> >> Why not MUST? Wouldn't violation of this requirement introduce
>> >> interoperability problems between different implementations?
>> >
>> > It's a SHOULD because the WG believed that there may be some
>> > exception cases
>> > where an alternate format makes more sense.
>
>> Can you offer (in the text) some examples of those exceptional cases,
>> and the consequences thereof?
>
> I see no need to.
>
>> My concern is that it seems like violating the should would pretty
>> much break interoperability between processors, wouldn't it?
>
> Sure, which is why it's a SHOULD, not a MAY. Again, this is the
> compliance
> level the WG decided was appropriate. Even if I agreed with you,
> this is not a
> simple editorial nit that I can change on my own.

It has been my experience that SHOULD level requirements that both
significantly impact interoperability and offer no explicit guidance
about the consequences of violation are some of the biggest sources of
interoperability problems in existing specs.

I'm starting to think that the WG had very limited expectations of
interoperability between implementations that use this format.

Realistic expectations would be closer to the mark. But again, you persist in
confusing issues inherent in automatic generation and modification of Sieve
code with this specific representation format. To the extent this specification
attempts to address this, it is by relieving implementors of the burden of
having yet another parser and supporting yet another syntax, and by selecting a
syntax which has a vast array of very powerful manipulative tools available. We
hope that this will help make some of the problems inherent in this space a
little easier to overcome.

I recall a sentence stating that you expected interoperability between
editors and processors. I think an average reader would expect
interoperability among multiple editor implementations and among
multiple processor implementations. If the work group did not intend
that degree of interop, it would be extremely helpful to have some
sort of applicability statement to that effect.

Again you're asking for all sorts of stuff that far, far, far exceeds the
purview of this specification.

>
>> Or at
>> least cause encoded metadata to get lost if you convert from XML to
>> sieve using one processor, and back to xml with another?
>
> That's the obvious case where such a loss would occur.
>
>> >
>> >> -- Security Considerations, last paragraph:
>> >
>> >> You mention that potentially executable content can be
>> introduced via
>> >> other namespaces, and that "appropriate security precautions"
>> should
>> >> be taken. I think this needs more discussion, as I am not sure an
>> >> implementor will understand what the authors considered
>> appropriate.
>> >
>> > The point of Sieve namespaces is to allow multiple XML vocabularies
>> > to be used
>> > in a single document. This is a completely open ended mechanism and
>> > it is not
>> > our intent to label any particular use as inappropriate. As such,
>> > unless you
>> > have some specific text in mind, I for one fail to see what could
>> be
>> > added here
>> > that would be useful.
>
>> Maybe an example of the sorts of bad behavior that could be enabled
>> by this would help.
>
> I think introducing another XML vocabulary into this document simply
> for
> purposes of showing that you can put bad stuff in XML would be
> belaboring the
> obvious.
>
>> Are you concerned that a scriptable editor that
>> stores scripts in metadata could be attacked by hand coding scripts
>> into structured comments in native Sieve?
>
> For that to happen there would have to be a pretty serious bug in the
> conversion process, so no, this is not the concern here at all.
>
>> Buffer overflow attacks on
>> conversion processors?
>
> This would be another sort of conversion process bug and not
> relevant to the
> concern at hand.
>
> All this text is doing is pointing out the rather obvious fact that XML
> namespaces allow you to mix vocabularies in a single document. As
> such, it
> is possible to drag in some other vocabulary that has its own set of
> security
> problems.
>
> If this still isn't clear to you I'm sorry, but I'm at a loss as to
> how
> to explain it further.

I think it's clear to me after reading your explanation. Am I correct
in understanding that the point of that sentence was that any given
namespace may have its own set of security considerations, and that
is beyond the scope of this document? If that is a correct
understanding, then I suggest replacing the last sentence with
something to the effect of:

"Such facilities will come with their own sets of security
considerations, which are beyond the scope of this document."

I really don't think this is that much clearer, but I can live with changing
it to read:

 Such material will necessarily have its own security
 considerations, which are beyond the scope of this document.

Also, you elided one of the questions from my previous email without
responding:

>
>
>> -- Section 4.1, paragraph 11:  "Implementations MAY use this to
>> represent complex data
>>   about that sieve such as a natural language representation of sieve
>>   or a way to provide the sieve script directly."
>
>> I'm not sure I understand the last part -- are you saying this can be
>> used as an alternate encoding of the script?
>
> Of course not. Since when do we have programs capable of taking
> completely
> arbitrary natural language statements and reliably encoding them into
> programming language statements?
>
> I see nothing unclear about this at all.

I get the part about representing a "natural language representation",
but what did you intend by "... or a way to provide the script
directly"?

My intent was to say exactly what was said - a UI could present Sieve
statements directly to the user. Really, I cannot see anything unclear about
this at all and I am completely at a loss to explain it further.

				Ned