Re: Last Call: 'The APPLICATION/MBOX Media-Type' to Proposed Standard

John C Klensin <john-ietf@xxxxxxx> · Tue, 17 Aug 2004 16:06:03 -0400

--On Tuesday, 17 August, 2004 15:09 -0400 "Eric A. Hall"
<ehall@xxxxxxxxx> wrote:

>> To be clear about this, I think there are three choices which
>> we might prefer in descending order:
>> 
>> 	(1) There is a single canonical "wire" format in which
>> 	these things are transmitted.
> 
> Such a specification would surely dictate "a series of
> message/rfc822 objects". But if we were to require that
> end-points perform conversion into a neutral form, we might as
> well go the whole nickel and just say "use multipart/digest",
> because that's where we'd end up after months of beating on
> each other.
>...
>> 	(2) The content-type specifies a conceptual form
>> 	("application/mbox") but has _required_ parameters that
>> 	specify the specific form being transmitted.
> 
> Global parameters are useless if the parser is intelligent
> enough to figure out the message structure independently.
> Given that such intelligence is a prerequisite to having a
> half-baked parser, the global parameters are always
> unnecessary.

This is a minor point compared to the one below, and probably
not an issue here, but I can't let the above stand.  My
impression of the MIME design, from the beginning, was that I
should be able to inspect the content-type --including both the
primary type and any parameters-- before deciding whether to
retrieve and open the content of the body part.  Yes, we have
implementations and protocols that don't take advantage of that,
or that can take advantage of the content-type only and not the
parameters, but the MIME design, and hence the media type
design, is that I should be able to tell whether I can parse
(see below) the body part without opening it and applying
heuristics to it.
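
To make that concrete, here is a minimal Python sketch of a receiver that decides from the header alone whether it can handle a body part. The parameter name "format" and the value "content-length" are assumptions made up for illustration; nothing in the draft defines them.

```python
from email.parser import HeaderParser

# Hypothetical: the framing styles this receiver knows how to split.
SUPPORTED_FORMATS = {"content-length"}


def can_parse(raw_headers: str) -> bool:
    """Decide from the Content-Type header alone, without touching the body."""
    msg = HeaderParser().parsestr(raw_headers)
    if msg.get_content_type() != "application/mbox":
        return False
    # "format" is an assumed parameter name, not one the draft defines.
    fmt = msg.get_param("format")
    return isinstance(fmt, str) and fmt.lower() in SUPPORTED_FORMATS
```

A header with no such parameter, or with a value the receiver does not know, is rejected before any of the content is retrieved or opened.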

In addition, what you seem to mean by "intelligent enough to
figure out the message structure independently" is what I would
mean if I said "know most or all of the formats and apply
heuristics to figure out which one to apply".  We just don't do
that, at least knowingly and with standards-track media types.
You might rationally argue that this should be an exception if
you could demonstrate that a reasonable set of heuristics would
_always_ make the choice correctly, but that would require you
to document either the heuristics or all of the variations on
mbox and how to tell them apart (or both) -- something that
several people have claimed is basically impossible and that you
clearly haven't expressed a desire to do.

In that context, unless I completely misunderstand what is going
on here, the "...prerequisite to having a half-baked parser..."
assertion borders on the silly.   Take the example to which Tony
has been pointing.  Apparently the Solaris version of an mbox
format is well-documented and based on content length
information rather than key strings.   That implies that, if (i)
I know that what is coming is in that Solaris format and (ii) I
have a rather primitive parser that knows how to find and deal
with content lengths, then I can parse the format without any
ability to "figure out the structure" at all.   However, if I
attempt to apply that parser to something that uses key strings,
rather than lengths, or something that violates the assumptions
Solaris makes about what gets included in the length
computations, the parser is going to be bewildered at best and
yield silly results at worst -- and that is exactly what content
types and their parameters are intended to let receiving
applications guard against.
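
A sketch of how primitive such a parser can be, in Python. It assumes a simplified, hypothetical layout in which every blob has a header block containing a Content-Length line, then a blank line, then exactly that many bytes of body. The actual Solaris rules for what the length covers are not reproduced here.

```python
def split_by_length(data: bytes) -> list[bytes]:
    """Split a length-delimited mailbox into blobs (hypothetical layout)."""
    blobs = []
    pos = 0
    while pos < len(data):
        head_end = data.find(b"\n\n", pos)
        if head_end == -1:
            raise ValueError("no header/body separator found")
        length = None
        for line in data[pos:head_end].split(b"\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("blob has no Content-Length header")
        end = head_end + 2 + length  # body starts after the blank line
        if end > len(data):
            raise ValueError("declared length runs past end of data")
        blobs.append(data[pos:end])
        pos = end
        # Skip the blank line(s) between blobs.
        while data[pos:pos + 1] == b"\n":
            pos += 1
    return blobs
```

The parser never looks inside a body, so a body line beginning with "From " does not confuse it. Fed a mailbox that relies on key strings and carries no lengths, it can only raise an error, and fed one whose lengths are computed differently it would split in the wrong places.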

> Actually, global parameters are more than useless. What if we
> have a mixed mbox file, where some messages are untagged BIG5
> and others are untagged 8859-1, or some messages have
> VMS::Mail addresses and others have MS/Mail addresses, or so
> forth? The global nature of global parameters ignores the
> per-message reality of the mbox structure.
> 
> Global parameters can also be harmful if they conflict with
> reality.

That brings us to the main problem, or misunderstanding, or
strawman, depending on one's perspective.

I may be being excessively dense here, but, if I am, I seem to
have significant company.   I am not asking (or suggesting) that
you provide the information that would be required for
multipart/, e.g., the content-type and associated parameters
(such as charset, for your examples above) for each message
(much less, for multipart/digest, that you force everything into
the required subset of an RFC822 message body).  While I have
concerns that we didn't get multipart quite right and that it is
too late to fix it, those concerns don't interact with my
concern about application/mbox at all.

So let's move back a half-step.  To me, the essence of the mbox
format, conceptually, is that it consists of a sequence of blobs
that are normally interpreted as messages in some format or
other. You've made several convincing arguments that we should
see them as blobs, not as messages, and I (and I think others)
accepted them long ago.   I think that such a blob collection is
a reasonable thing to want to mail or otherwise use as a media
type.  And, again, I accept your argument that the blobs may not
be valid 822/2822 messages or encapsulated 2821 messages, and,
indeed, that there may be considerable format and content
heterogeneity from one blob to the next. 

Given that model, the key to an mbox format isn't the content of
the blobs, it is the system used to decompose an mbox into a
blob collection.  If this were a multipart structure, the
corresponding issue of interest would be whether the Boundary
parameter provided enough information to separate the parts.  It
would not be what was in each of the parts or even what _their_
content types were.  It would simply be the ability to separate
the aggregate mbox into separate blobs.

If I know that blobs are length-delimited according to some
specific set of rules, I have that information and can build a
trivial parser (and understand what it can't handle).   If I
know they are separated by indicator strings that obey some
specific set of rules, then I can build a trivial parser (and
understand what it can't handle).  And so on for different sets
of rules.   But, as far as I can tell, you don't want to give us
that information.  Instead, you want us to accept "well, it is
some sort of mbox format, and you either need to guess at how to
separate the blobs by examining the content or you need to get
that information out of band".   If that is what you intend this
specification to imply, it is, IMnvHO, unacceptable for a
standards-track media type.
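
For illustration, a Python sketch of the indicator-string case and of a receiver that refuses to guess. The framing name "from-line" and the rule "a line starting with 'From ' begins a new blob" are assumptions chosen for the example, not rules taken from any particular mbox variant.

```python
def split_by_from_line(data: bytes) -> list[bytes]:
    """Split on lines that start with b"From " (assumed separator rule)."""
    blobs, current = [], []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            blobs.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        blobs.append(b"".join(current))
    return blobs


# Hypothetical registry: declared framing name -> splitter for that framing.
SPLITTERS = {"from-line": split_by_from_line}


def split_mbox(data: bytes, framing: str) -> list[bytes]:
    """Split using the declared framing; refuse rather than guess."""
    try:
        splitter = SPLITTERS[framing]
    except KeyError:
        raise ValueError(f"unknown mbox framing: {framing!r}") from None
    return splitter(data)
```

What this splitter cannot handle is easy to state: a body line that begins with "From " and was not escaped by the sender is taken as the start of a new blob. Knowing the declared rules is what makes that limit visible in advance.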

Finally, if you argue that global parameters are unacceptable as
a means of making the type of format/ de-blob-ing distinctions
outlined above, I suggest that you are not making a good case
for leaving the parameters (and associated information) out.
Instead, you are making a case that this registration should be
a family of, e.g.,
  application/mbox-solaris-v5-and-later
  application/mbox-sendmail-v2-v4
and similar things (examples made up).

       john 
