reply to comments on draft-hardy-pdf-mime


 



This is my (personal) response to comments on draft-hardy-pdf-mime -02 and -03. I haven't had a chance to review all of these with all of the co-authors.

There is a new -04 draft now.

I'll send individual responses as requested with references to this discussion; I’m trying to reply to reviews by:
S Moonesamy, Phillip Hallam-Baker, John Klensin, Kathleen Moriarty, Stephen Farrell, Dan Romascanu, Mirja Kuehlewind, Ben Campbell

========
History:

I volunteered to help with draft-hansen-rfc-use-of-pdf; the
history of PDF wasn't right, and the registration in RFC 3778 did need
to reflect the ‘change controller’. This draft was started in the same
github repo, until

https://github.com/masinter/pdfrfc/commit/a7208154d541613ba66f59aab6e1507754fe26e4

I talked the ISO committee working on PDF 2 into taking ownership of
the "fragment identifier" semantics; it wasn't part of the PDF 1.7
definition that got fast tracked, but it belonged with the PDF spec.
(In general, those defining file formats and registering them need to
be prodded into owning the "fragment identifier semantics".)

I don't know of any other instance of an ISO committee defining a
media type. The ISO committee wants to put the RFC number into the
PDF 2 spec, while this spec wants to make normative reference to the
(likely-to-be-approved-in-2017) PDF-2 spec.

Note that application/pdf was FIRST registered in 1993 (23 years ago)
by Paul Lindner for use in the gopher protocol. I was one of the
GopherCon '93 attendees to urge him to do so (and TimBL to use
content-type in HTTP/1.0).

"application/pdf" was chosen before the introduction of the
vnd. prefix.  When the "standards tree" and "vendor tree" distinction
was introduced, application/pdf was grandfathered rather than forced
to application/vnd.adobe.pdf.  Only relatively recently has it
qualified for the "standards tree".

An RFC might not even be necessary just to update the media type
registry. (The text/html registration was updated without an RFC to
obsolete RFC 2854.)

Does any of this history belong in the document? I didn't think so.
======
Editorial:

The paragraph numbers are a feature of xml2rfc. I turned them off.

Yes, the Introduction should say "It obsoletes [RFC3778]."  and
RFC3778 should be added to the Informative references.

======
Interoperability considerations:

I put in a reference to ISO 32000-1 Annex I, "PDF Versions and
Compatibility", which talks about the use of version numbers and
backward compatibility.

http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=735

There's a lengthier blog post/paper by Jim King:
http://blogs.adobe.com/insidepdf/2009/08/pdf_evolution_and_compatibilit.html
http://blogs.adobe.com/insidepdf/Compatibility_090819.pdf

... but it's older, Jim has retired and is unlikely to update it, and
I don't know if the URL is stable (it's a blog, not an official
document). Is it worth referencing? I thought not.

======
Security considerations:

There were a lot of comments about Security Considerations.

It's true that lots of RFC 3778's security considerations got
replaced; here's the commit:

https://github.com/mrbhardy/pdfmime/commit/76904f445bd35a472f759bc45a25d24f695d40f0

As I understood it, the feeling was that the previous text was wrong
or misleading, that these days many formats allow scripting, and that
the techniques and reasons for sandboxing are well known.

Version -04 adds to the end of “Security Considerations”:

   PDF interpreters executing any scripts or programs related to these
   constructs must be extremely careful to insure that untrusted
   software is executed in a protected environment.

PDF has been around a long time; a search for "PDF malware" turns up
lots of hits. I too wish there were an ISO or other document that
could be cited, but I haven't found one.  Is it necessary to say more,
in the MIME type registration? 

It might help to clarify who "Security Considerations" in MIME
registrations are mainly for. Don't think of developers of PDF
viewers/interpreters, think of system administrators, makers of
firewalls, proxies. People who don't really care about PDF, who won't
study the spec, just want to know what they should watch for, beyond
the usual for every file type. Having a target audience might help.

========
(PH-B) MIME types should identify content which has scripts/macros:

I wasn't sure if you meant the type itself, or that content should be
labeled.  I'm not sure how this applies to the application/pdf registration. 

In general, a label "I have no scripts" isn't helpful because bad guys
can lie [RFC 3514]. You can refuse to run scripts, or run them with
limited access.

=====
Signatures add trust:

> PDF also has a signature capability which is relevant. If the Macros are
> signed by a trustworthy party, they are less of a concern than random
> Macros.

Is this true? Malware that reproduces itself can't be signed and
transmitted.

=======
Subsets without scripting?

> ...  some of the subsets do not allow
> embedded scripting.  If that is correct, it should certainly be
> mentioned.

I mentioned it for PDF/A in passing, although it is not clear how this helps.
A file could lie and say it was PDF/A but still have scripts.
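
As a rough illustration of why a self-declared label is not enough,
here is a minimal Python sketch of a receiver-side check. It is a
heuristic only, not a PDF/A validator, and the function names are
hypothetical. It assumes the PDF/A claim appears as the XMP
"pdfaid:part" property in uncompressed metadata, and that script
actions show up as literal /JavaScript or /JS names. Names can be
written with #xx escapes and objects can sit inside compressed
streams, so finding no marker proves nothing.

```python
# Heuristic sketch: flag files that claim PDF/A yet contain script markers.
# Not a validator; escaped names or compressed objects will be missed.

SCRIPT_MARKERS = (b"/JavaScript", b"/JS")


def claims_pdfa(data: bytes) -> bool:
    """True if the bytes contain the XMP PDF/A identification property."""
    return b"pdfaid:part" in data


def has_script_markers(data: bytes) -> bool:
    """True if any literal script-related name appears in the raw bytes."""
    return any(marker in data for marker in SCRIPT_MARKERS)


def label_is_suspect(data: bytes) -> bool:
    """True if the file claims PDF/A but script markers are present."""
    return claims_pdfa(data) and has_script_markers(data)
```

A gateway that cares would act on what it finds in the bytes (refuse,
strip, or sandbox), not on what the file says about itself.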

================
Adobe PDF vs ISO specs

> how the current ISO version of PDF compares to the Adobe version

ISO 32000-1 was adopted using the ISO Fast-Track process and is
technically identical to Adobe Portable Document Format version
1.7. The Fast-Track process doesn't allow any technical changes; it
was just rewritten in ISO-spec style.


> If the difference is significant, then a new
> media type, not reuse of an old one, is required.  Even if it is
> not that significant, it appears to me (as a co-author of RFC
> 6838) that there is a strong case to be made for parameters that
> identify versions and/or specific subsets to help applications
> to identify viewers or processors that will not fail.  The
> authors may have good reasons to not include either parameter,
> but it seems to me that the I-D should then explain why not.

There are no technical differences between Adobe PDF 1.7 and ISO
32000-1:2008.

I don't see a case for version or subset parameters in content-type
headers here, or almost anywhere that backward and forward
compatibility has been carefully planned.

As with most living file formats, if you want to consume files
that use the latest features to their fullest, you want to have an
up-to-date viewer; publishers can target earlier versions for wider
applicability and reliability.
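
For context on why a version parameter adds little: a PDF file
already states its version in its own header comment ("%PDF-1.7",
"%PDF-2.0"). The sketch below (Python, hypothetical function name)
reads it from the leading bytes. It assumes the header appears
within the first 1024 bytes, which is a common reader tolerance
rather than a guarantee, and it ignores the document catalog's
/Version entry, which can override the header in later PDF versions.

```python
import re
from typing import Optional, Tuple

# Matches the "%PDF-M.N" header comment, e.g. "%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the PDF header, or None if absent.

    Only the first 1024 bytes are searched (an assumed tolerance).
    The catalog /Version entry is not consulted.
    """
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A consumer can compare the result with the newest version it
supports and still attempt to render, since later versions were
designed to degrade gracefully in older viewers.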

I think that’s it.  Thanks all for your reviews,

Larry
--
http://larry.masinter.net





