Clarification regarding URI (RFC3986) spec followed by HTTP (RFC9110)

To whomever it may concern,

I am writing to seek clarification regarding the URI spec (RFC3986) followed by HTTP, specifically about percent-encoding arbitrary octets (which do not comprise a valid UTF-8 sequence). The last paragraph of RFC3986 Section 2.5 (https://www.rfc-editor.org/rfc/rfc3986.html#section-2.5) says, quote:

>  When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.

This implies that URI schemes defined after RFC3986 must follow UTF-8 encoding in their URIs. However, the original HTTP/1.1 RFC (2616) was dated June 1999, and so would not have had to "abide" by the UTF-8 rule.

In fact, many web servers accept GET requests containing percent-encoded octets, decode them to raw bytes, and leave it to application-level logic to decide how to process them.
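
To illustrate the kind of round trip I mean, here is a minimal sketch using Python's standard urllib.parse module. The byte string and the helper name is_valid_utf8 are made up for this example, and this is not taken from any particular server's implementation.

```python
from urllib.parse import quote_from_bytes, unquote, unquote_to_bytes

# Hypothetical payload: three octets that are not a valid UTF-8 sequence.
raw = b"\xff\xfe\x80"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Percent-encode the raw octets, as a client would before sending them.
encoded = quote_from_bytes(raw)

# Decode back to raw bytes, as a server that treats the value as opaque would.
decoded = unquote_to_bytes(encoded)

# Decoding to text instead is lossy: unquote() defaults to UTF-8 with
# errors="replace", so invalid octets become U+FFFD.
as_text = unquote(encoded)

print(encoded)                 # %FF%FE%80
print(decoded == raw)          # True
print(is_valid_utf8(decoded))  # False
```

The octets survive the round trip only when the receiver decodes to bytes; a receiver that assumes UTF-8 text cannot recover them.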

However, since HTTP's latest RFC is 9110, dated June 2022 (after RFC3986), does this mean the UTF-8 rule now applies to it? I would think not, since that would be a breaking change. However, some comments on GitHub indicate that this is as per the spec ().

tl;dr - Is it compliant with the HTTP specification to send arbitrary bytes, which do not represent a valid UTF-8 sequence, percent-encoded in a URL query parameter?


Regards,

Raghu Saxena


Attachment: OpenPGP_0xA1E21ED06A67D28A.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

