https://bugzilla.kernel.org/show_bug.cgi?id=215737

Bug ID: 215737
Summary: uri.7: DESCRIPTION: Character encoding: Reference to obsolete IETF RFCs 2718 and 2279
Product: Documentation
Version: unspecified
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P1
Component: man-pages
Assignee: documentation_man-pages@xxxxxxxxxxxxxxxxxxxx
Reporter: alx.manpages@xxxxxxxxx
Regression: No

[uri(7)::DESCRIPTION::Character encoding] reads as:

```
   Character encoding
       URIs use a limited number of characters so that they can be
       typed in and used in a variety of situations.

       The following characters are reserved, that is, they may appear
       in a URI but their use is limited to their reserved purpose
       (conflicting data must be escaped before forming the URI):

              ; / ? : @ & = + $ ,

       Unreserved characters may be included in a URI.  Unreserved
       characters include uppercase and lowercase Latin letters,
       decimal digits, and the following limited set of punctuation
       marks and symbols:

              - _ . ! ~ * ' ( )

       All other characters must be escaped.  An escaped octet is
       encoded as a character triplet, consisting of the percent
       character "%" followed by the two hexadecimal digits
       representing the octet code (you can use uppercase or lowercase
       letters for the hexadecimal digits).  For example, a blank
       space must be escaped as "%20", a tab character as "%09", and
       the "&" as "%26".  Because the percent "%" character always has
       the reserved purpose of being the escape indicator, it must be
       escaped as "%25".  It is common practice to escape space
       characters as the plus symbol (+) in query text; this practice
       isn't uniformly defined in the relevant RFCs (which recommend
       %20 instead) but any tool accepting URIs with query text should
       be prepared for them.  A URI is always shown in its "escaped"
       form.

       Unreserved characters can be escaped without changing the
       semantics of the URI, but this should not be done unless the
       URI is being used in a context that does not allow the
       unescaped character to appear.  For example, "%7e" is sometimes
       used instead of "~" in an HTTP URL path, but the two are
       equivalent for an HTTP URL.

       For URIs which must handle characters outside the US ASCII
       character set, the HTML 4.01 specification (section B.2) and
       IETF RFC 2718 (section 2.2.5) recommend the following approach:

       1.  translate the character sequences into UTF-8 (IETF RFC
           2279)--see utf-8(7)--and then

       2.  use the URI escaping mechanism, that is, use the %HH
           encoding for unsafe octets.
```

It refers to obsolete RFCs[1][2]. We should update the info there.

[1]: <https://www.rfc-editor.org/rfc/rfc2718>
[2]: <https://www.rfc-editor.org/rfc/rfc2279>
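
For reference while reviewing the wording, here is a minimal sketch of the two-step approach the quoted text describes: encode to UTF-8, then write every octet outside the unreserved set as a %HH triplet. It assumes Python. The names `UNRESERVED` and `percent_escape` are hypothetical and do not come from uri(7). The unreserved set is copied from the quoted man page text and may not match what newer RFCs specify. Because the sketch also escapes reserved characters, it is only meant for a single piece of data going into a URI, not for a complete URI.

```python
# Illustrative sketch only; the names here are hypothetical, not from uri(7).
# Unreserved set exactly as listed in the quoted man page text.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
)


def percent_escape(text: str) -> str:
    """Escape every octet of `text` that is not an unreserved character."""
    out = []
    # Step 1: translate the character sequence into UTF-8 octets.
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Step 2: emit a %HH triplet. Uppercase hex digits are an
            # arbitrary choice here; the quoted text allows either case.
            out.append(f"%{octet:02X}")
    return "".join(out)


if __name__ == "__main__":
    for sample in ("a b", "\t", "&", "%", "~", "caf\u00e9"):
        print(repr(sample), "->", percent_escape(sample))
```

The space, tab, "&", and "%" samples correspond to the "%20", "%09", "%26", and "%25" examples in the quoted paragraph. The last sample shows a non-ASCII character becoming two escaped UTF-8 octets.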