[Bug 215737] New: uri.7: DESCRIPTION: Character encoding: Reference to obsolete IETF RFCs 2718 and 2279

bugzilla-daemon@xxxxxxxxxx · Thu, 24 Mar 2022 12:01:37 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=215737

            Bug ID: 215737
           Summary: uri.7: DESCRIPTION: Character encoding: Reference to
                    obsolete IETF RFCs 2718 and 2279
           Product: Documentation
           Version: unspecified
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: man-pages
          Assignee: documentation_man-pages@xxxxxxxxxxxxxxxxxxxx
          Reporter: alx.manpages@xxxxxxxxx
        Regression: No

[uri(7)::DESCRIPTION::Character encoding] reads as:

```
   Character encoding
       URIs use a limited number of characters so that they  can
       be typed in and used in a variety of situations.

       The  following characters are reserved, that is, they may
       appear in a URI but their use is  limited  to  their  re-
       served  purpose  (conflicting data must be escaped before
       forming the URI):

                  ; / ? : @ & = + $ ,

       Unreserved characters may be included in  a  URI.   Unre-
       served  characters  include uppercase and lowercase Latin
       letters, decimal digits, and the following limited set of
       punctuation marks and symbols:

                  - _ . ! ~ * ' ( )

       All other characters must be escaped.  An  escaped  octet
       is encoded as a character triplet, consisting of the per-
       cent character "%" followed by the two hexadecimal digits
       representing  the  octet  code  (you can use uppercase or
       lowercase letters for the hexadecimal digits).  For exam-
       ple, a blank space must be escaped as "%20", a tab  char-
       acter  as  "%09", and the "&" as "%26".  Because the per-
       cent "%" character always has the reserved purpose of be-
       ing the escape indicator, it must be  escaped  as  "%25".
       It  is  common practice to escape space characters as the
       plus symbol (+) in query text; this practice  isn't  uni-
       formly  defined in the relevant RFCs (which recommend %20
       instead) but any tool  accepting  URIs  with  query  text
       should  be  prepared  for them.  A URI is always shown in
       its "escaped" form.

       Unreserved characters can be escaped without changing the
       semantics of the URI, but this should not be done  unless
       the  URI  is  being used in a context that does not allow
       the unescaped character to appear.  For example, "%7e" is
       sometimes used instead of "~" in an HTTP  URL  path,  but
       the two are equivalent for an HTTP URL.

       For  URIs  which  must  handle  characters outside the US
       ASCII character set, the HTML 4.01 specification (section
       B.2) and IETF RFC 2718 (section 2.2.5) recommend the fol-
       lowing approach:

       1.  translate the character sequences  into  UTF-8  (IETF
           RFC 2279)--see utf-8(7)--and then

       2.  use  the URI escaping mechanism, that is, use the %HH
           encoding for unsafe octets.
```
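
For reference, the two-step approach quoted above (translate to UTF-8, then %HH-escape) can be sketched in Python roughly as follows. This is only an illustration of the text as currently written: `percent_encode` and `UNRESERVED` are hypothetical names, the unreserved set is copied from the quoted page rather than from any newer RFC, and the input is assumed to be data, so reserved characters are escaped as well.

```python
# Illustrative sketch only; percent_encode is a hypothetical helper, not
# something defined by uri(7) or by any RFC.

# Unreserved set exactly as listed in the quoted uri(7) text.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
)


def percent_encode(text: str) -> str:
    """Encode text as UTF-8, then escape every octet that is not unreserved."""
    out = []
    for octet in text.encode("utf-8"):      # step 1: translate to UTF-8
        ch = chr(octet)
        if octet < 0x80 and ch in UNRESERVED:
            out.append(ch)                  # unreserved: keep as-is
        else:
            out.append(f"%{octet:02X}")     # step 2: %HH escape
    return "".join(out)


print(percent_encode("a b&c"))   # a%20b%26c
print(percent_encode("\t%"))     # %09%25
print(percent_encode("é"))       # %C3%A9
```

The first two calls reproduce the examples in the quoted text (space as "%20", tab as "%09", "&" as "%26", "%" as "%25"); the last one shows a non-ASCII character becoming two escaped UTF-8 octets.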

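It refers to obsolete RFCs[1][2]: RFC 2718 was obsoleted by RFC 4395 (itself
since obsoleted by RFC 7595), and RFC 2279 was obsoleted by RFC 3629.  We
should update the references in that section accordingly.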

[1]: <https://www.rfc-editor.org/rfc/rfc2718>
[2]: <https://www.rfc-editor.org/rfc/rfc2279>
