Re: [PATCH 2/2] gitweb: remove invalid http-equiv="content-type"

On Monday, March 7, 2022 7:23:49 AM EST Ævar Arnfjörð Bjarmason wrote:
> I'm not sure I understand this change really. The result in always XML,
> so application/xhtml+xml is redundant, text/html, or both?

To be honest, using an http-equiv="content-type" in XHTML is confusing. When 
you do use one, your goal shouldn’t really be to specify the document’s MIME 
type. After all, the first three lines of each page say

	<?xml version="1.0" encoding="utf-8"?>
	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
	<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">

Those lines are more than enough to determine that the document is XHTML 
encoded as UTF-8. Instead, the point of the meta element is to help out a 
parser that is incorrectly parsing the document as HTML (instead of as XHTML). 
Historical W3C documents (that were applicable when http-equiv="content-type" 
was allowed in XHTML) [1][2][3] indicate that it should be used like this:

	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

In other words, to use http-equiv="content-type" properly in XHTML, you had to 
lie about the document’s type. The fact that this is confusing is probably 
part of why WHATWG disallowed it in the HTML Standard.

> But aside from that: I have seen browsers get the lack of encoding=""
> "wrong" with data at rest, don't some still default to ISO-8859-1?
> 
> So won't this result in badly decoded data if you save the web page &
> view it locally?

I tested this idea in ungoogled-chromium, Firefox and Pale Moon. Other than 
Pale Moon in one specific circumstance, they all used UTF-8 as the encoding. 
Pale Moon used windows-1252, but only when the file ended with .html. When the 
file ended with .xhtml, Pale Moon used UTF-8. That being said, we don’t need 
an http-equiv="content-type" to fix the problem. Instead, we can use a 
<meta charset="utf-8">, which is allowed by the HTML Standard [4].
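
To make the failure mode concrete, here is a rough Python sketch of what 
happens to the bytes of a saved page when a decoder falls back to 
windows-1252 instead of UTF-8. It only illustrates the byte-level effect and 
is not a model of how any browser picks an encoding. The page content is 
made up for the example.

```python
# Hypothetical page fragment; the text is invented for illustration only.
page = "<title>Ævar</title>"

# The bytes as they would be written to disk by a UTF-8 producer.
raw = page.encode("utf-8")

# Decoder that knows the encoding (e.g. from a charset declaration).
as_utf8 = raw.decode("utf-8")

# Decoder that falls back to windows-1252 (Python calls it "cp1252").
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

The first decode round-trips the text. The second one turns the two-byte 
UTF-8 sequence for "Æ" into two unrelated characters, which is the kind of 
badly decoded output the question above is worried about.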

[1]: <https://www.w3.org/TR/xhtml1/#C_9>
[2]: <https://www.w3.org/TR/html-polyglot/#character-encoding>
[3]: <https://www.w3.org/Bugs/Public/show_bug.cgi?id=21818>

[4]: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-charset>
