Re: [PATCH V10 09/22] LoongArch: Add boot and setup routines

WANG Xuerui <kernel@xxxxxxxxxx> · Mon, 16 May 2022 10:41:41 +0800

Hi,

On 5/15/22 20:38, Huacai Chen wrote:

diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S
new file mode 100644
index 000000000000..f0b3e76bb762
--- /dev/null
+++ b/arch/loongarch/kernel/head.S
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
+ */
+#include <linux/init.h>
+#include <linux/threads.h>
+
+#include <asm/addrspace.h>
+#include <asm/asm.h>
+#include <asm/asmmacro.h>
+#include <asm/regdef.h>
+#include <asm/loongarch.h>
+#include <asm/stackframe.h>
+#include <generated/compile.h>
+#include <generated/utsrelease.h>
+
+#ifdef CONFIG_EFI_STUB
+
+#include "efi-header.S"
+
+     __HEAD
+
+_head:
+     .word   MZ_MAGIC                /* "MZ", MS-DOS header */
+     .org    0x28
+     .ascii  "Loongson\0"            /* Magic number for BootLoader */
If you must use a magic number, "Loongson" is not recommended, because
this string lacks uniqueness in the Loongson/LoongArch world. Too many
things are called "Loongson foo" right now, and the string is so
ordinary that people don't immediately think of it as "magic".

I recommended using some other interesting text (and encoding) for the
magic number, in a different communication venue, but I think that
proposal got ignored by you without any explanation whatsoever. For now
I'll just repeat myself:

For an interesting magic number related to Loongson/LoongArch/Loong in
general (the loong is like a dragon but not exactly the same; let's not
expand on that front), it's perhaps better to use GB18030-encoded
four-character dragon-related idioms. I pick GB18030 because most Chinese
characters, including all of the ones below, take 2 bytes in this
encoding, and since it is not UTF-8, it's unlikely that any user input
would accidentally resemble it. Four characters therefore give exactly 8
bytes, which appear as huge negative numbers when read as a signed 64-bit
C long, and which are random enough that collisions are highly unlikely.

For example, I chose 4 famous dragon-related phrases from the I Ching,
in both simplified and traditional characters:

潜龙勿用: 0xc7b1c1facef0d3c3
见龙在田: 0xbcfbc1fad4daccef
飞龙在天: 0xb7c9c1fad4daccec
亢龙有悔: 0xbfbac1fad3d0bbda
潛龍勿用: 0x9d93fd88cef0d3c3
見龍在田: 0xd28afd88d4daccef
飛龍在天: 0xef77fd88d4daccec
亢龍有悔: 0xbfbafd88d3d0bbda

and I think each of them is better than "Loongson".
ARM64_IMAGE_MAGIC is "ARM64", RISCV_IMAGE_MAGIC is "RISCV", so I think
using "Loongson" as the magic is just OK.

Actually, you made a good point here, one that I failed to check for
myself earlier.

Looking at the arm64 and riscv image header code more closely, it seems
loongarch is trying to follow the now-deprecated riscv-specific practice
of using an 8-byte magic (deprecated as of commit 474efecb65dce ("riscv:
modify the Image header to improve compatibility with the ARM64
header")). In doing this, the patch also changes the offset of the magic:
on riscv it's at 0x30, while here it's at 0x28 (riscv's "res2" field).
This is exactly the kind of "proliferation of image header formats" that
we want to avoid.

Now for some additional but important bikeshedding...

The current arm64 and riscv magic numbers are both 4 bytes long, at offset
0x38, and they are cute little strings identifying their origin:
"ARM\x64" and "RSC\x05" respectively. Thus, for loongarch, we probably
want to do the same: nice little 4-byte strings with a hint of
LoongArch/Loong. Considering that UTF-8 uses 3 bytes for most Chinese
characters and 4 bytes for characters outside the BMP, we could use a
little bit of creativity here:

- "LA64", the "dullest" version with only ASCII characters, but I don't 
know if future LA32 systems will want to use the same image header format;
- "\xe9\xbe\x99\x64" ("龙\x64") or "\xe9\xbe\x8d\x64" ("龍\x64") -- 龙/龍 
means "loong/dragon", hence a variant of the above;
- "\xf0\x9f\x90\xb2" ("🐲") or "\xf0\x9f\x90\x89" ("🐉") -- the 
loong/dragon emoji, taking full advantage of the 4 bytes available while 
not mentioning bitness.

A case might be made for pure-ASCII magic numbers on the grounds that
they're easier to inspect with the naked eye, but (1) this is already not
the case for the new riscv magic, and (2) since all the other interesting
fields are binary, a hex editor is already necessary for any task more
complex than mere identification.

So, I think the bottom line is: don't use the 8-byte magic at offset
0x28; switch to a 4-byte magic at offset 0x38 to stay consistent with
everyone else. I don't have a strong preference among the candidates
above, but personally I'd like some freshness in the low-level land, if
that doesn't hamper people's workflows. ;-)