Re: [boot-time]

Rob Landley <rob@xxxxxxxxxxx> · Sun, 12 Jan 2025 12:35:55 -0600

On 1/12/25 04:11, Marko Hoyer wrote:
On 12.01.25 at 02:03, Rob Landley wrote:
On 1/11/25 12:57, Bird, Tim wrote:
Hey Rob, This is a great review of /dev, /sys and the different
ways that /dev gets populated.

Feel free to link stuff from wikis or some such. The newest of those 
documents was written in 2007.

For a lot of embedded Linux devices, the only bus where
new items can show up dynamically is USB.

SDCARD readers connected via MMC are common in automotive head units as 
well ...

But do they give an insertion/removal notification that can generate an 
interrupt rather than needing to be polled? (Last couple of boards I 
poked at didn't, but it was cheap hardware...)

When a driver DOESN'T automatically bind to them it gets a bit 
complicated, and one of the things mdev can be configured to do is act 
as a firmware loader! Which is just... Ahem, there are YEARS of poor 
design decisions the kernel guys made, where they ignored a mechanism 
they already had and implemented something more complicated. The 
mechanism whereby the kernel opens a firmware file and reads it 
directly out of the filesystem instead of calling a hotplug helper 
was... I'm just going to gloss over that.
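
For reference, a minimal sketch of what a hotplug-helper firmware loader does, assuming the kernel's sysfs firmware fallback interface (the `loading` and `data` files under /sys/$DEVPATH, which as far as I know needs the user-helper fallback enabled in the kernel config). The kernel supplies $DEVPATH and $FIRMWARE in the helper's environment; SYSFS_ROOT and FW_DIR are made-up overrides so the function can be exercised against a scratch directory, not something mdev defines.

```shell
# Sketch of a firmware-loading hotplug helper.
# SYSFS_ROOT and FW_DIR are illustrative names, not kernel or mdev variables.
load_firmware() {
  sysfs="${SYSFS_ROOT:-/sys}"
  fwdir="${FW_DIR:-/lib/firmware}"
  dev="$sysfs/$DEVPATH"

  # Only firmware requests carry $FIRMWARE; ignore every other event.
  [ -n "$FIRMWARE" ] && [ -e "$dev/loading" ] || return 0

  if [ -r "$fwdir/$FIRMWARE" ]; then
    printf '%s\n' 1 > "$dev/loading"       # tell the kernel a transfer starts
    cat "$fwdir/$FIRMWARE" > "$dev/data"   # copy the blob in
    printf '%s\n' 0 > "$dev/loading"       # transfer complete
  else
    printf '%s\n' -1 > "$dev/loading"      # abort the request
    return 1
  fi
}
```

mdev can do the equivalent itself when configured for it; the point is only that the whole job is three writes into sysfs.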

WIFI & Bluetooth devices often use this firmware mechanism.

The wifi and bluetooth _hardware_ is always there though. Transceiver 
link toggle is more or less a media insertion/removal event, which is a 
slightly different hotplug mechanism.

Ogres, onions... layers.

And yes I 
agree, it looks a bit ugly seeing the kernel load a firmware 
file from /lib/firmware, searching for it in the root file system w/o 
knowing the state of it during boot ...

They already HAD the hotplug helper mechanism and initramfs! You could 
already CALL A LOADER and some of us had that working and DEPLOYED 
before they built a whole new mechanism for "the kernel reaches out and 
reads a file out of the userspace view of the filesystem from kernel 
space without a process context to do it in like the ELF loader has, 
don't ask me what this means for containers and namespaces..."

(Ok, they wanted to load firmware before PID 1 launched, but they were 
already breaking the drivers into separate probe/init sections so you 
could probe before interrupts were started and init after interrupts were started 
and launching PID 1 is the first thing that happens after interrupts are 
enabled (we have a scheduler now, the idle task can fork off PID 1 and 
PID 0 can run pause() in a loop. Except between those two the kernel 
launches a zillion "kernel threads" including the tasklets and deferred 
device initialization and so on...)

It wasn't just awkward, it was unnecessary. (And it DOES NOT SOLVE the 
underlying licensing issue of "this firmware is not gpl, I am bundling 
it into a statically linked initramfs, is this "mere aggregation", let's 
see what a judge has to say!")

Meanwhile Bradley is in court ACTIVELY ARGUING that there's no 
difference between GPLv2 and GPLv3 and that the complete lack of any 
copyright holders willing to sign on to his increasingly extreme 
enforcement views isn't a problem because GPLv2 is a contract despite 
the complete absence of things like "privity of contract"... No really:

https://blog.tidelift.com/will-the-new-judicial-ruling-in-the-vizio-lawsuit-strengthen-the-gpl

I got dragged into this recently to spend a day telling a camera "no, 
Bradley's full of it", and yes he flew in to sit at the other end of the 
table for some reason:

https://landley.net/notes-2024.html#24-06-2024

Sigh. There's a reason I do 0BSD these days:

https://landley.net/toybox/license.html

For WIFI and bluetooth I do not 
see a big issue here since I'd prevent putting such features on a 
critical chain by system design in any way since bringing them up and 
(re)connecting external devices is time consuming by nature. Nothing you 
shall need to wait for ...

Except that reconnection mostly happens in software. The _hardware_ 
you're talking to stays connected. It's a resource 
acquisition/allocation problem sure, but closer to partition re-scanning.

*shrug* The asynchronous notifications that something happened behind 
your back come in through similar mechanisms, but if that's ALL we were 
dealing with we wouldn't have needed most of this plumbing.

(Although that was ANOTHER fun failure of the old devfs: /dev/eth0 isn't 
common, thanks to Bill Joy somehow not really understanding unix in 
1979. And of course renaming /dev/hda to /dev/sda is a big deal from a 
compatibility perspective, but the <strike>devfsd2</strike> systemd guys 
deciding that eth0: is now potato03x1: or some such? That's just fine, 
who cares about compatibility with that...)

Compiling in modules vs. loading them later from user space is a 
trade-off. The effect of putting stuff into modules is to keep the kernel 
small which helps you in the "unpacking & loading kernel" phase before 
the kernel is actually started. Having a 1MB unpacked kernel makes a 
significant difference compared to a 5MB one.

If you can avoid ever loading the module, you may come out ahead. 
(Modulo why are you shipping it then, still needs storage.) Last I 
checked the actual module unloading was still a NOP half the time (the 
memory stays pinned) and marks your kernel "tainted" if you ever 
actually do it, which is not a vote of confidence in the codepath if you 
ask me.

But I had toybox insmod working years ago. The problem is that toybox 
_modprobe_ is still in pending, because modprobe pulls fairly extensive 
shenanigans I am not personally familiar with and have to learn how to 
use before I can implement them, and they just seem like TERRIBLE IDEAS:

https://github.com/landley/toybox/issues/522

On the other hand, my 
experience is that there is a lot of overhead (CPU time and IO) loading 
modules from user space. So it really only makes sense, if you have 
drivers to load at a point in time during startup where you have enough 
time and resources left.

The kernel boot process is already fairly heavily asynchronous, which is 
why your shell prompt gets buried with "link up" notifications spamming 
the console after it prints the $ and so on. That's why mkroot's init 
script does echo 3 > /proc/sys/kernel/printk before the exec handoff to 
whatever inherits PID 1 from the setup script:

https://github.com/landley/toybox/blob/0.8.11/mkroot/mkroot.sh#L133

Because if it's a shell, and we don't do that, you won't see the prompt 
under the noise.
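
A minimal sketch of that step as a reusable function. Writing a single number to /proc/sys/kernel/printk sets the console log level, so 3 leaves only messages more severe than errors on the console. PRINTK_FILE is an illustrative override (not something mkroot uses) so the function can be tried without root.

```shell
# Lower the console log level before handing off to an interactive shell.
# PRINTK_FILE is a made-up override for testing; the real file is
# /proc/sys/kernel/printk.
quiet_console() {
  f="${PRINTK_FILE:-/proc/sys/kernel/printk}"
  [ -w "$f" ] || return 1      # not root, or /proc is not mounted
  echo "${1:-3}" > "$f"        # default to level 3, as in the text above
}
```

Call it as `quiet_console` right before the exec handoff, or `quiet_console 7` to turn the noise back on while debugging.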

I mean it more or less works, it's just... pointless manual 
maintenance of something the kernel does for you in a very small 
amount of code? (In devtmpfs, the /dev node being there means 
something. In a static /dev, it doesn't.)

I agree. There is kind of dynamic device enumeration done by the kernel 
drivers anyway once loaded. Any data structures to devices are built up 
internally. Nothing you can save ...

I spent YEARS convincing the android guys to look at devtmpfs, 
initramfs, container plumbing... (Keep in mind Google bought Android 
Inc. in 2005 and shipped the first phone at the end of 2008, meaning 
their main development effort predated most of this plumbing and they 
had to retrofit it in much later.)  No idea how much impact I had and 
how much they would have eventually done anyway, but the main guy I was 
having those conversations with WAS the android base OS maintainer, 
so... Most recent was probably:

http://lists.landley.net/pipermail/toybox-landley.net/2022-August/029139.html

You'd think the early boot stuff was fairly straightforward, but I keep 
winding up being the one to manually fix crap like:

https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html

And then YEARS LATER, it's me who has to:

https://lore.kernel.org/lkml/8244c75f-445e-b15b-9dbf-266e7ca666e2@xxxxxxxxxxx/

And then it had to be rewritten to remove my taint:

https://lkml.iu.edu/hypermail/linux/kernel/2311.1/01821.html
https://lkml.iu.edu/hypermail/linux/kernel/2311.2/02938.html

Let alone obvious polishing nonsense like:

https://lkml.iu.edu/hypermail/linux/kernel/1705.0/02640.html

(Which only went in because Andrew Morton picked it up despite Greg KH 
doing his usual stonewalling of literally anything from me. Oh well.)

Anyway, there's a reason I'm not really a kernel developer. When I try 
to engage with them myself, "crickets chirp" is pretty much the GOOD 
outcome...

https://lkml.iu.edu/hypermail/linux/kernel/1707.2/01797.html

Ahem. I'll stop now.

I'm not even sure how devtmpfs can be combined w/ your static devnodes 
you created in any kind of persistent partition.

You could mount your own /tmp and do mdev -s into it. That's what we 
used to do back around 2005:

https://lkml.iu.edu/hypermail/linux/kernel/0512.0/1326.html
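
Roughly what that looked like, as a sketch following the classic busybox mdev bring-up. It reads "mount your own" as a tmpfs mounted over /dev, and assumes /proc is already mounted and the kernel still has the /proc/sys/kernel/hotplug helper interface. The run() wrapper and DRY_RUN variable are illustrative additions so the sequence can be previewed without root.

```shell
# Classic mdev bring-up, wrapped so it can be previewed with DRY_RUN=1.
# run() and DRY_RUN are illustrative helpers, not part of mdev.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

setup_dev() {
  run mount -t tmpfs -o mode=0755 dev /dev   # writable, RAM-backed /dev
  run mount -t sysfs sysfs /sys              # mdev -s scans /sys
  run sh -c 'echo /sbin/mdev > /proc/sys/kernel/hotplug'  # later hotplug events
  run mdev -s                                # nodes for devices already present
}
```

Running it with DRY_RUN=1 just prints the four commands in order.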

(Also, when devtmpfs first went in, if you modified a node (touch, 
chattr, etc) then it wouldn't delete it and your management tool would 
have to delete it via hotplug removal event handling. So you could PIN 
nodes, I was just never clear on why you'd want to. It probably still 
does that?)

And if you even can get 
the kernel accepting your partition to use as /dev,

Kernel doesn't care.

you need to have it 
writeable for the case of dynamics you might need (usb for instance) 
which does not really go well with a read only RFS ... You could ... 
overlay fs ... well no, I think this goes into a wrong direction -> too 
complicated ;)

If you just have a /tmp dir in initramfs with some starting nodes 
initialized via the cpio extractor, and then have something like mdev 
add things on top of that as they're hotplugged, initramfs is inherently 
writeable thus the /tmp dir would be.

There's a race condition where "I booted a device with USB already 
plugged into it before powerup, when is the hotplug event delivered and 
is it before the hotplug handler is registered", which I cared deeply 
about in 2005 and no longer remember the details of. I could try to dig 
them up out of my blog and the busybox/kernel mailing lists if you care?


To summarize from my point of view:

* It's worth talking a bit about the effect of udev and about alternatives

I am not a fan of udev, for reasons that are part technical and part "oh 
those assholes" rant path I'm trying to avoid going down.

* "mdev" is surely worth being named as a potential option besides 
"selective triggering" and "static setup and moving triggers back in time"

* I wouldn't regard mknod as a real alternative in today's systems

It still comes up from time to time, usually when initializing 
containers. (Because devtmpfs in containers does NOT give a proper 
container-local view of its namespace.)

Once upon a time, you could use the linux kernel's built in initramfs 
generation plumbing to create a cpio with arbitrary contents by 
providing simple text snippets to supplement their scanner, including a 
/dev/console entry created as a normal user (without running as root!).

But of COURSE the kernel developers removed the ability, and I patched 
it back in (attached), and then went "no, not fighting that fight"...
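
For anyone who hasn't seen the format, a small sketch of such a text snippet, assuming the list syntax gen_init_cpio accepts (`dir <name> <mode> <uid> <gid>` and `nod <name> <mode> <uid> <gid> <type> <major> <minor>`). The function name is made up for illustration. Because the device nodes exist only as lines of text until the archive is extracted by the kernel, nothing here needs root.

```shell
# Emit a gen_init_cpio-style list with a starter /dev.
# emit_dev_list is an illustrative name, not part of the kernel build.
emit_dev_list() {
  cat << 'EOF'
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
nod /dev/null 0666 0 0 c 1 3
EOF
}
```

In a kernel tree, something like `emit_dev_list > dev.list` followed by `usr/gen_init_cpio dev.list > dev.cpio` should turn that into an archive, though the exact invocation depends on the kernel version.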

* In addition, I can imagine "module loading" vs. "compiling in 
drivers" is something worth mentioning

There's buckets of domain expertise there and I have like 1/3 of what 
I'd need to be confident there. (I know where to look it up, but have 
never considered it a good thing. Half the point of modules was to 
load/unload drivers for testing without reboots, and I just boot cycle a 
system under qemu or KVM when I can, and boot cycle a physical board 
when I can't because fiddling with modules really doesn't HELP my 
workflow. YMMV...)

The main other reason modules persist is out-of-tree drivers, usually 
not under GPL, which have been under systematic attack for well over a 
decade and the people still doing it have large teams writing shim code.

Most "let's use modules" decisions _since_ then boil down to either

1) "this is a generic PC hardware distro and I have no idea what 
hardware will be on there, and building every possible module into the 
kernel wastes a couple dozen megabytes of RAM on a system"

2) This mechanism exists, there must be a reason, therefore I should 
definitely use it because it's there.

(They built _mechanisms_ to prevent you from upgrading modules without 
upgrading the kernel they plug into. Note that the description of 
CONFIG_MODVERSIONS says that WITHOUT it you can't have even slight 
version skew. That's without MODULE_SIG and MODULE_SRCVERSION_ALL and so 
on.)

By the way, you can provide "module arguments" on the kernel command 
line, write to things like /sys/module/psmouse/parameters/rate after the 
driver's up...
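
A short sketch of both halves, assuming the usual `module.parameter=value` command line syntax and a parameter the driver exports as writable. SYSFS_ROOT and the function names are illustrative, there so the runtime half can be tried against a scratch directory.

```shell
# Boot-time form, on the kernel command line (works for built-in drivers too):
#   psmouse.rate=60

# Runtime form. SYSFS_ROOT is a made-up override for testing; normally /sys.
module_param_file() {
  echo "${SYSFS_ROOT:-/sys}/module/$1/parameters/$2"
}

set_module_param() {
  f=$(module_param_file "$1" "$2")
  [ -w "$f" ] || return 1    # driver absent, or parameter is read-only
  echo "$3" > "$f"
}
```

For example, `set_module_param psmouse rate 60` as root on a system where that driver is present.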

* Once I've access to the wiki, I can try to put these ideas into an 
initial structure filled up w/ info we discussed in this thread

Marko

Good luck.

You know what we REALLY need a new version of? A rewrite of:

https://landley.net/kdocs/mirror/lki-single.html

With sections for each architecture. (And if you tried to write one, 
you'd hate Raspberry Pi as much as I do! Although 
https://forums.raspberrypi.com/viewtopic.php?t=357536 is extremely 
promising, and a far sight better than 
https://github.com/christinaa/rpi-open-firmware ever got to. Although I 
haven't really dug into the details of what's still proprietary black 
box spyware subtly bugging your board with "system management mode" 
hijacks, and what they actually managed to work around despite not 
having hardware documentation for broadcom chips...)

Rob
From: Rob Landley <rob@xxxxxxxxxxx>
Date: Fri, 06 Oct 2023 02:56:19 -0500
Subject: [PATCH] Add gen_initramfs.sh -O

Add a -O option to output the list instead of the archive. (You can
specify -o after -O to produce both.)

For 15 years gen_initramfs_list.sh produced a text output format that
other things consumed and modified and fed back to the kernel, then
the script changed to consume the list internally and produce the cpio
archive directly. (Why they didn't just change gen_init_cpio.c to traverse
directories itself if they were going to take away the ability to filter
the list is an open question. Maybe it could handle filenames with spaces
in them if they'd done that? And why "squash" in-band signalling instead of
the -1 I submitted, which doesn't conflict with existing users because
integers aren't valid usernames...)

Signed-off-by: Rob Landley <rob@xxxxxxxxxxx>
---

 usr/gen_initramfs.sh |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/usr/gen_initramfs.sh b/usr/gen_initramfs.sh
index 14b5782f961a..8f75988a5799 100755
--- a/usr/gen_initramfs.sh
+++ b/usr/gen_initramfs.sh
@@ -15,6 +15,7 @@ usage() {
 cat << EOF
 Usage:
 $0 [-o <file>] [-l <dep_list>] [-u <uid>] [-g <gid>] {-d | <cpio_source>} ...
+	-O <file>      Output annotated file list instead of archive
 	-o <file>      Create initramfs file named <file> by using gen_init_cpio
 	-l <dep_list>  Create dependency list named <dep_list>
 	-u <uid>       User ID to map to user ID 0 (root).
@@ -206,6 +207,15 @@ while [ $# -gt 0 ]; do
 			echo "deps_initramfs := \\" > $dep_list
 			shift
 			;;
+		"-O")	# Output annotated file list
+			unset output
+			trap - EXIT
+			[ "$1" = "-" ] &&
+				cpio_list="/dev/stdout" ||
+				cpio_list="$1"
+			shift
+			;;
+
 		"-o")	# generate cpio image named $1
 			output="$1"
 			shift