On Mi, 30.09.20 13:57, Alan Perry (alanp@xxxxxxxxxxxxx) wrote:

> On 9/23/20 9:29 AM, Lennart Poettering wrote:
> > On Di, 22.09.20 10:06, Alan Perry (alanp@xxxxxxxxxxxxx) wrote:
> >
> > > > > device add events will get stuck at the probe step.
> > > > "Get stuck"? What does that mean? What is it actually doing? What does
> > > > a stack trace say? Anything in the logs?
> > > When this happens, the last thing seen in the log for those devices is the
> > > probe ("probe /dev/mmcblk0<part> raid offset=0").
> > This debug log message is generated by udev-builtin-blkid.c, right
> > after opening the block device, and right before issuing the probe,
> > i.e. reading the fs label/partition table signatures off disk. If
> > things hang there, and the blkid prober worker process freezes then
> > this really looks like a hw/driver problem, i.e. IO access from the
> > block device just hangs.
>
> It does seem to be a hw/driver problem. From what I have seen searching the
> web, this seems to be something that sometimes happens with eMMC devices.
>
> In our experience, the problem resolves itself and subsequent reads and
> probes succeed. However, the systemd job is still around, hung, and stopping
> boot from completing. I think that changing udev-builtin-blkid to be able to
> timeout and end the job gracefully when this happens is the right thing to
> do here. But what is a suitable timeout and what does a graceful exit here
> look like?

udev kills workers after a while. You can configure that with
event_timeout= in udev.conf. Defaults to 2min.

But note that disk IO sleeps in the kernel usually are
non-interruptible, i.e. you cannot kill processes hanging in them.
Hence, YMMV.

Driver bugs are kernel bugs. Fix them in the kernel, working around
them in userspace is ultimately never going to make anyone happy.
Lennart

--
Lennart Poettering, Berlin
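
Editorial sketch, not part of the original message: the reply notes that a process sleeping in disk IO is usually non-interruptible and so cannot be killed, which is why the udev event timeout may not help. One way to check whether that is what is happening is to look for processes in state "D" (uninterruptible sleep), which Linux reports as the field right after the command name in /proc/<pid>/stat. The helper names parse_stat and uninterruptible_tasks below are made up for illustration, and the script assumes a Linux /proc layout.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        path = os.path.join(proc_root, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

A process that stays in this list across repeated runs while the device add event is hung would be consistent with the worker being blocked inside the kernel. It is a snapshot only, since processes pass through state D briefly during normal IO.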