Hi again,
I have found out some more about this issue:
When dm-init succeeds in finding /dev/mmcblk0p3, it actually only finds
/dev/mmcblk0 and NOT the partition block device (see behavior in
devt_from_devname).
Subsequently, when dm-verity looks up the data device (/dev/mmcblk0p3)
via dm_get_device, the following call stack is executed:
  dm_get_device
    dm_get_table_device
      open_table_device
        blkdev_get_by_dev
          blkdev_get_no_open
blkdev_get_no_open fails at that point (it returns NULL, which
blkdev_get_by_dev reports as -ENXIO) because the partition device is not
available yet. That error then propagates back.
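To make the window concrete, here is a small user-space model of the two
lookups. It is only a sketch, not kernel code: every model_* name is made up
for illustration, the major number and bit layout are placeholders, and it
assumes that, as I read devt_from_devname, the partition's device number is
computed from the parent disk without checking that the partition exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* User-space model only; none of these names are kernel APIs. */
#define MODEL_MAX_PARTS 8
#define MODEL_MINORBITS 20

struct model_disk {
	const char *name;                   /* e.g. "mmcblk0" */
	unsigned int major;
	unsigned int first_minor;
	bool part_present[MODEL_MAX_PARTS]; /* partition N registered yet? */
};

static unsigned int model_mkdev(unsigned int major, unsigned int minor)
{
	return (major << MODEL_MINORBITS) | minor;
}

/*
 * Stage 1: resolve "<disk>p<N>" to a device number using the parent disk
 * alone. Note that it never consults part_present[].
 */
static int model_early_lookup(const struct model_disk *disk,
			      const char *name, unsigned int *dev)
{
	size_t len = strlen(disk->name);
	unsigned long part;
	char *end;

	assert(dev != NULL);
	if (strncmp(name, disk->name, len) != 0 || name[len] != 'p')
		return -ENODEV;
	part = strtoul(name + len + 1, &end, 10);
	if (end == name + len + 1 || *end != '\0' ||
	    part == 0 || part >= MODEL_MAX_PARTS)
		return -ENODEV;
	*dev = model_mkdev(disk->major, disk->first_minor + (unsigned int)part);
	return 0;
}

/*
 * Stage 2: open by device number. This fails with -ENXIO until the
 * partition itself has been registered.
 */
static int model_open_by_dev(const struct model_disk *disk, unsigned int dev)
{
	unsigned int minor = dev & ((1u << MODEL_MINORBITS) - 1);
	unsigned int part;

	if ((dev >> MODEL_MINORBITS) != disk->major || minor < disk->first_minor)
		return -ENXIO;
	part = minor - disk->first_minor;
	if (part >= MODEL_MAX_PARTS)
		return -ENXIO;
	if (part == 0)
		return 0; /* the whole disk is always present in this model */
	return disk->part_present[part] ? 0 : -ENXIO;
}
```

In this model, stage 1 succeeds for "mmcblk0p3" as soon as the disk exists,
while stage 2 keeps failing with -ENXIO until part_present[3] is set. That
gap is the window in which the table load can fail.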
A naive fix that works for me follows:
diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index 2a71bcdba92d1..1e0867e5e9d95 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -294,10 +294,14 @@ static int __init dm_init_init(void)
 	for (i = 0; i < ARRAY_SIZE(waitfor); i++) {
 		if (waitfor[i]) {
 			dev_t dev;
+			struct block_device *bdev;
 
 			DMINFO("waiting for device %s ...", waitfor[i]);
 			while (early_lookup_bdev(waitfor[i], &dev))
 				fsleep(5000);
+			while (!(bdev = blkdev_get_no_open(dev)))
+				fsleep(5000);
+			blkdev_put_no_open(bdev);
 		}
 	}
 
This is probably not ideal but I lack knowledge in the dm world to come
up with a better fix. Suggestions welcome.
Best regards,
Sven
On 4/9/24 4:23 PM, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 01:33:13PM +0200, Sven wrote:
> > Hi,
> > I'm using dm-init to set up a dm-verity rootfs, i.e. my kernel command line
> > includes:
> > root=/dev/dm-0
> > dm-mod.waitfor=/dev/mmcblk0p3
> > dm-mod.create="rootfs,,0,ro,0 614400 verity 1 /dev/mmcblk0p3 /dev/mmcblk0p3
> > 1024 4096 307200 76801 sha256 HASH HASH"
> > Occasionally, my device refuses to boot with the following errors:
> > device-mapper: table: 254:0: verity: Data device lookup failed (-ENXIO)
> > device-mapper: ioctl: error adding target to table
> > There appears to be a race condition somewhere. This problem started to
> > appear when updating from kernel version 5.15.148 to 6.6.22. Looking at the
> > changes between the versions, this patch seems relevant:
> > https://lore.kernel.org/all/20230531125535.676098-19-hch@xxxxxx/
> > It appears that early_lookup_bdev in dm_init_init would succeed even if the
> > partition block device is not there yet but the parent block device is.
> > Could that explain why the subsequent device lookup for the partition
> > fails?
> -ENXIO isn't really an error that should come from the early block
> device lookup. On the other hand mmc_blk_open can return -ENXIO
> and block device open return values end up here. Can you throw
> a printk or trace probe of your choice into mmc_blk_open to see if
> that's the culprit?