On Wed, 2024-12-11 at 21:20 +0100, Martin Wilck wrote: > On Wed, 2024-12-11 at 12:09 -0500, Benjamin Marzinski wrote: > > > > I'm not actually worried about the kernel so much as libdevmapper. > > It > > is > > not designed for multi-threaded processes, and that has bitten us > > in > > the > > past. For intance, it's why we don't delete devices in > > dmevent_loop() > > on > > libdevmapper errors. dm_get_events() just waits and retries if > > getting > > the device list fails, and for each device, it calls dm_is_mpath > > and > > will only delete a device on DM_IS_MPATH_NO, which is what I > > suggested > > for the cleanup function. > > > > I'm pretty sure we've handled all of the known issues here, with > > fixes > > like: > > 02d4bf07 ("libmultipath: protect racy libdevmapper calls with a > > mutex") > > 34e01d2f ("multipath-tools: don't call dm_lib_release() any more") > > > > I'd rather not risk having missed some issue that could cause a > > temporary error in a function that we call every couple of seconds > > (almost always unnecessarily). > > Ok, getting it. I thought that an error in DM_TABLE_STATUS must > almost > neccessarily mean -ENXIO (from the kernel pov), which would mean that > some external entity removed the device, and that we should act as if > someone had used the "remove map" CLI command. But I didn't think > about > libdevmapper. But will libdevmapper return ENXIO if it's somehow interally confused? I don't think so. I believe that if we see this error code, removing the map is the right thing to do. I consider adding a patch on top of the v4 series that does this. If you reject it, fine :-) Regards, Martin