On Tue, Nov 12, 2024 at 08:46:52AM +0100, Martin Wilck wrote: > On Mon, 2024-11-11 at 12:17 -0500, Benjamin Marzinski wrote: > > On Sat, Nov 09, 2024 at 12:03:33AM +0100, Martin Wilck wrote: > > > On Fri, 2024-11-08 at 17:08 -0500, Benjamin Marzinski wrote: > > > > > > > > Given that the code already does the right thing without needing > > > > any > > > > changes to handle tableless maps makes me feel like just removing > > > > the > > > > final patch is the best idea. > > > > > > If we do this, do we still need the special case for DMP_BAD_DEV? > > > IMO > > > we just need to make sure that dm_get_maps() doesn't return an > > > error if > > > it encounters an empty map. > > > > Depends. If we fail attempting to create a device, we could have > > raced > > with something else attempting to create it. Just checking for > > dm_map_present() before we remove the device won't distinguish > > between > > these two cases at all. > > I would like to keep things as simple as possible. > > The main mistake that I made in bf3a4ad ("libmultipath: simplify > dm_get_maps()") was that I'd changed dm_get_maps() such that returned > error if anything went wrong for a single map. That was wrong, it > should just skip the map in question (IMO that even includes the > DMP_ERR case, but that's up to discussion). Just changing the behavior > of dm_get_maps() would fix the regressions reported on GitHub (correct > me if I'm wrong). > > Wrt libmp_mapinfo: > > >From my PoV, there's no difference between "has no table" and "has no > targets". There _is_ a difference between "no targets" and "multiple > targets" though, as the former can be reloaded as a proper multipath > table whereas the latter cannot. So these two cases must be treated > differently. > > I think we should actually introduce a new return code. > > - DMP_NOT_FOUND should actually mean that the given map does not exist > at all > - DMP_NO_MATCH should mean that it has targets that don't match > (either multiple targets, or something non-multipath), or that the UUID > doesn't match (if MAPINFO_CHECK_UUID is set). > - We should have another rc code for "exists but is empty", but that > should be called DMP_EMPTY rather than DMP_BAD_DEV. This sounds reasonable. I can redo my patch for that on top of the set you just posted. > > On the other hand, there is still a window where the program that we > > raced with is in the middle of creating the device, but it doesn't > > have > > a live table yet. In this case, even the DMP_BAD_DEV code won't be > > able > > to distinguish between a map that didn't get created correctly and > > one > > that is in the process of getting created. It would reduce the window > > where we could accidentally delete a working device, however. > > I wouldn't bother about this kind of race too much. If multipathd just > ignores this kind of maps as it used to, it won't do any harm AFAICT. > We shouldn't alter this behavior, IOW multipathd should neither remove > these maps nor add them to its internal tables unless they become > populated with paths. > > If anything, we should finally get down to delegating map creation from > multipath to multipathd. The issue is that if creating a multipath device fails, the current code will only delete any leftover device it finds if that device is already set up correctly (since dm_flush_map_nosync() won't delete the map if it doesn't have a multipath table). This means we only delete devices in the case where we did race and something else managed to set the device up, which is exactly what we don't want to do. So, I'm going to repost a version of my ("libmultipath: fix removing device after failed creation") patch because anything is better than what we do right now, which is just wrong. > > Since it doesn't add much complexity, I lean towards keeping it, but > > it > > is one more state we need to handle on returns from libmp_mapinfo(), > > so > > I'm open to an argument that it's not worth it. > > I agree, comments above. > > > > > The only issue with the current code is that if you have a device > > > > with > > > > no table with the UUID of a multipath device but a different name > > > > than > > > > that multipath device should have, multipath will try to create a > > > > new > > > > device instead of reload it, and this fails, since the UUID is > > > > already > > > > in use. > > > > > > That should be handled in coalesce_paths(). It's basically the > > > scenario > > > that occurs if we enable or disable user_friendly names. > > > > Except that we have a mpp for the other device if it has a table. If > > the > > device doesn't have a table, then a mpp isn't created, so > > coalesce_paths() doesn't track it. We could add code to create mpps > > for > > these tableless devices, but it's another trade off of added > > complexity > > to handle a rare corner case, and I'm not sure this one is worth it. > > What I meant is that if there are matching path devices, > coalesce_paths() will create an mpp and reload the table. Otherwise, > the table will continue lurking around without being tracked, which > doesn't do harm even if it may not seem optimal. If a matching path is > added, again, uev_add_path() should reload the map (to be tested). > > I think that behavior is sane enough for a corner case like this. Even if there are matching path devices, if there is a dm device that has a different name than the device we are trying to create, but the same UUID, and no table (and thus, not in the current list of mpps), multipath will try to create a new device instead of reload it, and it will fail. I'm trying to avoid that. But this issue has been around forever, and AFAIK has never actually caused a problem, so fixing it isn't worth any real added complexity. -Ben > Martin