Hi Martin:

Thanks for your reply.

> verify_paths() would detect this. We do call verify_paths() in
> coalesce_paths() before calling domap(), but not immediately before.
> Perhaps we should move the verify_paths() call down to immediately
> before the domap() call. That would at least minimize the time window
> for this race. It's hard to avoid it entirely. The way multipathd is
> written, the vecs lock is held all the time during coalesce_paths(),
> and thus no uevents can be processed. We could also consider calling
> verify_paths() before *and* after domap().

Would calling verify_paths() both before *and* after domap() eliminate this race entirely?

> Was this a map creation or a map reload? Was the map removed after the
> failure? Do you observe the message "ignoring map" or "removing map"?
>
> Do you observe a "remove" uevent for sdi?

This was a map reload, but sdi was not in the old map. The "removing map" message was observed. No "remove" uevent for sdi was observed.

> I wonder if you'd see the issue also if you run the same test without
> the "multipath -F; multipath -r" loop, or with just one. Ok, one
> multipath_query() loop simulates an admin working on the system, but 2
> parallel loops - 2 admins working in parallel, plus the intensive
> sequence of actions done in multipathd_query at the same time? The
> repeated "multipath -r" calls and multipathd commands will cause
> multipathd to spend a lot of time in reconfigure() and in cli_* calls
> holding the vecs lock, which makes it likely that uevents are missed or
> processed late.

As you said, there were many cli_* calls, and no uevents were handled when the error occurred. After those calls finished, hundreds of queued uevents were forwarded at once (for example, "Forwarding 201 uevents" appears in the log).

> Don't get me wrong, I don't argue against tough testing. But we should
> be aware that there are always time intervals during which multipathd's
> picture of the present devices is different from what the kernel sees.

What you said is very reasonable.
When we found this problem, I thought it would be difficult to solve entirely, although it rarely happens. I will discuss with the testers whether the test scripts are reasonable.

> There's definitely room for improvement in multipathd wrt locking and
> event processing in general, but that's a BIG piece of work.

Thanks again!

Regards
Lixiaokeng

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel