Re: [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2021-11-08 at 11:30 -0600, Benjamin Marzinski wrote:
> On Mon, Nov 08, 2021 at 04:55:37PM +0000, Martin Wilck wrote:
> > On Mon, 2021-11-08 at 10:29 -0600, Benjamin Marzinski wrote:
> > > On Fri, Nov 05, 2021 at 11:20:01PM +0000, Martin Wilck wrote:
> > > > > 1: udev hasn't gotten an event for a device
> > > > 
> > > > I don't think we can detect this without listening for kernel
> > > > uevents.
> > > > And even if we listen to them, we could have missed some. udevd
> > > > doesn't
> > > > have an API for it, either, AFAIK.
> > > 
> > > Isn't this the most common INIT_PARTIAL case, where we are
> > > waiting
> > > for
> > > the coldplug uevent? If there is no database file when we are
> > > processing
> > > the device, we are in this state. Correct? 
> > 
> > Not necessarily. udev may have got an event, but not have finished
> > processing it, or failed to process it entirely (e.g. because of a
> > timeout, your case 2.). When udevd sees an "add" or "change" event
> > for
> > a device for the first time, creating the db entry is the last
> > action
> > it takes. During coldplug, udevd will receive a lot of events
> > almost
> > simultaneously, but it may take considerable time until it has
> > processed them all.
> 
> Fair enough. So if the first uevent hasn't completed already
> successfully, it's gonna be hard to know why.

I just realized that what I said was wrong.

The worker does this (udev v246):

worker_process_device()
  udev_event_execute_rules()
    udev_rules_apply_to_event()
    update_devnode() /* symlinks etc */
    device_ensure_usec_initialized() /* set USEC_INITIALIZED */
    device_update_db() /* here the db file is (re)written */
       /* from this point on, 
          libudev will consider the device initialized */
    update_devnode() /* again */
    device_set_is_initialized() /* sets is_initialized property */
  udev_event_execute_run() /* Here RUN is executed */
  if (udev_event->inotify_watch)
     device_update_db()  /* again, this is what I was referring to */
device_monitor_send_device() /* send "udev" uevent to listeners */

If a worker is killed because of timeout, or exits with error, udevd
will receive SIGCHLD and delete the db entry for the device in
question.

>  
> > > > > 2: udev got an event but timed out or failed while processing
> > > > > it
> > > > 
> > > > This would be the USEC_INITIALIZED case, IMO
> > > 
> > > If udev has, in the past, successfully processed an event for a
> > > device,
> > > but fails while processing a later uevent, it doesn't keep th
> > > data
> > > from
> > > the previous event. So it could lose the uid_attribute. But the
> > > database file should still exist. Correct?
> > 
> > That's true. But we can't do anything about it. libudev will return
> > what the database currently contains, which is the content from the
> > last successfuly processed "add" or "change" uevent, whether or not
> > other uevents are in the queue or being processed. I don't think
> > this
> > scenario matters in the current discussion about partially
> > initialized
> > paths. This is the "changed wwid" scenario which I think we handle
> > quite nicely today already. Or am I misunderstanding?
> 
> If both events occurred before multipathd started up, then this
> wouldn't
> simply be a "changed WWID".  The hope is to be able to reliably
> distinguish this from case 3, where the data from udev is fine, but
> the
> uid_attribute still isn't there.

I'm not quite getting this. Are you talking about the case where one,
or both, uevents are lost? 

IMO we can use udev_device_get_is_initialized() as test. AFAICS it's
basically equivalent to USEC_INITIALIZED being set. If that function
returns true and uid_attribute is not set, re-triggering an uevent
makes relatively little sense to me. The likelihood that the next even
will yield a different result is pretty close to zero.

If if the first uevent is handled successfully but the second times
out, udevd will delete the db file (see above), but no uevent will be
sent to listeners. So multipathd will be ignorant of the lost / timed
out event. There's nothing we can do about that.

Regards
Martin








> 
> -Ben
> 
> > Cheers,
> > Martin
> 


--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux