On Wed, 25 Apr 2018 17:42:11 +0200 Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > On Wed, Feb 14, 2018 at 06:20:35PM +0100, Cornelia Huck wrote: > > [cc:ing Greg for his opinion on this; retaining quoting for context] > > Ick, just found this in my inbox, sorry for the delay... > > > > > On Tue, 13 Feb 2018 17:15:02 -0700 > > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > > > > > On Tue, 13 Feb 2018 14:09:01 +0100 > > > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > > > > > On Mon, 12 Feb 2018 14:20:57 -0700 > > > > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > > > > > > > > > On Fri, 9 Feb 2018 11:27:16 +0100 > > > > > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > > > > > > > > > The registration code first registers the mdev device, and then > > > > > > proceeds to populate sysfs. An userspace application that listens > > > > > > for the ADD uevent is therefore likely to look for sysfs entries > > > > > > that have not yet been created. > > > > > > > > > > > > The canonical way to fix this is to use attribute groups that are > > > > > > registered by the driver core before it sends the ADD uevent; I > > > > > > unfortunately did not find a way to make this work in this case, > > > > > > though. > > > > > > > > > > > > An alternative approach is to suppress uevents before we register > > > > > > with the core and generate the ADD uevent ourselves after the > > > > > > sysfs infrastructure is in place. > > > > > > > > > > > > Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx> > > > > > > --- > > > > > > > > > > > > This feels like a band-aid, but I can't figure out how to handle creating > > > > > > attribute groups when there's a callback in the parent involved. > > > > > > > > > > > > This should address the issue with libvirt's processing of mdevs raised in > > > > > > https://www.redhat.com/archives/libvir-list/2018-February/msg00023.html > > > > > > - although libvirt will still need to deal with older kernels, of course. > > > > > > > > > > > > Best to consider this an untested patch :) > > > > > > > > > > I agree, this feels like a band-aide. If every device in the kernel > > > > > needs to suppress udev events until until some key component is added, > > > > > that suggests that either udev is broken in general or not being used > > > > > as intended. > > > > > > > > I think udev is working exactly as designed - it's more a problem of > > > > when the kernel is sending what kind of notification to userspace, and > > > > the particular issue here is how the code sending the event (driver > > > > core) and the code assembling part of the user interface (mdev) > > > > interact. > > > > > > > > > Zongyong submitted a different proposal to fix this > > > > > here[1]. That proposal seems a bit more sound and has precedence > > > > > elsewhere in the kernel. What do you think of that approach? We > > > > > don't need both afaict. Thanks, > > > > > > > > > > Alex > > > > > > > > > > [1]https://patchwork.kernel.org/patch/10196197/ > > > > > > > > Zongyong's patch is sending an additional CHANGE uevent, and I agree > > > > that doing both does not make sense. However, I think the CHANGE uevent > > > > is not quite suitable in this case, and delaying the ADD uevent is > > > > better. > > > > > > > > [Warning, the following may be a bit rambling.] > > > > > > > > The Linux driver model works under the assumption that any device is > > > > represented as an in-kernel object that exposes information and knobs > > > > through sysfs. As long as the device exists, userspace can poke at the > > > > sysfs entries and retrieve information or configure things. > > > > > > > > The idea of the 'ADD' uevent is basically to let userspace know that > > > > there is now a new device with its related sysfs entries, and it may > > > > look at it and configure it. IOW, if I (as a userspace application) get > > > > the ADD uevent, I expect to be able to look at the device's sysfs > > > > entries and find all the files/directories that are usually there, > > > > without having to wait. > > > > > > > > This expectation is broken if a device is first registered with the > > > > driver core (generating the ADD uevent) and the driver adds sysfs > > > > attributes later. To fix this, the driver core added a way to specify > > > > default attribute groups for the device, which are registered by the > > > > driver core itself before it generates the ADD uevent. Unfortunately, I > > > > did not see a way to do this here (which does not mean there isn't). > > > > The alternative was to prevent the driver core from sending the ADD > > > > uevent and do it from the mdev code when it was ready. > > > > > > > > The 'CHANGE' uevent, on the other hand, tells userspace that something > > > > has changed for the device (that already existed). I (as a userspace > > > > application) would expect to see it if, for example, the information > > > > exposed via sysfs has changed, or maybe even if new, optional, entries > > > > have appeared and I might want to rescan. With Zongyong's patch, > > > > userspace gets the CHANGE uevent for something that was already > > > > expected to be there, and is now _really_ there. It does give userspace > > > > an indication that it can now work with the device (which certainly > > > > improves things), but I would prefer to get rid of the too-early uevent > > > > completely so that userspace does not get notified at all before the > > > > device is completely present in sysfs. > > > > > > > > So, in short, my patch does 'don't tell userspace until we're really > > > > done', and Zongyong's patch does 'tell userspace again when we're > > > > really done'. > > > > > > This all sounds reasonable, but don't we have this synchronization > > > problem _everywhere_? I apologize for referencing this bug because it's > > > not public (https://bugzilla.redhat.com/show_bug.cgi?id=1376907) but > > > the gist of it is that soft-unplugging PCI devices using the remove > > > entry in sysfs and re-adding with rescan sysfs entry results in libvirt > > > seeing the ADD uevent before the PCI config attribute is created and it > > > balks on the device. > > > > Yikes. So it seems that _any_ PCI device is incomplete when the ADD > > uevent is sent? Or does this just apply to an unfortunately large > > number of drivers? > > What do you mean by "incomplete"? The device is added to the bus at > that point in time, yes. BUT, if a driver decides to add their own > attributes to the sysfs node, then that will happen afterward, unless > the bus has properly set that up (there are default attributes it can > use for that type of thing.) > > It's a common problem that has been there since the beginning, and now > there's a new uevent that is "KOBJ_BIND" that is emitted when the driver > is finished being "bound" to the device. Is that what you are looking > for here? Yes, that sounds exactly like the right thing (libvirt had been looking for the config pci device attribute). > > > > So at the PCI core we have this same issue and > > > developers are saying that there's no guarantee that sysfs entries > > > won't be added and removed at any time in the lifecycle of the device > > > and it's not the kernel's responsibility to provide that > > > synchronization. > > > > I think this is mixing up two things: > > - sysfs entries that are there by default. These not being in place > > when the ADD uevent is sent sounds broken. > > Yes it is, fix that in your bus code, default device attributes should > always be set up before ADD happens. Driver specific ones will be there > by the time BIND happens. In this case (mdev), the mediated device is interacting with the physical device as well (including specific attributes). I looked at the code, but soon was lost in a maze of twisty passages, all alike. Suppressing the uevent was a quick fix, but likely not the best one. > > > - Optional sysfs entries that might change. There may be a case for > > those to be added/removed later on (probably paired with a CHANGE > > uevent.) > > CHANGE is pretty rare, and best used for things that are polled. Or > major system events, like docking station changes. Not for things that > happen all the time (like power state changes.) > > does that help? Yes, thanks!