On Wed, Aug 24, 2022 at 06:14:51PM +0100, Daniel P. Berrangé wrote: > On Wed, Aug 24, 2022 at 10:56:22AM -0500, Jonathon Jongsma wrote: > > On 8/24/22 2:09 AM, Erik Skultety wrote: > > > On Tue, Aug 23, 2022 at 12:43:03PM -0500, Jonathon Jongsma wrote: > > > > Openstack developers reported that newly-created mdevs were not > > > > recognized by libvirt until after a libvirt daemon restart. The source > > > > of the problem appears to be that when libvirt gets the udev 'add' > > > > event, the sysfs tree for that device might not be ready and so libvirt > > > > waits 100ms for it to appear (max 100 waits of 1ms each). But in the > > > > OpenStack environment, the sysfs tree for new mediated devices was > > > > taking closer to 250ms to appear and therefore libvirt gave up waiting > > > > and didn't add these new devices to its list of nodedevs. > > > > > > > > By changing the wait time to 1 second (max 100 waits of 10ms each), this > > > > should provide enough time to enable these deployments to recognize > > > > newly-created mediated devices, but it shouldn't increase the delay for > > > > more traditional deployments too much. > > > > > > > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2109450 > > > > > > > > Signed-off-by: Jonathon Jongsma <jjongsma@xxxxxxxxxx> > > > > --- > > > > > > > > Alternatively, we could switch to triggering off of the udev 'bind' event > > > > rather than the 'add' event, but I wasn't able to convince myself that this > > > > would result in 100% compatible behavior, so this felt like the safest > > > > solution. If others can convince me that switching to 'bind' is safe, I can > > > > re-submit this patch. > > > > > > Is there a guarantee that the filesystem tree is ready by the time the event > > > arrives? I remember back in the day when I implemented this, this was even > > > discussed on the kernel list and the outcome was that each application needs to > > > sort this out on its own hinting that at least at that time there wasn't > > > any other way to do this reliably? Has something changed in the meantime? > > > > > > Erik > > > > > > > I'm afraid I don't actually know if anything has changed in the kernel in > > this area. That's basically the reason that I proposed the approach that I > > did. But I do know that in the bug referenced, the 'bind' event comes about > > 250ms later than the 'add' event. I'm not sure if the filesystem tree is > > necessarily ready on 'bind', but the fact that it is 250ms later means that, > > at minimum, there's a significantly better chance that it is ready by that > > point than at the time of 'add'. > > Looking at the commit message for 'bind' events > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1455cf8dbfd0 > > [quote] > Also, many drivers create additional driver-specific device attributes > when binding to the device, to provide userspace with additional controls. > The new events allow userspace to adjust these driver-specific attributes > without worrying that they are not there yet. > [/quote] > > My reading of this is that we should NOT assume sysfs files exist > until we've seen the 'bind' event. > > IOW, the reason we have this problem with sysfs is precisely because > we hooked into 'add' instead of 'bind'. The sleep papers over the > mistake, while using 'bind' instead would address the root cause. >From the quote you posted I don't see how we're guaranteed that the sysfs tree is ready and it looks like each driver has too much room for interpreting 'bind' whichever way they want. That said, I agree that switching to listening to 'bind' instead of 'add' is an improvement, however we still may need to have a timeout in place anyway as I still don't see any guarantees from the NVIDIA driver that the sysfs tree will be ready by then. > > NB, if we use 'bind' we need to double check all our supported > platforms are targetting kerenl >= 4.14 already. They are, the oldest we support is 4.18 on AlmaLinux 8. Erik