On Wed, Aug 24, 2022 at 10:56:22AM -0500, Jonathon Jongsma wrote:
> On 8/24/22 2:09 AM, Erik Skultety wrote:
> > On Tue, Aug 23, 2022 at 12:43:03PM -0500, Jonathon Jongsma wrote:
> > > Openstack developers reported that newly-created mdevs were not
> > > recognized by libvirt until after a libvirt daemon restart. The source
> > > of the problem appears to be that when libvirt gets the udev 'add'
> > > event, the sysfs tree for that device might not be ready and so libvirt
> > > waits 100ms for it to appear (max 100 waits of 1ms each). But in the
> > > OpenStack environment, the sysfs tree for new mediated devices was
> > > taking closer to 250ms to appear and therefore libvirt gave up waiting
> > > and didn't add these new devices to its list of nodedevs.
> > >
> > > By changing the wait time to 1 second (max 100 waits of 10ms each), this
> > > should provide enough time to enable these deployments to recognize
> > > newly-created mediated devices, but it shouldn't increase the delay for
> > > more traditional deployments too much.
> > >
> > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2109450
> > >
> > > Signed-off-by: Jonathon Jongsma <jjongsma@xxxxxxxxxx>
> > > ---
> > >
> > > Alternatively, we could switch to triggering off of the udev 'bind' event
> > > rather than the 'add' event, but I wasn't able to convince myself that this
> > > would result in 100% compatible behavior, so this felt like the safest
> > > solution. If others can convince me that switching to 'bind' is safe, I can
> > > re-submit this patch.
> >
> > Is there a guarantee that the filesystem tree is ready by the time the event
> > arrives? I remember back in the day when I implemented this, this was even
> > discussed on the kernel list and the outcome was that each application needs to
> > sort this out on its own hinting that at least at that time there wasn't
> > any other way to do this reliably? Has something changed in the meantime?
> >
> > Erik
> >
>
> I'm afraid I don't actually know if anything has changed in the kernel in
> this area. That's basically the reason that I proposed the approach that I
> did. But I do know that in the bug referenced, the 'bind' event comes about
> 250ms later than the 'add' event. I'm not sure if the filesystem tree is
> necessarily ready on 'bind', but the fact that it is 250ms later means that,
> at minimum, there's a significantly better chance that it is ready by that
> point than at the time of 'add'.

In that case I'd accept this solution over bind, since on a loaded system you
have neither a guarantee that the filesystem tree is ready by the time bind is
delivered nor a guarantee that bind cannot be delayed for a significantly
longer period (less likely).

So, from my POV:
Reviewed-by: Erik Skultety <eskultet@xxxxxxxxxx>

to the patch as is.

Regards,
Erik
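
For readers following the timing argument, here is a minimal Python sketch of
a bounded polling wait of the kind the patch description refers to (100 tries
of 1ms versus 100 tries of 10ms). It is an illustration only, not libvirt's
code: the function name wait_for_path, its parameters, and the injectable
exists/sleep hooks are all hypothetical.

```python
import os
import time


def wait_for_path(path, tries=100, delay_ms=10,
                  exists=os.path.exists, sleep=time.sleep):
    """Poll until `path` exists, sleeping `delay_ms` between checks.

    Hypothetical helper for illustration; not taken from libvirt.
    Returns True if the path appeared within roughly tries * delay_ms
    milliseconds, False otherwise.
    """
    for _ in range(tries):
        if exists(path):
            return True
        sleep(delay_ms / 1000.0)
    # One last check after the final sleep, so the full budget is used.
    return exists(path)
```

With tries=100 and delay_ms=1 the total budget is about 100ms, so a sysfs
tree that takes about 250ms to appear is missed. With delay_ms=10 the budget
is about 1 second, and the same tree would be found after roughly 25 sleeps.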