Re: [PATCH v8 0/4] pci hotplug tracking

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Thu, 2 Nov 2023 08:12:35 -0400

On Thu, Nov 02, 2023 at 03:00:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.23 14:31, Michael S. Tsirkin wrote:
> > On Thu, Oct 05, 2023 at 12:29:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > Hi all!
> > > 
> > > The main thing this series adds is the DEVICE_ON event, a counterpart to
> > > DEVICE_DELETED: a guest-driven event signalling that the device has been powered on.
> > > Details are in patch 2. The new event is paired with a corresponding
> > > command, query-hotplug.
> > 
> > Several things are questionable here:
> > 1. depending on guest activity, you can get as many
> >     DEVICE_ON events as you like
> 
> No, I've made it so that it can be sent only once per device.

Maybe document that?

> > 2. it's only for shpc and native pcie; things are
> >     confusing enough for management already, so we should make sure
> >     it works for all devices
> 
> Agreed, I'm thinking about it.
> 
> > 3. what about non-hotpluggable devices? Do we want the event for them?
> > 
> 
> I think yes, especially if we add an async=true|false flag to device_add, so that a successful device_add is always followed by DEVICE_ON, just as device_del is followed by DEVICE_DELETED.
> 
> Maybe, to generalize, it should be called DEVICE_ADDED rather than DEVICE_ON (which mostly relates to hotplug controller status), making it a full counterpart to DEVICE_DELETED.
> 
> > 
> > I feel this needs a concrete motivation so we can judge the
> > right way to do it.
> 
> My original motivation for this series was that a successful device_add doesn't guarantee that the hard disk has been successfully hotplugged into the guest. This relates to some problems with shpc/pcie hotplug we had in the past, which are mostly fixed now. Still, it's useful for a management tool to know that all actions related to the hotplug controller are done and that we have a "green light".

What does "successfully" mean, though? For example, a bunch of guests will not
properly show you the device if the disk is not formatted correctly.

> 
> Recently a new motivation came up, as I described in my "ping" letter <6bd19a07-5224-464d-b54d-1d738f5ba8f7@xxxxxxxxxxxxxx>: we have a performance degradation caused by 7bed89958bfbf40df, which introduced drain_call_rcu() in device_add to make it more synchronous. So my suggestion is to make it more asynchronous instead (probably behind a special flag) and rely on the DEVICE_ON event.

This one?

commit 7bed89958bfbf40df9ca681cefbdca63abdde39d
Author: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
Date:   Tue Oct 6 14:38:58 2020 +0200

    device_core: use drain_call_rcu in in qmp_device_add
    
    Soon, a device removal might only happen on RCU callback execution.
    This is okay for device-del which provides a DEVICE_DELETED event,
    but not for the failure case of device-add.  To avoid changing
    monitor semantics, just drain all pending RCU callbacks on error.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
    Suggested-by: Stefan Hajnoczi <stefanha@xxxxxxxxx>
    Reviewed-by: Stefan Hajnoczi <stefanha@xxxxxxxxxx>
    Message-Id: <20200913160259.32145-4-mlevitsk@xxxxxxxxxx>
    [Don't use it in qmp_device_del. - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index e9b7228480..bcfb90a08f 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -803,6 +803,18 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
         return;
     }
     dev = qdev_device_add(opts, errp);
+
+    /*
+     * Drain all pending RCU callbacks. This is done because
+     * some bus related operations can delay a device removal
+     * (in this case this can happen if device is added and then
+     * removed due to a configuration error)
+     * to a RCU callback, but user might expect that this interface
+     * will finish its job completely once qmp command returns result
+     * to the user
+     */
+    drain_call_rcu();
+
     if (!dev) {
         qemu_opts_del(opts);
         return;



So maybe just move drain_call_rcu() into the if (!dev) branch and be done with
it?

-- 
MST