Re: [v4 2/3] vfio_register_notifier: also register on the group notifier

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 16 Nov 2016 12:45:14 -0700

On Thu, 17 Nov 2016 00:42:48 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:

> On 11/16/2016 11:18 PM, Alex Williamson wrote:
> > On Wed, 16 Nov 2016 16:14:24 +0530
> > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >   
> >> On 11/16/2016 3:07 PM, Jike Song wrote:  
> >>> On 11/16/2016 05:14 PM, Kirti Wankhede wrote:    
> >>>> On 11/16/2016 9:13 AM, Alex Williamson wrote:    
> >>>>> On Wed, 16 Nov 2016 11:01:37 +0800
> >>>>> Jike Song <jike.song@xxxxxxxxx> wrote:
> >>>>>    
> >>>>>> On 11/16/2016 07:11 AM, Alex Williamson wrote:    
> >>>>>>> On Tue, 15 Nov 2016 19:35:46 +0800
> >>>>>>> Jike Song <jike.song@xxxxxxxxx> wrote:
> >>>>>>>       
> >>>>>>>> The user of vfio_register_notifier might care about not only
> >>>>>>>> iommu events but also vfio_group events, so also register the
> >>>>>>>> notifier_block on vfio_group.
> >>>>>>>>
> >>>>>>>> Cc: Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>
> >>>>>>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >>>>>>>> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> >>>>>>>> Signed-off-by: Jike Song <jike.song@xxxxxxxxx>
> >>>>>>>> ---
> >>>>>>>>  drivers/vfio/vfio.c | 4 ++++
> >>>>>>>>  1 file changed, 4 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> >>>>>>>> index b149ced..2c0eedb 100644
> >>>>>>>> --- a/drivers/vfio/vfio.c
> >>>>>>>> +++ b/drivers/vfio/vfio.c
> >>>>>>>> @@ -2065,6 +2065,8 @@ int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
> >>>>>>>>  	else
> >>>>>>>>  		ret = -ENOTTY;
> >>>>>>>>  
> >>>>>>>> +	vfio_group_register_notifier(group, nb);
> >>>>>>>> +
> >>>>>>>>  	up_read(&container->group_lock);
> >>>>>>>>  	vfio_group_try_dissolve_container(group);
> >>>>>>>>  
> >>>>>>>> @@ -2102,6 +2104,8 @@ int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
> >>>>>>>>  	else
> >>>>>>>>  		ret = -ENOTTY;
> >>>>>>>>  
> >>>>>>>> +	vfio_group_unregister_notifier(group, nb);
> >>>>>>>> +
> >>>>>>>>  	up_read(&container->group_lock);
> >>>>>>>>  	vfio_group_try_dissolve_container(group);
> >>>>>>>>        
> >>>>>>>
> >>>>>>> You haven't addressed the error paths, if the iommu driver returns
> >>>>>>> error and therefore the {un}register returns error, what is the caller
> >>>>>>> to expect about the group registration?
> >>>>>>>       
> >>>>>>
> >>>>>> Will change to:
> >>>>>>
> >>>>>> 	driver = container->iommu_driver;
> >>>>>> 	if (likely(driver && driver->ops->register_notifier))
> >>>>>> 		ret = driver->ops->register_notifier(container->iommu_data, nb);
> >>>>>> 	else
> >>>>>> 		ret = -ENOTTY;
> >>>>>> 	if (ret)
> >>>>>> 		goto err_register_iommu;
> >>>>>>
> >>>>>> 	ret = vfio_group_register_notifier(group, nb);
> >>>>>> 	if (ret)
> >>>>>> 		driver->ops->unregister_notifier(container->iommu_data, nb);
> >>>>>>
> >>>>>> err_register_iommu:
> >>>>>> 	up_read(&container->group_lock);
> >>>>>> 	vfio_group_try_dissolve_container(group);
> >>>>>>
> >>>>>> err_register_nb:
> >>>>>> 	vfio_group_put(group);
> >>>>>> 	return ret;    
> >>>>>
> >>>>>
> >>>>>
> >>>>> What if a vendor driver only cares about the kvm state and doesn't pin
> >>>>> memory (ie. no DMA) or only cares about iommu and not group notifies?
> >>>>> If we handled notifier_fn_t 'action' as a bitmask then we could have the
> >>>>> registrar specify which notification they wanted (a mask/filter), so if
> >>>>> they only want KVM, we only send that notify, if they only want UNMAPs,
> >>>>> etc. Then we know whether iommu registration is required.  As a bonus,
> >>>>> we could add a pr_info() indicating vendors that ask for KVM
> >>>>> notification so that we can interrogate why they think they need it.
> >>>>> The downside is that handling action as a bitmask means that we limit
> >>>>> the number of actions we have available (32 or 64 bits worth).  That
> >>>>> limit is hopefully far enough off to be ok though.  Thoughts?  Thanks,
> >>>>>    
> >>>>
> >>>> As per my understanding, this bitmask is input to
> >>>> vfio_register_notifier() and vfio_unregister_notifier(), right?
> >>>>
> >>>> These functions are not called from vendor driver directly, these are
> >>>> called from vfio_mdev. Then should this bitmask be part of parent_ops
> >>>> that vendor driver can specify?    
> >>>
> >>> I think so, there should be a 'notifiler_filter' in parent_ops to indicate
> >>> that. A draft patch to show Alex's proposal:
> >>>     
> >>
> >> In that case how can we use bitmask to register multiple actions from
> >> backend iommu module (now we only have VFIO_IOMMU_NOTIFY_DMA_UNMAP
> >> action)? Even if we pass bitmask to backend iommu modules
> >> register_notifier(), backend module would have to create separate
> >> notifier for each action?
> >>
> >> Instead of taking bitmask as input from vendor driver, vendor driver's
> >> callback can send return depending on action if they don't want to get
> >> future notification:
> >>
> >> #define NOTIFY_DONE             0x0000          /* Don't care */
> >> #define NOTIFY_OK               0x0001          /* Suits me */
> >> #define NOTIFY_STOP_MASK        0x8000          /* Don't call further */
> >> #define NOTIFY_BAD              (NOTIFY_STOP_MASK|0x0002)
> >>                                                 /* Bad/Veto action */
> >> /*
> >>  * Clean way to return from the notifier and stop further calls.
> >>  */
> >> #define NOTIFY_STOP             (NOTIFY_OK|NOTIFY_STOP_MASK)  
> > 
> > I don't think that's a correct interpretation of NOTIFY_STOP_MASK, it
> > intends to say "this notification was for me, I've handled it, no need
> > to continue through the call chain".  It does not imply "don't tell me
> > about this event in the future", entries in the call chain don't have
> > the ability to suppress certain events.
> >   
> 
> OK, thanks for correcting me.
> 
> > Also note that NOTIFY_STOP_MASK is returned by
> > blocking_notifier_call_chain(), so we can pr_warn if a call chain
> > member is potentially preventing other notify-ees from receiving the
> > event.
> > 
> > Should we be revisiting the question of whether notifiers are registered
> > as part of the mdev-core?  In the current proposals
> > vfio_{un}register_notifier() is exported, so it's really just a
> > convenience that the mdev core registers it on behalf of the vendor
> > driver.  Vendor drivers could register on open and unregister on
> > release (vfio could additionally do housekeeping on release).  We
> > already expect vendor drivers to call vfio_{un}pin_pages() directly.
> > Then the vendor driver would provide filter flags directly and we'd
> > make that part of the mdev-vendor driver API a little more flexible.  
> 
> Ok. But that still doesn't solve the problem if more actions need to be
> added to backend iommu module.
> 
> If vendor driver notifier callback is called, vendor driver can return
> NOTIFY_DONE or NOTIFY_OK for the action they are not interested in.

Perhaps calling it a filter is not correct, I was thinking that a
vendor driver would register the notifier with a set of required
events.  The driver would need to handle/ignore additional events
outside of the required mask.  There are certainly some complications
to this model though that I hadn't thought all the way through until
now.  For instance what if we add event XYZ in the future and the
vendor driver adds this to their required mask.  If we run that on an
older kernel where the vfio infrastructure doesn't know about that
event, the vendor driver needs to fail, or at least know that event is
not supported and retry with a set of supported events.

There's another problem with my proposal too, we can't put a single
notifier_block on multiple notifier_block heads, that just doesn't
work.  So we probably need to separate a group notifier from an iommu
notifier, the vendor driver will need to register for each.

Maybe we end up with something like:

int vfio_register_notifier(struct device *dev,
			   vfio_notify_type_t type,
			   unsigned long *required_events,
			   struct notifier_block *nb);

typedef unsigned short vfio_notify_type_t;
enum vfio_notify_type {
	VFIO_IOMMU_NOTIFY = (__force vfio_notify_type_t)0,
	VFIO_GROUP_NOTIFY = (__force vfio_notify_type_t)1,
};

(stealing this from pci_dev_flags_t, hope it works)

A VFIO_GROUP_NOTIFY would add the notifier_block to the vfio_group, a
VFIO_IOMMU_NOTIFY would add it to vfio_iommu.  Each would have their
own unique set of events and each would compare their supported events
to the required events by the caller.  Supported events would be
cleared from the callers required_events arg.  If required_events still
has bits set, the notifier_block is not registered, an error is
returned, and the caller can identify the unsupported events by the
remaining bits in the required_events arg.  Can that work?  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html