Re: [v4 2/3] vfio_register_notifier: also register on the group notifier

Jike Song <jike.song@xxxxxxxxx> · Thu, 17 Nov 2016 13:24:59 +0800

On 11/17/2016 03:45 AM, Alex Williamson wrote:
> On Thu, 17 Nov 2016 00:42:48 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>> On 11/16/2016 11:18 PM, Alex Williamson wrote:
>>> On Wed, 16 Nov 2016 16:14:24 +0530
>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>> On 11/16/2016 3:07 PM, Jike Song wrote:  
>>>>> On 11/16/2016 05:14 PM, Kirti Wankhede wrote:    
>>>>>> On 11/16/2016 9:13 AM, Alex Williamson wrote:    
>>>>>>> On Wed, 16 Nov 2016 11:01:37 +0800
>>>>>>> Jike Song <jike.song@xxxxxxxxx> wrote:
>>>>>>>> On 11/16/2016 07:11 AM, Alex Williamson wrote:    
>>>>>>>>> On Tue, 15 Nov 2016 19:35:46 +0800
>>>>>>>>> Jike Song <jike.song@xxxxxxxxx> wrote:
>>>>>>>>>       
>>>>>>>>>> The user of vfio_register_notifier might care about not only
>>>>>>>>>> iommu events but also vfio_group events, so also register the
>>>>>>>>>> notifier_block on vfio_group.
>>>>>>>>>>
>>>>>>>>>> Cc: Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>
>>>>>>>>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>>>>>>>>> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
>>>>>>>>>> Signed-off-by: Jike Song <jike.song@xxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/vfio/vfio.c | 4 ++++
>>>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>>>>>>>>>> index b149ced..2c0eedb 100644
>>>>>>>>>> --- a/drivers/vfio/vfio.c
>>>>>>>>>> +++ b/drivers/vfio/vfio.c
>>>>>>>>>> @@ -2065,6 +2065,8 @@ int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
>>>>>>>>>>  	else
>>>>>>>>>>  		ret = -ENOTTY;
>>>>>>>>>>  
>>>>>>>>>> +	vfio_group_register_notifier(group, nb);
>>>>>>>>>> +
>>>>>>>>>>  	up_read(&container->group_lock);
>>>>>>>>>>  	vfio_group_try_dissolve_container(group);
>>>>>>>>>>  
>>>>>>>>>> @@ -2102,6 +2104,8 @@ int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
>>>>>>>>>>  	else
>>>>>>>>>>  		ret = -ENOTTY;
>>>>>>>>>>  
>>>>>>>>>> +	vfio_group_unregister_notifier(group, nb);
>>>>>>>>>> +
>>>>>>>>>>  	up_read(&container->group_lock);
>>>>>>>>>>  	vfio_group_try_dissolve_container(group);
>>>>>>>>>>        
>>>>>>>>>
>>>>>>>>> You haven't addressed the error paths, if the iommu driver returns
>>>>>>>>> error and therefore the {un}register returns error, what is the caller
>>>>>>>>> to expect about the group registration?
>>>>>>>>>       
>>>>>>>>
>>>>>>>> Will change to:
>>>>>>>>
>>>>>>>> 	driver = container->iommu_driver;
>>>>>>>> 	if (likely(driver && driver->ops->register_notifier))
>>>>>>>> 		ret = driver->ops->register_notifier(container->iommu_data, nb);
>>>>>>>> 	else
>>>>>>>> 		ret = -ENOTTY;
>>>>>>>> 	if (ret)
>>>>>>>> 		goto err_register_iommu;
>>>>>>>>
>>>>>>>> 	ret = vfio_group_register_notifier(group, nb);
>>>>>>>> 	if (ret)
>>>>>>>> 		driver->ops->unregister_notifier(container->iommu_data, nb);
>>>>>>>>
>>>>>>>> err_register_iommu:
>>>>>>>> 	up_read(&container->group_lock);
>>>>>>>> 	vfio_group_try_dissolve_container(group);
>>>>>>>>
>>>>>>>> err_register_nb:
>>>>>>>> 	vfio_group_put(group);
>>>>>>>> 	return ret;    
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What if a vendor driver only cares about the kvm state and doesn't pin
>>>>>>> memory (ie. no DMA) or only cares about iommu and not group notifies?
>>>>>>> If we handled notifier_fn_t 'action' as a bitmask then we could have the
>>>>>>> registrar specify which notification they wanted (a mask/filter), so if
>>>>>>> they only want KVM, we only send that notify, if they only want UNMAPs,
>>>>>>> etc. Then we know whether iommu registration is required.  As a bonus,
>>>>>>> we could add a pr_info() indicating vendors that ask for KVM
>>>>>>> notification so that we can interrogate why they think they need it.
>>>>>>> The downside is that handling action as a bitmask means that we limit
>>>>>>> the number of actions we have available (32 or 64 bits worth).  That
>>>>>>> limit is hopefully far enough off to be ok though.  Thoughts?  Thanks,
>>>>>>>    
>>>>>>
>>>>>> As per my understanding, this bitmask is input to
>>>>>> vfio_register_notifier() and vfio_unregister_notifier(), right?
>>>>>>
>>>>>> These functions are not called from vendor driver directly, these are
>>>>>> called from vfio_mdev. Then should this bitmask be part of parent_ops
>>>>>> that vendor driver can specify?    
>>>>>
>>>>> I think so, there should be a 'notifiler_filter' in parent_ops to indicate
>>>>> that. A draft patch to show Alex's proposal:
>>>>>     
>>>>
>>>> In that case how can we use bitmask to register multiple actions from
>>>> backend iommu module (now we only have VFIO_IOMMU_NOTIFY_DMA_UNMAP
>>>> action)? Even if we pass bitmask to backend iommu modules
>>>> register_notifier(), backend module would have to create separate
>>>> notifier for each action?
>>>>
>>>> Instead of taking bitmask as input from vendor driver, vendor driver's
>>>> callback can send return depending on action if they don't want to get
>>>> future notification:
>>>>
>>>> #define NOTIFY_DONE             0x0000          /* Don't care */
>>>> #define NOTIFY_OK               0x0001          /* Suits me */
>>>> #define NOTIFY_STOP_MASK        0x8000          /* Don't call further */
>>>> #define NOTIFY_BAD              (NOTIFY_STOP_MASK|0x0002)
>>>>                                                 /* Bad/Veto action */
>>>> /*
>>>>  * Clean way to return from the notifier and stop further calls.
>>>>  */
>>>> #define NOTIFY_STOP             (NOTIFY_OK|NOTIFY_STOP_MASK)  
>>>
>>> I don't think that's a correct interpretation of NOTIFY_STOP_MASK, it
>>> intends to say "this notification was for me, I've handled it, no need
>>> to continue through the call chain".  It does not imply "don't tell me
>>> about this event in the future", entries in the call chain don't have
>>> the ability to suppress certain events.
>>>   
>>
>> OK, thanks for correcting me.
>>
>>> Also note that NOTIFY_STOP_MASK is returned by
>>> blocking_notifier_call_chain(), so we can pr_warn if a call chain
>>> member is potentially preventing other notify-ees from receiving the
>>> event.
>>>
>>> Should we be revisiting the question of whether notifiers are registered
>>> as part of the mdev-core?  In the current proposals
>>> vfio_{un}register_notifier() is exported, so it's really just a
>>> convenience that the mdev core registers it on behalf of the vendor
>>> driver.  Vendor drivers could register on open and unregister on
>>> release (vfio could additionally do housekeeping on release).  We
>>> already expect vendor drivers to call vfio_{un}pin_pages() directly.
>>> Then the vendor driver would provide filter flags directly and we'd
>>> make that part of the mdev-vendor driver API a little more flexible.  
>>
>> Ok. But that still doesn't solve the problem if more actions need to be
>> added to backend iommu module.
>>
>> If vendor driver notifier callback is called, vendor driver can return
>> NOTIFY_DONE or NOTIFY_OK for the action they are not interested in.
> 
> Perhaps calling it a filter is not correct, I was thinking that a
> vendor driver would register the notifier with a set of required
> events.  The driver would need to handle/ignore additional events
> outside of the required mask.  There are certainly some complications
> to this model though that I hadn't thought all the way through until
> now.  For instance what if we add event XYZ in the future and the
> vendor driver adds this to their required mask.  If we run that on an
> older kernel where the vfio infrastructure doesn't know about that
> event, the vendor driver needs to fail, or at least know that event is
> not supported and retry with a set of supported events.
> 
> There's another problem with my proposal too, we can't put a single
> notifier_block on multiple notifier_block heads, that just doesn't
> work.  So we probably need to separate a group notifier from an iommu
> notifier, the vendor driver will need to register for each.
> 
> Maybe we end up with something like:
> 
> int vfio_register_notifier(struct device *dev,
> 			   vfio_notify_type_t type,
> 			   unsigned long *required_events,
> 			   struct notifier_block *nb);
> 
> typedef unsigned short vfio_notify_type_t;
> enum vfio_notify_type {
> 	VFIO_IOMMU_NOTIFY = (__force vfio_notify_type_t)0,
> 	VFIO_GROUP_NOTIFY = (__force vfio_notify_type_t)1,
> };
> 
> (stealing this from pci_dev_flags_t, hope it works)
> 
> A VFIO_GROUP_NOTIFY would add the notifier_block to the vfio_group, a
> VFIO_IOMMU_NOTIFY would add it to vfio_iommu.  Each would have their
> own unique set of events and each would compare their supported events
> to the required events by the caller.  Supported events would be
> cleared from the callers required_events arg.  If required_events still
> has bits set, the notifier_block is not registered, an error is
> returned, and the caller can identify the unsupported events by the
> remaining bits in the required_events arg.  Can that work?  Thanks,

Let me summarize the discussion:

- There should be 2 notifier heads, one in vfio_iommu another in vfio_group;
- vfio_{un}register_notifier() has the type specified in parameter
- In vfio_register_notifier, maybe:

	static vfio_iommu_register_notifier() {..}
	static vfio_group_register_notifier() {..}
	int vfio_register_notififer(type..
	{
		if (type == VFIO_IOMMU_NOTIFY)
			return vfio_iommu_register_notifier();
		if (type == VFIO_GROUP_NOTIFY)
			return vfio_group_register_notifier();
	}

What's more, if we still want registration to be done in mdev framework,
we should change parent_ops:

- rename 'notifier' to 'iommu_notifier'
- add "group_notifier"
- Add a group_events and a iommu_events to indicate what events vendor is
  interested in, respectively

or otherwise don't touch it from mdev framework at all?

--
Thanks,
Jike
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html