Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28/01/2015 15:19, Yann Droneaud wrote:
> Hi,
> 
> Le mardi 27 janvier 2015 à 08:50 +0200, Haggai Eran a écrit :
>> On 26/01/2015 13:17, Yann Droneaud wrote:
>>> ...
>>> Le dimanche 25 janvier 2015 à 17:23 +0200, Haggai Eran a écrit :
>>>> On 22/01/2015 15:28, Yann Droneaud wrote:
>>>>> This patch ensures the extended QUERY_DEVICE uverbs request's
>>>>> comp_mask has only known values. If userspace returns unknown
>>>>> features, -EINVAL will be returned, allowing to probe/discover
>>>>> which features are currently supported by the kernel.
>>>>
>>>> This probing process will be much more cumbersome than it needs to be
>>>> because userspace will have to call QUERY_DEVICE repetitively with
>>>> different comp_mask fields until it finds out the exact set of supported
>>>> bits.
>>>>
>>>
>>> O(log2(N))
>>
>> I don't think user space developers will be happy having to do trial and 
>> error to determine what features the kernel driver supports. It might be 
>> even more then O(log2(N)). If my understanding of comp_mask bits usage is 
>> correct it would O(N). But it's not the time complexity I'm worried about,
>> it's the fact that it requires user-space developers to go through hoops in
>> order to get information that can be much more easily exported.
>>
> 
> But there's currently *NO* such mean that could give a hint to userspace
> developer whether one bit of request's comp_mask is currently effective
> in extended QUERY_DEVICE uverbs. While we have such mean in CREATE_FLOW
> and DESTROY_FLOW: unsupported bits trigger -EINVAL.
> 
> In QUERY_DEVICE, userspace developer is allowed to set request's 
> comp_mask to -1: it won't hurt it on current kernel, so why not using 
> this value or any other random value ? The program will run: it's "Works
> For Me".
> 
> But the same program (either binary or source code) might fail on
> newer kernel where some bits in comp_mask gain a meaning not supported
> by the program.

Why don't we make the command comp_mask in QUERY_DEVICE zero in both
versions. The behavior of the command with comp_mask = 0 will be to
return the maximum amount of data that fits in the response buffer. The
kernel will return -EINVAL if the input command comp_mask is not zero in
the current version.

If in the future we want to alter the behavior or add more input fields
to QUERY_DEVICE, we would use new bits.

>>> Or you had to add a different interface, dedicated to retrieve the exact
>>> supported feature mask.
>>>
>>>>> Moreover, it also ensure the requested features set in comp_mask
>>>>> are sequentially set, not skipping intermediate features, eg. the
>>>>> "highest" requested feature also request all the "lower" ones.
>>>>> This way each new feature will have to be stacked on top of the
>>>>> existing ones: this is especially important for the request and
>>>>> response data structures where fields are added after the
>>>>> current ones when expanded to support a new feature.
>>>>
>>>> I think it is perfectly acceptable that not all drivers will implement
>>>> all available features, and so you shouldn't enforce this requirement.
>>>
>>> With regard to QUERY_DEVICE: the data structure layout depends on the
>>> comp_mask value. So either you propose a way to express multipart data
>>> structure (see CMSG or NETLINK), or we have to ensure the data structure
>>> is ever-growing, with each new chunck stacked over the existing ones:
>>> that's the purpose of :
>>>
>>> 	if (cmd.comp_mask & (cmd.comp_mask + 1))
>>> 		return -EINVAL;
>>>
>>>> Also, it makes the comp_mask nothing more than a wasteful version number
>>>> between 0 and 63.
>>>
>>> That's what I've already explained earlier in "Re: [PATCH v3 06/17]
>>> IB/core: Add support for extended query device caps":
>>>
>>> http://mid.gmane.org/1421844612.13543.40.camel@xxxxxxxxxx
>>
>> Yes, you wrote there:
>>> Regarding comp_mask (not for this particular verb):
>>>
>>> It's not clear whether request's comp_mask describe the request or the
>>> response, as such I'm puzzled.
>>>
>>> How would the kernel and the userspace be able to parse the request and
>>> the response if they ignore unknown bits ?
>>>
>>> How would they be able to skip the unrecognized chunk of the request and
>>> response buffer ?
>>>
>>> How one bit in a comp_mask is related to a chunk in the request or
>>> response ?
>>>
>>> It's likely the kernel or userspace would have to skip the remaining
>>> comp_mask's bits after encountering an unknown bit as the size of the
>>> corresponding chunk in request or response would be unknown, making
>>> impossible to locate the corresponding chunk for the next bit set in
>>> comp_mask. Having said that, comp_mask is just a complicated way of
>>> expressing a version, which is turn describe a size (ever growing).
>>
>> It is my understanding that each comp_mask bit marks a set of fields in 
>> the command or in the response struct as valid, so the struct format 
>> remains the same and the kernel and userspace don't need to make 
>> difficult calculation as to where each field is, but you can still pass 
>> a high bit set in comp_mask with one of the lower bits cleared.
>>
> 
> How can the struct format remain the same, if some fields are added
> to implement newer feature ?

I meant that the binary format for an older version is the prefix of the
binary format of the newer version.

>> I couldn't find this explicit detail in the mailing list, but I did found
>> a presentation that was presented in OFA International Developer 
>> Workshop 2013 [1], that gave an example of of an verb where each 
>> comp_mask bit marked a different field as valid.
>>
> 
> Thanks for the link to the presentation.
> 
> As I read it the presentation:
> 
> - in request, comp_mask give hint to the kernel of additional fields in
> the request.
> 
> - in response, comp_mask give hint to userspace regarding the presence
> of additional fields in the response.

I'm not sure that in request you can regard the comp_mask as a hint. I
agree that you need to enforce it in general and reject unknown bits
there (although I thought QUERY_DEVICE would be an exception).

> But commit 860f10a799c8 ("IB/core: Add flags for on demand paging
> support") on top of commit 5a77abf9a97a ("IB/core: Add support for
> extended query device caps") is not doing it the expected way
> as one bit set in request's comp_mask has direct effect on
> the response's fields.
> 
> To be conformant with the "Extending Verbs API" presentation,
> no check should be done on request's comp_mask, and on-demand paging
> information should be returned to userspace in all case, provided 
> there's enough room in the response buffer.
> 
> Anyway, allowing userspace to set any "hint" in the request's comp_mask
> is opening a pandora box: being allowed to set any value, userspace
> program will use any random value, as it will work with current kernel.
> 
> But the same program used on newer kernel is going to trigger some
> behavior unknown to the application or return an error the application
> is not ready to handle ... breaking any kind of forward compatibility 
> promise.

I don't think that the application should handle unknown response bits
as an error. As you wrote, I see these as hints about more data in the
response that the application is free to ignore.

> It's likely the kernel will have to use the size of the request to 
> guess the hints in comp_mask are corrects to handle such. In such case, 
> relying only on the size of the request might be enough to detect 
> extended request, without the need for comp_mask.
> 
> Regarding the answer, if the response buffer is smaller than the size
> of the extended response, will the kernel keep setting the response's 
> comp_mask with all the bits it supports or will it unset the bits for 
> the fields it wasn't able to fit in the response buffer ?
> 
> It's likely the kernel will have to use the size of the response buffer 
> to set the response's comp_mask. 
> 
> Instead commit 860f10a799c8 ("IB/core: Add flags for on demand paging
> support") on top of commit 5a77abf9a97a ("IB/core: Add support for
> extended query device caps") make it possible for the kernel to return
> truncated response with full comp_mask. Such is going to break silently 
> when one will introduce a data structure with an ABI mismatch between
> userspace and kernel (for example x86 vs amd64 ... we have some recent
> exemples).

As I said, I think unknown bits in the comp_mask are hints about unknown
fields in the response that can be ignored by the application. However,
I can agree to having the kernel checking the response buffer size as
you wrote, and only setting the valid comp_mask bits.

> 
>>>
>>>>
>>>> In the specific case of QUERY_DEVICE you might argue that there isn't
>>>> any need for input comp_mask, only for output, and then you may enforce
>>>> the input comp_mask will always be zero.
>>>
>>> The extended QUERY_DEVICE uverbs as currently merged is using comp_mask
>>> from input to choose to report on-demand-paging related value. So it
>>> seems it's needed.
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_cmd.c?id=v3.19-rc6#n3297
>>>
>>>> However, you will in any case need to be able to extended the size of the response in the future.
>>>>
>>>
>>> That's already the case for on demand paging.
>>>
>>>>>
>>>>> Link: http://mid.gmane.org/cover.1421931555.git.ydroneaud@xxxxxxxxxx
>>>>> Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx>
>>>>> Cc: Shachar Raindel <raindel@xxxxxxxxxxxx>
>>>>> Cc: Eli Cohen <eli@xxxxxxxxxxxx>
>>>>> Cc: Haggai Eran <haggaie@xxxxxxxxxxxx>
>>>>> Signed-off-by: Yann Droneaud <ydroneaud@xxxxxxxxxx>
>>>>> ---
>>>>>  drivers/infiniband/core/uverbs_cmd.c | 6 ++++++
>>>>>  1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
>>>>> index 8668b328b7e6..80a1c90f1dbf 100644
>>>>> --- a/drivers/infiniband/core/uverbs_cmd.c
>>>>> +++ b/drivers/infiniband/core/uverbs_cmd.c
>>>>> @@ -3313,6 +3313,12 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
>>>>>  	if (err)
>>>>>  		return err;
>>>>>  
>>>>> +	if (cmd.comp_mask & (cmd.comp_mask + 1))
>>>>> +		return -EINVAL;
>>>>> +
>>>>> +	if (cmd.comp_mask & ~(__u32)IB_USER_VERBS_EX_QUERY_DEVICE_ODP)
>>>>> +		return -EINVAL;
>>>>> +
>>>
>>> If we keep the checks on output buffer size from patch 1, returning
>>> -ENOSPC in case of size mismatch, and if we are sure that no bit in
>>> input comp_mask will ever trigger a call to a kernel function that can
>>> make the uverb fail, the latter check on known value could be dropped,
>>> allowing the userspace to set the highest mask (-1): userspace
>>> will use -ENOSPC to probe the expected size of the response buffer
>>> to match the highest supported comp_mask. But it's going to hurt
>>> userspace application not ready to receive -ENOSPC on newer kernel
>>> with extended QUERY_DEVICE ABI ... Oops.
>>>
>>> So in this end, the safest way to ensure userspace is doing the correct
>>> thing so that we have backward and forward compatibility is to check
>>> for known values in comp_mask, check for response buffer size and ensure
>>> that data structure chunk are stacked.
>>>
>>> The tighter are the checks now, the easier the interface could be
>>> extended latter.
>>
>> I understand this position, and I generally agree, but I think that 
>> specifically for a verb like QUERY_DEVICE that only reads information from 
>> the kernel driver to user-space, there is no harm in the kernel just 
>> providing all the information it can fit in the response buffer provided 
>> by user-space.
>>
> 
> Returning as much as possible information to userspace is OK,
> but it's doing it the wrong way.
> 
> I'm not specifically discussing QUERY_DEVICE, as I'm concerned with 
> every extended uverbs to be added to the kernel.
> 
>> Let me explain: newer fields are added to the kernel response struct at the 
>> end, together with a new comp_mask bit.
>>
>> Older user-space with newer kernels will simply ask only for the buffer 
>> size they care about. The fact that the struct is truncated doesn't affect
>> these programs because the truncated struct is a valid struct that was 
>> presented by the kernel in an older version.
>>
> 
> You cannot ensure the userspace being correct when specifying a request.
> It's a wrong assumption (perhaps not so wrong in the case of
> InfiniBand/RDMA, as most userspace program are using libibverbs,
> librdmacm and provider's libraries).
> 
> That's why we must not be liberal with the request and check any bit of
> it for being something valid, so that erroneous values are catch now,
> ensuring userspace is not trying to request things we don't know yet
> and it's not aware of it too.

Does the solution I proposed above satisfy this requirement?
- The kernel validates input size == command struct size and
cmd.comp_mask == 0.
- The kernel fills as much information as it can fit in the buffer
provided by userspace.
- The kernel marks which fields it was able to fill in the response's
comp_mask.

Regards,
Haggai


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux