Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Le mercredi 28 janvier 2015 à 17:40 +0200, Haggai Eran a écrit :
> On 28/01/2015 15:19, Yann Droneaud wrote:
> > Le mardi 27 janvier 2015 à 08:50 +0200, Haggai Eran a écrit :
> >> On 26/01/2015 13:17, Yann Droneaud wrote:
> >>> ...
> >>> Le dimanche 25 janvier 2015 à 17:23 +0200, Haggai Eran a écrit :
> >>>> On 22/01/2015 15:28, Yann Droneaud wrote:
> >>>>> This patch ensures the extended QUERY_DEVICE uverbs request's
> >>>>> comp_mask has only known values. If userspace returns unknown
> >>>>> features, -EINVAL will be returned, allowing to probe/discover
> >>>>> which features are currently supported by the kernel.
> >>>>
> >>>> This probing process will be much more cumbersome than it needs to be
> >>>> because userspace will have to call QUERY_DEVICE repetitively with
> >>>> different comp_mask fields until it finds out the exact set of supported
> >>>> bits.
> >>>>
> >>>
> >>> O(log2(N))
> >>
> >> I don't think user space developers will be happy having to do trial and 
> >> error to determine what features the kernel driver supports. It might be 
> >> even more then O(log2(N)). If my understanding of comp_mask bits usage is 
> >> correct it would O(N). But it's not the time complexity I'm worried about,
> >> it's the fact that it requires user-space developers to go through hoops in
> >> order to get information that can be much more easily exported.
> >>
> > 
> > But there's currently *NO* such mean that could give a hint to userspace
> > developer whether one bit of request's comp_mask is currently effective
> > in extended QUERY_DEVICE uverbs. While we have such mean in CREATE_FLOW
> > and DESTROY_FLOW: unsupported bits trigger -EINVAL.
> > 
> > In QUERY_DEVICE, userspace developer is allowed to set request's 
> > comp_mask to -1: it won't hurt it on current kernel, so why not using 
> > this value or any other random value ? The program will run: it's "Works
> > For Me".
> > 
> > But the same program (either binary or source code) might fail on
> > newer kernel where some bits in comp_mask gain a meaning not supported
> > by the program.
> 
> Why don't we make the command comp_mask in QUERY_DEVICE zero in both
> versions. The behavior of the command with comp_mask = 0 will be to
> return the maximum amount of data that fits in the response buffer. The
> kernel will return -EINVAL if the input command comp_mask is not zero in
> the current version.
> 

Yes, that's exactly what I wanted to do.

> If in the future we want to alter the behavior or add more input fields
> to QUERY_DEVICE, we would use new bits.
> 

Yes.

> >>> Or you had to add a different interface, dedicated to retrieve the exact
> >>> supported feature mask.
> >>>
> >>>>> Moreover, it also ensure the requested features set in comp_mask
> >>>>> are sequentially set, not skipping intermediate features, eg. the
> >>>>> "highest" requested feature also request all the "lower" ones.
> >>>>> This way each new feature will have to be stacked on top of the
> >>>>> existing ones: this is especially important for the request and
> >>>>> response data structures where fields are added after the
> >>>>> current ones when expanded to support a new feature.
> >>>>
> >>>> I think it is perfectly acceptable that not all drivers will implement
> >>>> all available features, and so you shouldn't enforce this requirement.
> >>>
> >>> With regard to QUERY_DEVICE: the data structure layout depends on the
> >>> comp_mask value. So either you propose a way to express multipart data
> >>> structure (see CMSG or NETLINK), or we have to ensure the data structure
> >>> is ever-growing, with each new chunck stacked over the existing ones:
> >>> that's the purpose of :
> >>>
> >>> 	if (cmd.comp_mask & (cmd.comp_mask + 1))
> >>> 		return -EINVAL;
> >>>
> >>>> Also, it makes the comp_mask nothing more than a wasteful version number
> >>>> between 0 and 63.
> >>>
> >>> That's what I've already explained earlier in "Re: [PATCH v3 06/17]
> >>> IB/core: Add support for extended query device caps":
> >>>
> >>> http://mid.gmane.org/1421844612.13543.40.camel@xxxxxxxxxx
> >>
> >> Yes, you wrote there:
> >>> Regarding comp_mask (not for this particular verb):
> >>>
> >>> It's not clear whether request's comp_mask describe the request or the
> >>> response, as such I'm puzzled.
> >>>
> >>> How would the kernel and the userspace be able to parse the request and
> >>> the response if they ignore unknown bits ?
> >>>
> >>> How would they be able to skip the unrecognized chunk of the request and
> >>> response buffer ?
> >>>
> >>> How one bit in a comp_mask is related to a chunk in the request or
> >>> response ?
> >>>
> >>> It's likely the kernel or userspace would have to skip the remaining
> >>> comp_mask's bits after encountering an unknown bit as the size of the
> >>> corresponding chunk in request or response would be unknown, making
> >>> impossible to locate the corresponding chunk for the next bit set in
> >>> comp_mask. Having said that, comp_mask is just a complicated way of
> >>> expressing a version, which is turn describe a size (ever growing).
> >>
> >> It is my understanding that each comp_mask bit marks a set of fields in 
> >> the command or in the response struct as valid, so the struct format 
> >> remains the same and the kernel and userspace don't need to make 
> >> difficult calculation as to where each field is, but you can still pass 
> >> a high bit set in comp_mask with one of the lower bits cleared.
> >>
> > 
> > How can the struct format remain the same, if some fields are added
> > to implement newer feature ?
> 
> I meant that the binary format for an older version is the prefix of the
> binary format of the newer version.
> 

OK.

> >> I couldn't find this explicit detail in the mailing list, but I did found
> >> a presentation that was presented in OFA International Developer 
> >> Workshop 2013 [1], that gave an example of of an verb where each 
> >> comp_mask bit marked a different field as valid.
> >>
> > 
> > Thanks for the link to the presentation.
> > 
> > As I read it the presentation:
> > 
> > - in request, comp_mask give hint to the kernel of additional fields in
> > the request.
> > 
> > - in response, comp_mask give hint to userspace regarding the presence
> > of additional fields in the response.
> 
> I'm not sure that in request you can regard the comp_mask as a hint.

It's a hint because the kernel also have to check the size match:
if userspace say: "hey I've given you more fields" but the request
size is shorter than the kernel expect for such fields, the kernel
must return an error so that userspace is taught to fix its requests.

> I agree that you need to enforce it in general and reject unknown bits
> there (although I thought QUERY_DEVICE would be an exception).
> 

I think it's better to stick to one common behavior so that every
extended uverbs behave the same way.

> > But commit 860f10a799c8 ("IB/core: Add flags for on demand paging
> > support") on top of commit 5a77abf9a97a ("IB/core: Add support for
> > extended query device caps") is not doing it the expected way
> > as one bit set in request's comp_mask has direct effect on
> > the response's fields.
> > 
> > To be conformant with the "Extending Verbs API" presentation,
> > no check should be done on request's comp_mask, and on-demand paging
> > information should be returned to userspace in all case, provided 
> > there's enough room in the response buffer.
> > 
> > Anyway, allowing userspace to set any "hint" in the request's comp_mask
> > is opening a pandora box: being allowed to set any value, userspace
> > program will use any random value, as it will work with current kernel.
> > 
> > But the same program used on newer kernel is going to trigger some
> > behavior unknown to the application or return an error the application
> > is not ready to handle ... breaking any kind of forward compatibility 
> > promise.
> 
> I don't think that the application should handle unknown response bits
> as an error. As you wrote, I see these as hints about more data in the
> response that the application is free to ignore.
> 

I tried to explain the issue regarding older userspace setting random
bits in request's comp_mask: it older kernel allows such, the same
program will likely have issue with newer kernel where the request's
comp_mask bits have a meaning now: the program will either face
unknown behavior: the kernel does something the userspace was not
expecting, or the kernel refuse to do it because there's not enough
information in the request to fullfil it.

That's why we must catch now unknown value to prevent userspace to
use it now. If unknown value are not allowed, userspace program won't
use it and then it could run asis on newer kernel.


> > It's likely the kernel will have to use the size of the request to 
> > guess the hints in comp_mask are corrects to handle such. In such case, 
> > relying only on the size of the request might be enough to detect 
> > extended request, without the need for comp_mask.
> > 
> > Regarding the answer, if the response buffer is smaller than the size
> > of the extended response, will the kernel keep setting the response's 
> > comp_mask with all the bits it supports or will it unset the bits for 
> > the fields it wasn't able to fit in the response buffer ?
> > 
> > It's likely the kernel will have to use the size of the response buffer 
> > to set the response's comp_mask. 
> > 
> > Instead commit 860f10a799c8 ("IB/core: Add flags for on demand paging
> > support") on top of commit 5a77abf9a97a ("IB/core: Add support for
> > extended query device caps") make it possible for the kernel to return
> > truncated response with full comp_mask. Such is going to break silently 
> > when one will introduce a data structure with an ABI mismatch between
> > userspace and kernel (for example x86 vs amd64 ... we have some recent
> > exemples).
> 
> As I said, I think unknown bits in the comp_mask are hints about unknown
> fields in the response that can be ignored by the application. However,
> I can agree to having the kernel checking the response buffer size as
> you wrote, and only setting the valid comp_mask bits.
> 

OK.

> > 
> >>>
> >>>>
> >>>> In the specific case of QUERY_DEVICE you might argue that there isn't
> >>>> any need for input comp_mask, only for output, and then you may enforce
> >>>> the input comp_mask will always be zero.
> >>>
> >>> The extended QUERY_DEVICE uverbs as currently merged is using comp_mask
> >>> from input to choose to report on-demand-paging related value. So it
> >>> seems it's needed.
> >>>
> >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_cmd.c?id=v3.19-rc6#n3297
> >>>
> >>>> However, you will in any case need to be able to extended the size of the response in the future.
> >>>>
> >>>
> >>> That's already the case for on demand paging.
> >>>
> >>>>>
> >>>>> Link: http://mid.gmane.org/cover.1421931555.git.ydroneaud@xxxxxxxxxx
> >>>>> Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx>
> >>>>> Cc: Shachar Raindel <raindel@xxxxxxxxxxxx>
> >>>>> Cc: Eli Cohen <eli@xxxxxxxxxxxx>
> >>>>> Cc: Haggai Eran <haggaie@xxxxxxxxxxxx>
> >>>>> Signed-off-by: Yann Droneaud <ydroneaud@xxxxxxxxxx>
> >>>>> ---
> >>>>>  drivers/infiniband/core/uverbs_cmd.c | 6 ++++++
> >>>>>  1 file changed, 6 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> >>>>> index 8668b328b7e6..80a1c90f1dbf 100644
> >>>>> --- a/drivers/infiniband/core/uverbs_cmd.c
> >>>>> +++ b/drivers/infiniband/core/uverbs_cmd.c
> >>>>> @@ -3313,6 +3313,12 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
> >>>>>  	if (err)
> >>>>>  		return err;
> >>>>>  
> >>>>> +	if (cmd.comp_mask & (cmd.comp_mask + 1))
> >>>>> +		return -EINVAL;
> >>>>> +
> >>>>> +	if (cmd.comp_mask & ~(__u32)IB_USER_VERBS_EX_QUERY_DEVICE_ODP)
> >>>>> +		return -EINVAL;
> >>>>> +
> >>>
> >>> If we keep the checks on output buffer size from patch 1, returning
> >>> -ENOSPC in case of size mismatch, and if we are sure that no bit in
> >>> input comp_mask will ever trigger a call to a kernel function that can
> >>> make the uverb fail, the latter check on known value could be dropped,
> >>> allowing the userspace to set the highest mask (-1): userspace
> >>> will use -ENOSPC to probe the expected size of the response buffer
> >>> to match the highest supported comp_mask. But it's going to hurt
> >>> userspace application not ready to receive -ENOSPC on newer kernel
> >>> with extended QUERY_DEVICE ABI ... Oops.
> >>>
> >>> So in this end, the safest way to ensure userspace is doing the correct
> >>> thing so that we have backward and forward compatibility is to check
> >>> for known values in comp_mask, check for response buffer size and ensure
> >>> that data structure chunk are stacked.
> >>>
> >>> The tighter are the checks now, the easier the interface could be
> >>> extended latter.
> >>
> >> I understand this position, and I generally agree, but I think that 
> >> specifically for a verb like QUERY_DEVICE that only reads information from 
> >> the kernel driver to user-space, there is no harm in the kernel just 
> >> providing all the information it can fit in the response buffer provided 
> >> by user-space.
> >>
> > 
> > Returning as much as possible information to userspace is OK,
> > but it's doing it the wrong way.
> > 
> > I'm not specifically discussing QUERY_DEVICE, as I'm concerned with 
> > every extended uverbs to be added to the kernel.
> > 
> >> Let me explain: newer fields are added to the kernel response struct at the 
> >> end, together with a new comp_mask bit.
> >>
> >> Older user-space with newer kernels will simply ask only for the buffer 
> >> size they care about. The fact that the struct is truncated doesn't affect
> >> these programs because the truncated struct is a valid struct that was 
> >> presented by the kernel in an older version.
> >>
> > 
> > You cannot ensure the userspace being correct when specifying a request.
> > It's a wrong assumption (perhaps not so wrong in the case of
> > InfiniBand/RDMA, as most userspace program are using libibverbs,
> > librdmacm and provider's libraries).
> > 
> > That's why we must not be liberal with the request and check any bit of
> > it for being something valid, so that erroneous values are catch now,
> > ensuring userspace is not trying to request things we don't know yet
> > and it's not aware of it too.
> 
> Does the solution I proposed above satisfy this requirement?
> - The kernel validates input size == command struct size and
> cmd.comp_mask == 0.
> - The kernel fills as much information as it can fit in the buffer
> provided by userspace.
> - The kernel marks which fields it was able to fill in the response's
> comp_mask.
> 

Yes. This scheme follow the "Extending Verbs API" presention
while ensuring the command can be extended regarding to good
practice one can find in
http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

I'm going to submit the patches to have this behavior.

Regards.

-- 
Yann Droneaud
OPTEYA



--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux