On 28/01/2015 15:19, Yann Droneaud wrote: > Hi, > > Le mardi 27 janvier 2015 à 08:50 +0200, Haggai Eran a écrit : >> On 26/01/2015 13:17, Yann Droneaud wrote: >>> ... >>> Le dimanche 25 janvier 2015 à 17:23 +0200, Haggai Eran a écrit : >>>> On 22/01/2015 15:28, Yann Droneaud wrote: >>>>> This patch ensures the extended QUERY_DEVICE uverbs request's >>>>> comp_mask has only known values. If userspace returns unknown >>>>> features, -EINVAL will be returned, allowing to probe/discover >>>>> which features are currently supported by the kernel. >>>> >>>> This probing process will be much more cumbersome than it needs to be >>>> because userspace will have to call QUERY_DEVICE repetitively with >>>> different comp_mask fields until it finds out the exact set of supported >>>> bits. >>>> >>> >>> O(log2(N)) >> >> I don't think user space developers will be happy having to do trial and >> error to determine what features the kernel driver supports. It might be >> even more then O(log2(N)). If my understanding of comp_mask bits usage is >> correct it would O(N). But it's not the time complexity I'm worried about, >> it's the fact that it requires user-space developers to go through hoops in >> order to get information that can be much more easily exported. >> > > But there's currently *NO* such mean that could give a hint to userspace > developer whether one bit of request's comp_mask is currently effective > in extended QUERY_DEVICE uverbs. While we have such mean in CREATE_FLOW > and DESTROY_FLOW: unsupported bits trigger -EINVAL. > > In QUERY_DEVICE, userspace developer is allowed to set request's > comp_mask to -1: it won't hurt it on current kernel, so why not using > this value or any other random value ? The program will run: it's "Works > For Me". > > But the same program (either binary or source code) might fail on > newer kernel where some bits in comp_mask gain a meaning not supported > by the program. Why don't we make the command comp_mask in QUERY_DEVICE zero in both versions. The behavior of the command with comp_mask = 0 will be to return the maximum amount of data that fits in the response buffer. The kernel will return -EINVAL if the input command comp_mask is not zero in the current version. If in the future we want to alter the behavior or add more input fields to QUERY_DEVICE, we would use new bits. >>> Or you had to add a different interface, dedicated to retrieve the exact >>> supported feature mask. >>> >>>>> Moreover, it also ensure the requested features set in comp_mask >>>>> are sequentially set, not skipping intermediate features, eg. the >>>>> "highest" requested feature also request all the "lower" ones. >>>>> This way each new feature will have to be stacked on top of the >>>>> existing ones: this is especially important for the request and >>>>> response data structures where fields are added after the >>>>> current ones when expanded to support a new feature. >>>> >>>> I think it is perfectly acceptable that not all drivers will implement >>>> all available features, and so you shouldn't enforce this requirement. >>> >>> With regard to QUERY_DEVICE: the data structure layout depends on the >>> comp_mask value. So either you propose a way to express multipart data >>> structure (see CMSG or NETLINK), or we have to ensure the data structure >>> is ever-growing, with each new chunck stacked over the existing ones: >>> that's the purpose of : >>> >>> if (cmd.comp_mask & (cmd.comp_mask + 1)) >>> return -EINVAL; >>> >>>> Also, it makes the comp_mask nothing more than a wasteful version number >>>> between 0 and 63. >>> >>> That's what I've already explained earlier in "Re: [PATCH v3 06/17] >>> IB/core: Add support for extended query device caps": >>> >>> http://mid.gmane.org/1421844612.13543.40.camel@xxxxxxxxxx >> >> Yes, you wrote there: >>> Regarding comp_mask (not for this particular verb): >>> >>> It's not clear whether request's comp_mask describe the request or the >>> response, as such I'm puzzled. >>> >>> How would the kernel and the userspace be able to parse the request and >>> the response if they ignore unknown bits ? >>> >>> How would they be able to skip the unrecognized chunk of the request and >>> response buffer ? >>> >>> How one bit in a comp_mask is related to a chunk in the request or >>> response ? >>> >>> It's likely the kernel or userspace would have to skip the remaining >>> comp_mask's bits after encountering an unknown bit as the size of the >>> corresponding chunk in request or response would be unknown, making >>> impossible to locate the corresponding chunk for the next bit set in >>> comp_mask. Having said that, comp_mask is just a complicated way of >>> expressing a version, which is turn describe a size (ever growing). >> >> It is my understanding that each comp_mask bit marks a set of fields in >> the command or in the response struct as valid, so the struct format >> remains the same and the kernel and userspace don't need to make >> difficult calculation as to where each field is, but you can still pass >> a high bit set in comp_mask with one of the lower bits cleared. >> > > How can the struct format remain the same, if some fields are added > to implement newer feature ? I meant that the binary format for an older version is the prefix of the binary format of the newer version. >> I couldn't find this explicit detail in the mailing list, but I did found >> a presentation that was presented in OFA International Developer >> Workshop 2013 [1], that gave an example of of an verb where each >> comp_mask bit marked a different field as valid. >> > > Thanks for the link to the presentation. > > As I read it the presentation: > > - in request, comp_mask give hint to the kernel of additional fields in > the request. > > - in response, comp_mask give hint to userspace regarding the presence > of additional fields in the response. I'm not sure that in request you can regard the comp_mask as a hint. I agree that you need to enforce it in general and reject unknown bits there (although I thought QUERY_DEVICE would be an exception). > But commit 860f10a799c8 ("IB/core: Add flags for on demand paging > support") on top of commit 5a77abf9a97a ("IB/core: Add support for > extended query device caps") is not doing it the expected way > as one bit set in request's comp_mask has direct effect on > the response's fields. > > To be conformant with the "Extending Verbs API" presentation, > no check should be done on request's comp_mask, and on-demand paging > information should be returned to userspace in all case, provided > there's enough room in the response buffer. > > Anyway, allowing userspace to set any "hint" in the request's comp_mask > is opening a pandora box: being allowed to set any value, userspace > program will use any random value, as it will work with current kernel. > > But the same program used on newer kernel is going to trigger some > behavior unknown to the application or return an error the application > is not ready to handle ... breaking any kind of forward compatibility > promise. I don't think that the application should handle unknown response bits as an error. As you wrote, I see these as hints about more data in the response that the application is free to ignore. > It's likely the kernel will have to use the size of the request to > guess the hints in comp_mask are corrects to handle such. In such case, > relying only on the size of the request might be enough to detect > extended request, without the need for comp_mask. > > Regarding the answer, if the response buffer is smaller than the size > of the extended response, will the kernel keep setting the response's > comp_mask with all the bits it supports or will it unset the bits for > the fields it wasn't able to fit in the response buffer ? > > It's likely the kernel will have to use the size of the response buffer > to set the response's comp_mask. > > Instead commit 860f10a799c8 ("IB/core: Add flags for on demand paging > support") on top of commit 5a77abf9a97a ("IB/core: Add support for > extended query device caps") make it possible for the kernel to return > truncated response with full comp_mask. Such is going to break silently > when one will introduce a data structure with an ABI mismatch between > userspace and kernel (for example x86 vs amd64 ... we have some recent > exemples). As I said, I think unknown bits in the comp_mask are hints about unknown fields in the response that can be ignored by the application. However, I can agree to having the kernel checking the response buffer size as you wrote, and only setting the valid comp_mask bits. > >>> >>>> >>>> In the specific case of QUERY_DEVICE you might argue that there isn't >>>> any need for input comp_mask, only for output, and then you may enforce >>>> the input comp_mask will always be zero. >>> >>> The extended QUERY_DEVICE uverbs as currently merged is using comp_mask >>> from input to choose to report on-demand-paging related value. So it >>> seems it's needed. >>> >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_cmd.c?id=v3.19-rc6#n3297 >>> >>>> However, you will in any case need to be able to extended the size of the response in the future. >>>> >>> >>> That's already the case for on demand paging. >>> >>>>> >>>>> Link: http://mid.gmane.org/cover.1421931555.git.ydroneaud@xxxxxxxxxx >>>>> Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx> >>>>> Cc: Shachar Raindel <raindel@xxxxxxxxxxxx> >>>>> Cc: Eli Cohen <eli@xxxxxxxxxxxx> >>>>> Cc: Haggai Eran <haggaie@xxxxxxxxxxxx> >>>>> Signed-off-by: Yann Droneaud <ydroneaud@xxxxxxxxxx> >>>>> --- >>>>> drivers/infiniband/core/uverbs_cmd.c | 6 ++++++ >>>>> 1 file changed, 6 insertions(+) >>>>> >>>>> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c >>>>> index 8668b328b7e6..80a1c90f1dbf 100644 >>>>> --- a/drivers/infiniband/core/uverbs_cmd.c >>>>> +++ b/drivers/infiniband/core/uverbs_cmd.c >>>>> @@ -3313,6 +3313,12 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file, >>>>> if (err) >>>>> return err; >>>>> >>>>> + if (cmd.comp_mask & (cmd.comp_mask + 1)) >>>>> + return -EINVAL; >>>>> + >>>>> + if (cmd.comp_mask & ~(__u32)IB_USER_VERBS_EX_QUERY_DEVICE_ODP) >>>>> + return -EINVAL; >>>>> + >>> >>> If we keep the checks on output buffer size from patch 1, returning >>> -ENOSPC in case of size mismatch, and if we are sure that no bit in >>> input comp_mask will ever trigger a call to a kernel function that can >>> make the uverb fail, the latter check on known value could be dropped, >>> allowing the userspace to set the highest mask (-1): userspace >>> will use -ENOSPC to probe the expected size of the response buffer >>> to match the highest supported comp_mask. But it's going to hurt >>> userspace application not ready to receive -ENOSPC on newer kernel >>> with extended QUERY_DEVICE ABI ... Oops. >>> >>> So in this end, the safest way to ensure userspace is doing the correct >>> thing so that we have backward and forward compatibility is to check >>> for known values in comp_mask, check for response buffer size and ensure >>> that data structure chunk are stacked. >>> >>> The tighter are the checks now, the easier the interface could be >>> extended latter. >> >> I understand this position, and I generally agree, but I think that >> specifically for a verb like QUERY_DEVICE that only reads information from >> the kernel driver to user-space, there is no harm in the kernel just >> providing all the information it can fit in the response buffer provided >> by user-space. >> > > Returning as much as possible information to userspace is OK, > but it's doing it the wrong way. > > I'm not specifically discussing QUERY_DEVICE, as I'm concerned with > every extended uverbs to be added to the kernel. > >> Let me explain: newer fields are added to the kernel response struct at the >> end, together with a new comp_mask bit. >> >> Older user-space with newer kernels will simply ask only for the buffer >> size they care about. The fact that the struct is truncated doesn't affect >> these programs because the truncated struct is a valid struct that was >> presented by the kernel in an older version. >> > > You cannot ensure the userspace being correct when specifying a request. > It's a wrong assumption (perhaps not so wrong in the case of > InfiniBand/RDMA, as most userspace program are using libibverbs, > librdmacm and provider's libraries). > > That's why we must not be liberal with the request and check any bit of > it for being something valid, so that erroneous values are catch now, > ensuring userspace is not trying to request things we don't know yet > and it's not aware of it too. Does the solution I proposed above satisfy this requirement? - The kernel validates input size == command struct size and cmd.comp_mask == 0. - The kernel fills as much information as it can fit in the buffer provided by userspace. - The kernel marks which fields it was able to fill in the response's comp_mask. Regards, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html