On 26/01/2015 13:17, Yann Droneaud wrote: > ... > Le dimanche 25 janvier 2015 à 17:23 +0200, Haggai Eran a écrit : >> On 22/01/2015 15:28, Yann Droneaud wrote: >>> This patch ensures the extended QUERY_DEVICE uverbs request's >>> comp_mask has only known values. If userspace returns unknown >>> features, -EINVAL will be returned, allowing to probe/discover >>> which features are currently supported by the kernel. >> >> This probing process will be much more cumbersome than it needs to be >> because userspace will have to call QUERY_DEVICE repetitively with >> different comp_mask fields until it finds out the exact set of supported >> bits. >> > > O(log2(N)) I don't think user space developers will be happy having to do trial and error to determine what features the kernel driver supports. It might be even more then O(log2(N)). If my understanding of comp_mask bits usage is correct it would O(N). But it's not the time complexity I'm worried about, it's the fact that it requires user-space developers to go through hoops in order to get information that can be much more easily exported. > Or you had to add a different interface, dedicated to retrieve the exact > supported feature mask. > >>> Moreover, it also ensure the requested features set in comp_mask >>> are sequentially set, not skipping intermediate features, eg. the >>> "highest" requested feature also request all the "lower" ones. >>> This way each new feature will have to be stacked on top of the >>> existing ones: this is especially important for the request and >>> response data structures where fields are added after the >>> current ones when expanded to support a new feature. >> >> I think it is perfectly acceptable that not all drivers will implement >> all available features, and so you shouldn't enforce this requirement. > > With regard to QUERY_DEVICE: the data structure layout depends on the > comp_mask value. So either you propose a way to express multipart data > structure (see CMSG or NETLINK), or we have to ensure the data structure > is ever-growing, with each new chunck stacked over the existing ones: > that's the purpose of : > > if (cmd.comp_mask & (cmd.comp_mask + 1)) > return -EINVAL; > >> Also, it makes the comp_mask nothing more than a wasteful version number >> between 0 and 63. > > That's what I've already explained earlier in "Re: [PATCH v3 06/17] > IB/core: Add support for extended query device caps": > > http://mid.gmane.org/1421844612.13543.40.camel@xxxxxxxxxx Yes, you wrote there: > Regarding comp_mask (not for this particular verb): > > It's not clear whether request's comp_mask describe the request or the > response, as such I'm puzzled. > > How would the kernel and the userspace be able to parse the request and > the response if they ignore unknown bits ? > > How would they be able to skip the unrecognized chunk of the request and > response buffer ? > > How one bit in a comp_mask is related to a chunk in the request or > response ? > > It's likely the kernel or userspace would have to skip the remaining > comp_mask's bits after encountering an unknown bit as the size of the > corresponding chunk in request or response would be unknown, making > impossible to locate the corresponding chunk for the next bit set in > comp_mask. Having said that, comp_mask is just a complicated way of > expressing a version, which is turn describe a size (ever growing). It is my understanding that each comp_mask bit marks a set of fields in the command or in the response struct as valid, so the struct format remains the same and the kernel and userspace don't need to make difficult calculation as to where each field is, but you can still pass a high bit set in comp_mask with one of the lower bits cleared. I couldn't find this explicit detail in the mailing list, but I did found a presentation that was presented in OFA International Developer Workshop 2013 [1], that gave an example of of an verb where each comp_mask bit marked a different field as valid. > >> >> In the specific case of QUERY_DEVICE you might argue that there isn't >> any need for input comp_mask, only for output, and then you may enforce >> the input comp_mask will always be zero. > > The extended QUERY_DEVICE uverbs as currently merged is using comp_mask > from input to choose to report on-demand-paging related value. So it > seems it's needed. > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/uverbs_cmd.c?id=v3.19-rc6#n3297 > >> However, you will in any case need to be able to extended the size of the response in the future. >> > > That's already the case for on demand paging. > >>> >>> Link: http://mid.gmane.org/cover.1421931555.git.ydroneaud@xxxxxxxxxx >>> Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx> >>> Cc: Shachar Raindel <raindel@xxxxxxxxxxxx> >>> Cc: Eli Cohen <eli@xxxxxxxxxxxx> >>> Cc: Haggai Eran <haggaie@xxxxxxxxxxxx> >>> Signed-off-by: Yann Droneaud <ydroneaud@xxxxxxxxxx> >>> --- >>> drivers/infiniband/core/uverbs_cmd.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c >>> index 8668b328b7e6..80a1c90f1dbf 100644 >>> --- a/drivers/infiniband/core/uverbs_cmd.c >>> +++ b/drivers/infiniband/core/uverbs_cmd.c >>> @@ -3313,6 +3313,12 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file, >>> if (err) >>> return err; >>> >>> + if (cmd.comp_mask & (cmd.comp_mask + 1)) >>> + return -EINVAL; >>> + >>> + if (cmd.comp_mask & ~(__u32)IB_USER_VERBS_EX_QUERY_DEVICE_ODP) >>> + return -EINVAL; >>> + > > If we keep the checks on output buffer size from patch 1, returning > -ENOSPC in case of size mismatch, and if we are sure that no bit in > input comp_mask will ever trigger a call to a kernel function that can > make the uverb fail, the latter check on known value could be dropped, > allowing the userspace to set the highest mask (-1): userspace > will use -ENOSPC to probe the expected size of the response buffer > to match the highest supported comp_mask. But it's going to hurt > userspace application not ready to receive -ENOSPC on newer kernel > with extended QUERY_DEVICE ABI ... Oops. > > So in this end, the safest way to ensure userspace is doing the correct > thing so that we have backward and forward compatibility is to check > for known values in comp_mask, check for response buffer size and ensure > that data structure chunk are stacked. > > The tighter are the checks now, the easier the interface could be > extended latter. I understand this position, and I generally agree, but I think that specifically for a verb like QUERY_DEVICE that only reads information from the kernel driver to user-space, there is no harm in the kernel just providing all the information it can fit in the response buffer provided by user-space. Let me explain: newer fields are added to the kernel response struct at the end, together with a new comp_mask bit. Older user-space with newer kernels will simply ask only for the buffer size they care about. The fact that the struct is truncated doesn't affect these programs because the truncated struct is a valid struct that was presented by the kernel in an older version. They may or may not receive a comp_mask bit they don't recognize, but that only tells them that the kernel supports new features they don't know about. Newer user-space with older kernel will give a larger buffer then the kernel can fill. The kernel only fills in the beginning of the user-space buffer, and provides user-space with the comp_mask bits that mark which fields are valid. So user-space can tell that the end of the buffer isn't valid. In my implementation I also left the ending uninitialized, but the kernel can zero it if you think it is important. So I hope you agree this scheme is extendible and allows keeping backward and forward compatibility. If you can think of another scheme that will be more strict with the buffer sizes, but doesn't require user-space to do extra work, I'll be happy to hear about it. Regards, Haggai [1] Extending Verbs API, Tzahi Oved https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html