Re: [PATCH 3/4] IB/uverbs: ex_query_device: check request's comp_mask

Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> · Thu, 29 Jan 2015 14:34:36 -0700

On Thu, Jan 29, 2015 at 10:14:29PM +0100, Yann Droneaud wrote:
> > > Unfortunately, the userspace don't get the size of the returned data:
> > > it's only a single "write()" syscall after all.
> > 
> > A write syscall that behaves nothing like write() actually should, so
> > I don't see why we can't have
> > 
> > resp_len = write(fd,inout_buf,sizeof(input_len));
> > 
> > Returning 0 from write makes no sense at all.

Okay, yes, I see, the flow is different for the ex versions from the
non-ex versions.

> > In the fullness of your patchset it will maintain the invariant that
> > resp_len <= sizeof(input_len)

> I don't catch your point: the response can be bigger than the request.

Indeed, I forgot the scheme used that indirection buffer with
pointers. My comments on write() result are all wrong. Sorry

> > It was never the intent that the size should be computed from
> > comp_mask. If the size is necessary it must be explicit.
> > 
> > In this instance if the size is not returned then libibverbs will have
> > to zero the entire user buffer before passing it to the kernel,
> > because it has to ensure any tail for the user app is 0'd.
>
> The proposed patch ensure the integrity of the response regarding
> comp_mask: if a bit is set in response's comp_mask that means the
> related fields are presents (and valid).

The patch it is good, but sitting this into the larger framework where
we have libibverbs consumers compiled against arbitrary versions of
the structure creates a broader complexity that someone has to deal
with.

libibverbs needs to ensure the trailer portion of the structure from
the end user is 0'd.

> > Remember for santity we want comp mask bits for things that can't be 0
> 
> For me, it's better if a bit is set in response's comp_mask by the 
> kernel when the kernel have written something in the related fields
> even if the those fields are all 0.

Yes, but my point is that comp mask bit should only be used to protect
fields where 0 is not an acceptable 'do not support' answer.

Primarily we should rely on memset(0) to provide compatibility, and
only when really necessary introduce a comp mask bit.

This is why the size is important, we can't deduce the size from
comp_mask, and without the size we can't know where the kernel stopped
writing and the whole thing becomes a little more fragile.

Zeroing the response buffer before calling into the kernel is not a
great general pattern.

Perhaps having the kernel zero the trailing portion it doesn't write
too is okay, but it still feels like not telling user space the size
is asking for future trouble.

> > Ideally we want to minimize the number of COMP bits because it is a
> > huge burden on the end user to work with them.
> 
> Sure. So I think comp_mask is just an overly complicated way of
> expressing the version and the size of the response.

Yes. But understand it is selected to try and provide a good end user
experience, exposing structure versions or size directly to the user
is very difficult to work with.

Ideally the end user will not see many comp mask bits (ie I would drop
the ODP one) and things will 'just work' as compiled.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html