Re: [PATCH V2 vfio 03/11] vfio: Introduce DMA logging uAPIs

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 26 Jul 2022 12:04:52 -0300

On Tue, Jul 26, 2022 at 08:03:20AM -0600, Alex Williamson wrote:

> I raised the same concern myself, the reason for having a limit is
> clear, but focusing on a single use case and creating an arbitrary
> "good enough" limit that isn't exposed to userspace makes this an
> implementation detail that can subtly break userspace.  For instance,
> what if userspace comes to expect the limit is 1000 and we decide to be
> even more strict?  If only a few 10s of entries are used, why isn't 100
> more than sufficient?  

So let's use the number of elements that will fit in PAGE_SIZE as the
guideline. It means the kernel can memdup the userspace array into a
single kernel page of memory to process it, which seems reasonably
future-proof in that we won't need to make it lower. Thus we can
promise we won't make it smaller.
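
As a rough sketch of the arithmetic (not the code from the patch; the
struct layout, the function names and the 4096-byte page used in the
note below are assumptions for illustration), the bound is just the
page size divided by the size of one range entry:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one userspace-supplied range: two 64-bit fields. */
struct dma_logging_range {
	uint64_t iova;
	uint64_t length;
};

/* Number of range entries that fit in one page of the given size. */
static size_t max_ranges_per_page(size_t page_size)
{
	return page_size / sizeof(struct dma_logging_range);
}

/*
 * Returns 0 if an array of nr_ranges entries can be copied into a
 * single page, -1 if it is too large to marshal that way.
 */
static int check_nr_ranges(size_t nr_ranges, size_t page_size)
{
	if (nr_ranges > max_ranges_per_page(page_size))
		return -1;
	return 0;
}
```

With 16-byte entries and a 4096-byte page this works out to 256
ranges, far above the few 10s of entries mentioned above.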

However, remember, this isn't even the real device limit - this is
just the limit that the core kernel code will accept to marshal the
data to pass internally to the driver.

I fully expect that the driver will still refuse ranges in certain
configurations even if they can be marshaled.

This is primarily why I don't think it makes sense to expose some
internal limit that is not even the real "will the call succeed"
parameter.

The API is specifically designed as 'try and fail' to allow the
drivers flexibility in how they map the requested ranges to their
internal operations.
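
From the caller's side, 'try and fail' could look roughly like the
sketch below. Everything in it is hypothetical: the callback type, the
mock driver and its limit of 4 ranges are made up for illustration and
are not the uAPI from this series. The point is only that the caller
does not pre-check any limit; it attempts the call and acts on the
result.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range descriptor. */
struct iova_range {
	uint64_t iova;
	uint64_t length;
};

/* Hypothetical "start logging" entry point: 0 on success, -errno on failure. */
typedef int (*start_logging_fn)(const struct iova_range *ranges, size_t nr);

/* Mock driver that refuses more than 4 ranges (an arbitrary, made-up limit). */
static int mock_start_logging(const struct iova_range *ranges, size_t nr)
{
	(void)ranges;
	return nr <= 4 ? 0 : -E2BIG;
}

/*
 * Try to enable device dirty tracking. No limit is queried up front;
 * the request is simply attempted. Returns true if the driver accepted
 * it, false if the caller has to use whatever fallback it has.
 */
static bool enable_device_tracking(start_logging_fn start,
				   const struct iova_range *ranges, size_t nr)
{
	return start(ranges, nr) == 0;
}
```

A request the mock driver can map succeeds, and one it cannot map
fails, regardless of whether the core code could have marshaled it.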

> We change it, we break userspace.  OTOH, if we simply make use of
> that reserved field to expose the limit, now we have a contract with
> userspace and we can change our implementation because that detail
> of the implementation is visible to userspace.  Thanks,

I think this is not correct. Just because we made it discoverable does
not absolve the kernel of compatibility. If we change the limit, e.g. to
1, and a real userspace stops working then we still broke userspace.

Complaining that userspace does not check the discoverable limit
doesn't help matters - I seem to remember Linus has written about this
in recent times even.

So, it is ultimately not different from 'try and fail', unless we
implement some algorithm in qemu - an algorithm that would duplicate
the one we already have in the kernel :\

Jason