RE: [PATCH V2 vfio 03/11] vfio: Introduce DMA logging uAPIs

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Thu, 28 Jul 2022 04:05:04 +0000

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, July 26, 2022 11:05 PM
> 
> On Tue, Jul 26, 2022 at 08:03:20AM -0600, Alex Williamson wrote:
> 
> > I raised the same concern myself, the reason for having a limit is
> > clear, but focusing on a single use case and creating an arbitrary
> > "good enough" limit that isn't exposed to userspace makes this an
> > implementation detail that can subtly break userspace.  For instance,
> > what if userspace comes to expect the limit is 1000 and we decide to be
> > even more strict?  If only a few 10s of entries are used, why isn't 100
> > more than sufficient?
> 
> So let's use the number of elements that will fit in PAGE_SIZE as the
> guideline. It means the kernel can memdup the userspace array into a
> single kernel page of memory to process it, which seems reasonably
> future proof in that we won't need to make it lower. Thus we can
> promise we won't make it smaller.
> 
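
For a sense of scale, here is a rough sketch of what "fits in
PAGE_SIZE" works out to. It assumes each range entry is a pair of
64-bit values (iova and length); struct dma_range and
ranges_per_page() are illustrative names of mine, not taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the user-supplied range array. */
struct dma_range {
	uint64_t iova;
	uint64_t length;
};

/* Number of whole entries that fit in a single page of the given size. */
static size_t ranges_per_page(size_t page_size)
{
	assert(page_size > 0);
	return page_size / sizeof(struct dma_range);
}
```

Under that assumption a 4K page holds 256 entries, well above the
few 10s of entries mentioned earlier in the thread.
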
> However, remember, this isn't even the real device limit - this is
> just the limit that the core kernel code will accept to marshal the
> data to pass internally to the driver.
> 
> I fully expect that the driver will still refuse ranges in certain
> configurations even if they can be marshaled.
> 
> This is primarily why I don't think it makes sense to expose some
> internal limit that is not even the real "will the call succeed"
> parameters.
> 
> The API is specifically designed as 'try and fail' to allow the
> drivers flexibility in how they map the requested ranges to their
> internal operations.
> 
> > We change it, we break userspace.  OTOH, if we simply make use of
> > that reserved field to expose the limit, now we have a contract with
> > userspace and we can change our implementation because that detail
> > of the implementation is visible to userspace.  Thanks,
> 
> I think this is not correct, just because we made it discoverable does
> not absolve the kernel of compatibility. If we change the limit, eg to
> 1, and a real userspace stops working then we still broke userspace.

IIUC Alex's suggestion doesn't conflict with the 'try and fail' model.
By using the reserved field of vfio_device_feature_dma_logging_control
to return the limit for the specified page_size from a given tracker,
the user can quickly retry and adapt to that limit if it is workable.

Otherwise, what would be an efficient policy for the user to retry
after a failure? Say the user initially requests 100 ranges with a 4K
page size but the tracker can only support 10 ranges. Without a hint
returned from the tracker, does the user just blindly try 100, 90,
80, ... or use a bisect algorithm?
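
To make the comparison concrete, below is a minimal userspace-only
sketch. Everything in it is hypothetical: mock_tracker and mock_start()
stand in for a tracker that accepts at most max_ranges ranges, and the
hint output models the reserved field being used to return the limit.
It only counts probes; it does not touch the real uAPI, and a real
driver could still refuse the hinted value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker model: accepts a request of up to max_ranges. */
struct mock_tracker {
	uint32_t max_ranges;
};

/*
 * Returns 0 on success and -1 on failure. On failure, the limit is
 * written to *hint when hint is non-NULL (the assumed "reserved field").
 */
static int mock_start(const struct mock_tracker *t, uint32_t num_ranges,
		      uint32_t *hint)
{
	if (num_ranges <= t->max_ranges)
		return 0;
	if (hint)
		*hint = t->max_ranges;
	return -1;
}

/* No hint: bisect for the largest accepted count, return probes used. */
static unsigned int bisect_attempts(const struct mock_tracker *t,
				    uint32_t want, uint32_t *found)
{
	uint32_t lo = 0, hi;	/* lo: largest count known to be accepted */
	unsigned int attempts = 1;

	assert(want > 0);
	if (mock_start(t, want, NULL) == 0) {
		*found = want;
		return attempts;
	}
	hi = want - 1;
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo + 1) / 2;

		attempts++;
		if (mock_start(t, mid, NULL) == 0)
			lo = mid;
		else
			hi = mid - 1;
	}
	*found = lo;
	return attempts;
}

/* With hint: one failed try, then one retry at the reported limit. */
static unsigned int hinted_attempts(const struct mock_tracker *t,
				    uint32_t want, uint32_t *found)
{
	uint32_t hint = 0;

	assert(want > 0);
	*found = 0;
	if (mock_start(t, want, &hint) == 0) {
		*found = want;
		return 1;
	}
	if (hint == 0)
		return 1;	/* nothing workable was reported */
	if (mock_start(t, hint, NULL) == 0)
		*found = hint;
	return 2;
}
```

With max_ranges = 10 and an initial request of 100, the bisect path in
this mock needs 8 probes to settle on 10, while the hinted path needs 2.
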

> 
> Complaining that userspace does not check the discoverable limit
> doesn't help matters - I seem to remember Linus has written about this
> in recent times even.
> 
> So, it is ultimately not different from 'try and fail', unless we
> implement some algorithm in qemu - an algorithm that would duplicate
> the one we already have in the kernel :\
> 
> Jason