> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, July 26, 2022 11:05 PM
>
> On Tue, Jul 26, 2022 at 08:03:20AM -0600, Alex Williamson wrote:
>
> > I raised the same concern myself, the reason for having a limit is
> > clear, but focusing on a single use case and creating an arbitrary
> > "good enough" limit that isn't exposed to userspace makes this an
> > implementation detail that can subtly break userspace. For instance,
> > what if userspace comes to expect the limit is 1000 and we decide to be
> > even more strict? If only a few 10s of entries are used, why isn't 100
> > more than sufficient?
>
> So let's use the number of elements that will fit in PAGE_SIZE as the
> guideline. It means the kernel can memdup the userspace array into a
> single kernel page of memory to process it, which seems reasonably
> future proof in that we won't need to make it lower. Thus we can
> promise we won't make it smaller.
>
> However, remember, this isn't even the real device limit - this is
> just the limit that the core kernel code will accept to marshal the
> data to pass internally to the driver.
>
> I fully expect that the driver will still refuse ranges in certain
> configurations even if they can be marshaled.
>
> This is primarily why I don't think it makes sense to expose some
> internal limit that is not even the real "will the call succeed"
> parameters.
>
> The API is specifically designed as 'try and fail' to allow the
> drivers flexibility in how they map the requested ranges to their
> internal operations.
>
> > We change it, we break userspace. OTOH, if we simply make use of
> > that reserved field to expose the limit, now we have a contract with
> > userspace and we can change our implementation because that detail
> > of the implementation is visible to userspace. Thanks,
>
> I think this is not correct, just because we made it discoverable does
> not absolve the kernel of compatibility. If we change the limit, eg to
> 1, and a real userspace stops working then we still broke userspace.

iiuc Alex's suggestion doesn't conflict with the 'try and fail' model.
By using the reserved field of vfio_device_feature_dma_logging_control
to return the limit of the specified page_size from a given tracker,
the user can quickly retry and adapt to that limit if workable.

Otherwise what would be an efficient policy for the user to retry after
a failure? Say the user initially requests 100 ranges with 4K page size
but the tracker can only support 10 ranges. w/o a hint returned from
the tracker, does the user just blindly try 100, 90, 80, ... or use a
bisect algorithm?

>
> Complaining that userspace does not check the discoverable limit
> doesn't help matters - I seem to remember Linus has written about this
> in recent times even.
>
> So, it is ultimately not different from 'try and fail', unless we
> implement some algorithm in qemu - an algorithm that would duplicate
> the one we already have in the kernel :\
>
> Jason
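
To make the retry question above concrete, here is a rough C sketch of the three policies (fixed step-down, bisect, and retry with a hint). Everything in it is made up for illustration: try_start_logging() is a mock and not the VFIO uAPI, the cap of 10 ranges is just the number from the example, and the hint argument stands in for whatever the reserved field might return. The bisect variant also assumes acceptance is monotonic (if n ranges are accepted then fewer are too), which a real driver is not guaranteed to honor.

```c
#include <assert.h>

/* Hypothetical tracker capacity: the "only 10 ranges" case from the example. */
#define TRACKER_MAX_RANGES 10

/* Counts how many start attempts a policy needed. */
static unsigned int attempts;

/*
 * Mock of a "start DMA logging with nranges ranges" call. NOT the real
 * VFIO interface: it only models "too many ranges fails".
 */
static int try_start_logging(unsigned int nranges)
{
	attempts++;
	return nranges <= TRACKER_MAX_RANGES ? 0 : -1;
}

/* Policy A: no hint, step down by a fixed amount (100, 90, 80, ...). */
static unsigned int retry_linear(unsigned int want, unsigned int step)
{
	unsigned int n;

	assert(want > 0 && step > 0);
	for (n = want; n > 0; n = n > step ? n - step : 0)
		if (try_start_logging(n) == 0)
			return n;
	return 0;
}

/*
 * Policy B: no hint, bisect for the largest accepted count.
 * Assumes monotonic acceptance. A real caller would also have to stop
 * logging after each successful probe before trying a larger count.
 */
static unsigned int retry_bisect(unsigned int want)
{
	unsigned int lo = 0;	/* largest count known to work (0 = none) */
	unsigned int hi = want;	/* smallest count known to fail */

	assert(want > 0);
	if (try_start_logging(want) == 0)
		return want;
	while (hi - lo > 1) {
		unsigned int mid = lo + (hi - lo) / 2;

		if (try_start_logging(mid) == 0)
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}

/* Policy C: the failed call hands back a hint; retry once with it. */
static unsigned int retry_with_hint(unsigned int want, unsigned int hint)
{
	assert(want > 0);
	if (try_start_logging(want) == 0)
		return want;
	if (hint == 0 || hint >= want)
		return 0;
	/* The hinted retry can still fail; the hint is not a guarantee. */
	return try_start_logging(hint) == 0 ? hint : 0;
}
```

Against this mock, asking for 100 ranges takes 10 attempts with the step-down of 10, 8 with bisect, and 2 with the hint. The hinted retry can of course still fail on a real device, per Jason's point that the marshaling limit is not the real device limit.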