On Tue, Jul 26, 2022 at 08:03:20AM -0600, Alex Williamson wrote:

> I raised the same concern myself, the reason for having a limit is
> clear, but focusing on a single use case and creating an arbitrary
> "good enough" limit that isn't exposed to userspace makes this an
> implementation detail that can subtly break userspace. For instance,
> what if userspace comes to expect the limit is 1000 and we decide to be
> even more strict? If only a few 10s of entries are used, why isn't 100
> more than sufficient?

So let's use the number of elements that will fit in PAGE_SIZE as the
guideline. It means the kernel can memdup the userspace array into a
single kernel page of memory to process it, which seems reasonably
future proof in that we won't need to make it lower. Thus we can
promise we won't make it smaller.

However, remember, this isn't even the real device limit - this is
just the limit that the core kernel code will accept to marshal the
data to pass internally to the driver.

I fully expect that the driver will still refuse ranges in certain
configurations even if they can be marshaled.

This is primarily why I don't think it makes sense to expose some
internal limit that is not even the real "will the call succeed"
parameters.

The API is specifically designed as 'try and fail' to allow the
drivers flexibility in how they map the requested ranges to their
internal operations.

> We change it, we break userspace. OTOH, if we simply make use of
> that reserved field to expose the limit, now we have a contract with
> userspace and we can change our implementation because that detail
> of the implementation is visible to userspace. Thanks,

I think this is not correct: just because we made it discoverable does
not absolve the kernel of compatibility. If we change the limit, e.g.
to 1, and a real userspace stops working then we still broke
userspace.

Complaining that userspace does not check the discoverable limit
doesn't help matters - I seem to remember Linus has written about this
in recent times even.

So, it is ultimately not different from 'try and fail', unless we
implement some algorithm in qemu - an algorithm that would duplicate
the one we already have in the kernel :\

Jason
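
To make the PAGE_SIZE guideline concrete, here is a rough userspace
model of the marshaling check, not the kernel code. The 4 KiB page
size, the 16-byte entry layout (two u64 fields) and the names
max_ranges() and marshal_ranges() are all assumptions made for
illustration; none of them come from the patch.

```python
import struct

# Assumptions for illustration only: a 4 KiB page and a range entry made
# of two unsigned 64-bit fields (start address, length).
PAGE_SIZE = 4096
RANGE_ENTRY = struct.Struct("=QQ")  # 16 bytes per entry, no padding


def max_ranges(page_size=PAGE_SIZE):
    """Number of whole range entries that fit in one page."""
    return page_size // RANGE_ENTRY.size


def marshal_ranges(ranges, page_size=PAGE_SIZE):
    """Pack (start, length) pairs into at most one page, or refuse.

    This models only the core marshaling limit. A driver could still
    reject a request that passes this check ('try and fail').
    """
    if len(ranges) > max_ranges(page_size):
        raise ValueError("too many ranges to marshal in one page")
    return b"".join(RANGE_ENTRY.pack(start, length) for start, length in ranges)
```

Under these assumptions the marshaling limit works out to 256 entries.
Passing this check says nothing about whether a given driver accepts
the ranges, which is the reason the limit alone does not answer "will
the call succeed".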