On Wed, 7 Sep 2022 16:55:50 -0300
Jason Gunthorpe <jgg@xxxxxxxx> wrote:

> On Wed, Sep 07, 2022 at 12:56:27PM -0600, Alex Williamson wrote:
>
> > I thought we'd already agreed that we were stuck with locked_vm for
> > type1 and any compatibility mode of type1 due to this. Native iommufd
> > support can do the right thing since userspace will need to account for
> > various new usage models anyway.
>
> We did, that was for the iommufd situation (which will also hit the
> same zeropage issue, sigh) - this discussion is about fixing a bug in
> vfio and what many consider a bug in GUP.
>
> My point is I'm still not convinced we can really consider these
> limits as ABI because it opens a pandoras box of kernel limitations.
>
> > I've raised the issue with David for the zero page accounting, but I
> > don't know what the solution is. libvirt automatically adds a 1GB
> > fudge factor to the VM locked memory limits to account for things like
> > ROM mappings, or at least the non-zeropage backed portion of those
> > ROMs. I think that most management tools have adopted similar, so the
> > majority of users shouldn't notice. However this won't cover all
> > users, so we certainly risk breaking userspace if we introduce hard
> > page accounting of zero pages.
>
> It sounds like things will be fine. 1GB fudge is pretty big.
>
> For things like this ABI compat is not about absolute compatability in
> the face of any userspace, but a real-world compatibility "does
> something that actually exists break?"

Magic 8 ball says "Cannot predict now." Unfortunately there's a lot of
roll-your-own scripting that goes on in this space, many users reject
the overhead of things like libvirt, let alone deeper management
stacks. Private clouds have constraints that might also generate
custom solutions. I can't predict the degree to which libvirt is a
canonical example.

> So I would be happier if we had an actual deployed thing that breaks..
> I would be inclined to go with the simple fix and rely on the
> fudge. If someone does come with an actual break then lets do one of
> the work arounds.

We should probably have a workaround in our pocket for such a case.

Also, I want to clarify, is this a recommendation relative to the
stable patch proposed here, or only once we get rid of shared zero page
pinning? We can't simply do accounting on the shared zero page since a
single user can overflow the refcount.

> Given the whole thing is obstensibly for security it is better to keep
> it simple and sane then to poke it full of holes.
>
> > module parameter defined limit. We might also consider whether we
> > could just ignore zero page mappings, maybe with a optional "strict"
> > mode module option to generate an errno on such mappings. Thanks,
>
> Once GUP is fixed vfio won't see the zero pages anymore :( That really
> limits the choices for a work around :(

I was afraid of that. Thanks,

Alex