Re: [PATCH RFC 04/12] kernel/user: Allow user::locked_vm to be usable for iommufd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2022-03-22 at 11:57 -0300, Jason Gunthorpe wrote:
> On Tue, Mar 22, 2022 at 03:28:22PM +0100, Niklas Schnelle wrote:
> > On Fri, 2022-03-18 at 14:27 -0300, Jason Gunthorpe wrote:
> > > Following the pattern of io_uring, perf, skb, and bpf iommfd will use
> >                                                 iommufd ----^
> > > user->locked_vm for accounting pinned pages. Ensure the value is included
> > > in the struct and export free_uid() as iommufd is modular.
> > > 
> > > user->locked_vm is the correct accounting to use for ulimit because it is
> > > per-user, and the ulimit is not supposed to be per-process. Other
> > > places (vfio, vdpa and infiniband) have used mm->pinned_vm and/or
> > > mm->locked_vm for accounting pinned pages, but this is only per-process
> > > and inconsistent with the majority of the kernel.
> > 
> > Since this will replace parts of vfio this difference seems
> > significant. Can you explain this a bit more?
> 
> I'm not sure what to say more, this is the correct way to account for
> this. It is natural to see it is right because the ulimit is supposted
> to be global to the user, not effectively reset every time the user
> creates a new process.
> 
> So checking the ulimit against a per-process variable in the mm_struct
> doesn't make too much sense.

Yes I agree that logically this makes more sense. I was kind of aiming
in the same direction as Alex i.e. it's a conscious decision to do it
right and we need to know where this may lead to differences and how to
handle them.

> 
> > I'm also a bit confused how io_uring handles this. When I stumbled over
> > the problem fixed by 6b7898eb180d ("io_uring: fix imbalanced sqo_mm
> > accounting") and from that commit description I seem to rember that
> > io_uring also accounts in mm->locked_vm too? 
> 
> locked/pinned_pages in the mm is kind of a debugging counter, it
> indicates how many pins the user obtained through this mm. AFAICT its
> only correct use is to report through proc. Things are supposed to
> update it, but there is no reason to limit it as the user limit
> supersedes it.
> 
> The commit you pointed at is fixing that io_uring corrupted the value.
> 
> Since VFIO checks locked/pinned_pages against the ulimit would blow up
> when the value was wrong.
> 
> > In fact I stumbled over that because the wrong accounting in
> > io_uring exhausted the applied to vfio (I was using a QEMU utilizing
> > io_uring itself).
> 
> I'm pretty interested in this as well, do you have anything you can
> share?

This was the issue reported in the following BZ.

https://bugzilla.kernel.org/show_bug.cgi?id=209025

I stumbled over the same problem on my x86 box and also on s390. I
don't remember exactly what limit this ran into but I suspect it had
something to do with the libvirt resource limits Alex mentioned.
Meaning io_uring had an accounting bug and then vfio / QEMU couldn't
pin memory. I think that libvirt limit is set to allow pinning all of
guest memory plus a bit so the io_uring misaccounting easily tipped it
over.

> 
> Thanks,
> Jason





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux