On Mon, Oct 21, 2019 at 07:06:00PM +0000, Jason Gunthorpe wrote: > On Mon, Oct 21, 2019 at 02:40:41PM -0400, Jerome Glisse wrote: > > On Tue, Oct 15, 2019 at 03:12:27PM -0300, Jason Gunthorpe wrote: > > > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > > > > > 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1, > > > scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where > > > they only use invalidate_range_start/end and immediately check the > > > invalidating range against some driver data structure to tell if the > > > driver is interested. Half of them use an interval_tree, the others are > > > simple linear search lists. > > > > > > Of the ones I checked they largely seem to have various kinds of races, > > > bugs and poor implementation. This is a result of the complexity in how > > > the notifier interacts with get_user_pages(). It is extremely difficult to > > > use it correctly. > > > > > > Consolidate all of this code together into the core mmu_notifier and > > > provide a locking scheme similar to hmm_mirror that allows the user to > > > safely use get_user_pages() and reliably know if the page list still > > > matches the mm. > > > > > > This new arrangment plays nicely with the !blockable mode for > > > OOM. Scanning the interval tree is done such that the intersection test > > > will always succeed, and since there is no invalidate_range_end exposed to > > > drivers the scheme safely allows multiple drivers to be subscribed. > > > > > > Four places are converted as an example of how the new API is used. > > > Four are left for future patches: > > > - i915_gem has complex locking around destruction of a registration, > > > needs more study > > > - hfi1 (2nd user) needs access to the rbtree > > > - scif_dma has a complicated logic flow > > > - vhost's mmu notifiers are already being rewritten > > > > > > This is still being tested, but I figured to send it to start getting help > > > from the xen, amd and hfi drivers which I cannot test here. > > > > It might be a good oportunity to also switch those users to > > hmm_range_fault() instead of GUP as GUP is pointless for those > > users. In fact the GUP is an impediment to normal mm operations. > > I think vhost can use hmm_range_fault > > hfi1 does actually need to have the page pin, it doesn't fence DMA > during invalidate. > > i915_gem feels alot like amdgpu, so probably it would benefit > > No idea about scif_dma > > > I will test on nouveau. > > Thanks, hopefully it still works, I think Ralph was able to do some > basic checks. But it is a pretty complicated series, I probably made > some mistakes. So it seems to work ok with nouveau, will let tests run in loop thought there are not very advance test. > > FWIW, I know that nouveau gets a lockdep splat now from Daniel > Vetter's recent changes, it tries to do GFP_KERENEL allocations under > a lock also held by the invalidate_range_start path. I have not seen any splat so far, is it throught some new kernel config ? Cheers, Jérôme