Hi Jason, On Thu, 10 Oct 2024 at 13:04, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > On Thu, Oct 10, 2024 at 11:23:55AM +0100, Fuad Tabba wrote: > > Hi Kirill, > > > > On Thu, 10 Oct 2024 at 11:14, Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote: > > > > > > On Thu, Oct 10, 2024 at 09:59:23AM +0100, Fuad Tabba wrote: > > > > +out: > > > > + if (ret != VM_FAULT_LOCKED) { > > > > + folio_put(folio); > > > > + folio_unlock(folio); > > > > > > Hm. Here and in few other places you return reference before unlocking. > > > > > > I think it is safe because nobody can (or can they?) remove the page from > > > pagecache while the page is locked so we have at least one refcount on the > > > folie, but it *looks* like a use-after-free bug. > > > > > > Please follow the usual pattern: _unlock() then _put(). > > > > That is deliberate, since these patches rely on the refcount to check > > whether the host has any mappings, and the folio lock in order not to > > race. It's not that it's not safe to decrement the refcount after > > unlocking, but by doing that i cannot rely on the folio lock to ensure > > that there aren't any races between the code added to check whether a > > folio is mappable, and the code that checks whether the refcount is > > safe. It's a tiny window, but it's there. > > That seems very suspicious as the folio lock does not protect the > refcount, and we have things like speculative refcount increments in > GUP. > > When we talked at LPC the notion was you could just check if the > refcount was 1 without sleeping or waiting, and somehow deal with !1 > cases. Which also means you shouldn't need a lock around the refcount. The idea of the lock isn't to protect the refcount, which I know isn't protected by the lock. It is to protect against races with the path that (added in this patch series), would check whether the host is allowed to map a certain page/folio. But as Kirill pointed out, there seems to be other issues there, which I'll cover more in my reply to him. Thank you, /fuad > Jason