On Mon 26-11-18 08:12:40, Matthew Wilcox wrote: > > I noticed this path while I was doing the 4.19 backport of > dax: Avoid losing wakeup in dax_lock_mapping_entry > > xa_unlock_irq(&mapping->i_pages); > revalidate = wait_fn(); > finish_wait(wq, &ewait.wait); > xa_lock_irq(&mapping->i_pages); I guess this is a snippet from get_unlocked_entry(), isn't it? > It's not safe to call xa_lock_irq() if mapping can have been freed while > we slept. We'll probably get away with it; most filesystems use a unique > slab for their inodes, so you'll likely get either a freed inode or an > inode which is now the wrong inode. But if that page has been freed back > to the page allocator, that pointer could now be pointing at anything. Correct. Thanks for catching this bug! > Fixing this in the current codebase is no easier than fixing it in the > 4.19 codebase. This is the best I've come up with. Could we do better > by not using the _exclusive form of prepare_to_wait()? I'm not familiar > with all the things that need to be considered when using this family > of interfaces. > > diff --git a/fs/dax.c b/fs/dax.c > index 9bcce89ea18e..154b592b18eb 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -232,6 +232,24 @@ static void *get_unlocked_entry(struct xa_state *xas) > } > } > > +static void wait_unlocked_entry(struct xa_state *xas, void *entry) > +{ > + struct wait_exceptional_entry_queue ewait; > + wait_queue_head_t *wq; > + > + init_wait(&ewait.wait); > + ewait.wait.func = wake_exceptional_entry_func; > + > + wq = dax_entry_waitqueue(xas, entry, &ewait.key); > + prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE); > + xas_unlock_irq(xas); > + /* We can no longer look at xas */ > + schedule(); > + finish_wait(wq, &ewait.wait); > + if (waitqueue_active(wq)) > + __wake_up(wq, TASK_NORMAL, 1, &ewait.key); > +} > + The code looks good. Maybe can we call this wait_entry_unlocked() to stress that entry is not really usable after this function returns? And comment before the function that this is safe to call even if we don't have a reference keeping mapping alive? > static void put_unlocked_entry(struct xa_state *xas, void *entry) > { > /* If we were the only waiter woken, wake the next one */ > @@ -389,9 +407,7 @@ bool dax_lock_mapping_entry(struct page *page) > entry = xas_load(&xas); > if (dax_is_locked(entry)) { > rcu_read_unlock(); > - entry = get_unlocked_entry(&xas); > - xas_unlock_irq(&xas); > - put_unlocked_entry(&xas, entry); > + wait_unlocked_entry(&xas, entry); > rcu_read_lock(); > continue; The continue here actually is not safe either because if the mapping got freed, page->mapping will be NULL and we oops at the beginning of the loop. So that !dax_mapping() check should also check for mapping != NULL. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR