On Sun, Jun 30, 2019 at 02:37:32PM -0700, Dan Williams wrote:
> On Sun, Jun 30, 2019 at 8:23 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > I think my theory was slightly mistaken, but your fix has the effect
> > of fixing the actual problem too.
> >
> > The xas->xa_index for a PMD is going to be PMD-aligned (ie a multiple
> > of 512), but xas_find_conflict() does _not_ adjust xa_index (... which
> > I really should have mentioned in the documentation).  So we go to
> > sleep on the PMD-aligned index instead of the index of the PTE.  Your
> > patch fixes this by using the PMD-aligned index for PTEs too.
> >
> > I'm trying to come up with a clean fix for this.  Clearly we
> > shouldn't wait for a PTE entry if we're looking for a PMD entry.
> > But what should get_unlocked_entry() return if it detects that case?
> > We could have it return an error code encoded as an internal entry,
> > like grab_mapping_entry() does.  Or we could have it return the
> > _locked_ PTE entry, and have callers interpret that.
> >
> > At least get_unlocked_entry() is static, but it's got quite a few
> > callers.  Trying to discern which ones might ask for a PMD entry is
> > a bit tricky.  So this seems like a large patch which might have bugs.
> >
> > Thoughts?
>
> ...but if it was a problem of just mismatched waitqueues I would have
> expected it to trigger prior to commit b15cd800682f "dax: Convert page
> fault handlers to XArray".

That commit converts grab_mapping_entry() (called by
dax_iomap_pmd_fault()) from calling get_unlocked_mapping_entry() to
calling get_unlocked_entry().  get_unlocked_mapping_entry() (eventually)
called __radix_tree_lookup() instead of xas_find_conflict().

> This hunk, if I'm reading it correctly, looks suspicious: @index in
> this case is coming directly from vmf->pgoff without PMD alignment
> adjustment, whereas after the conversion it's always PMD-aligned from
> the xas->xa_index.  So perhaps the issue is that the lock happens at
> PTE granularity.  I expect it would cause the old
> put_locked_mapping_entry() to WARN, but maybe that avoids the lockup
> and was missed in the bisect.

I don't think that hunk is the problem.  The __radix_tree_lookup() is
going to return a 'slot' which points to the canonical slot, no matter
which of the 512 indices corresponding to that slot is chosen.  So I
think it's going to do essentially the same thing.

> @@ -884,21 +711,18 @@ static void *dax_insert_entry(struct address_space *mapping,
>  		 * existing entry is a PMD, we will just leave the PMD in the
>  		 * tree and dirty it if necessary.
>  		 */
> -		struct radix_tree_node *node;
> -		void **slot;
> -		void *ret;
> -
> -		ret = __radix_tree_lookup(pages, index, &node, &slot);
> -		WARN_ON_ONCE(ret != entry);
> -		__radix_tree_replace(pages, node, slot,
> -				new_entry, NULL);
> +		void *old = dax_lock_entry(xas, new_entry);
> +		WARN_ON_ONCE(old != xa_mk_value(xa_to_value(entry) |
> +					DAX_LOCKED));
>  		entry = new_entry;
> +	} else {
> +		xas_load(xas);	/* Walk the xa_state */
>  	}
>
>  	if (dirty)
> -		radix_tree_tag_set(pages, index, PAGECACHE_TAG_DIRTY);
> +		xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
>
> -	xa_unlock_irq(pages);
> +	xas_unlock_irq(xas);
>  	return entry;
>  }
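
To make the missed wakeup concrete: the waitqueue key is computed from
xas->xa_index, roughly like the current dax_entry_waitqueue() in
fs/dax.c (quoting from memory, so treat the hash expression as
approximate):

static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas,
		void *entry, struct exceptional_entry_key *key)
{
	unsigned long hash;
	unsigned long index = xas->xa_index;

	/* PMD entries hash on the first index covered by the PMD */
	if (dax_is_pmd_entry(entry))
		index &= ~PG_PMD_COLOUR;

	key->xa = xas->xa;
	key->entry_start = index;

	hash = hash_long((unsigned long)xas->xa ^ index, DAX_WAIT_TABLE_BITS);
	return wait_table + hash;
}

A PMD fault which finds a PTE entry goes to sleep with xa_index still
PMD-aligned, so it hashes on the aligned index, while the task that
eventually unlocks that PTE entry wakes with the PTE's own index.
Different keys, so the wakeup never arrives.  Your patch aligns the
index for PTE entries as well, so both sides compute the same key.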
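
That also answers the bisect question, I think.  The old path looked up
and slept on exactly the index it was passed, ie something like (old
signatures from memory):

	entry = __radix_tree_lookup(&mapping->i_pages, index, NULL, &slot);
	...
	wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);

A lookup at @index can only find the PTE entry which lives at @index,
so the sleeper and the task which later unlocked that entry used the
same index, and the keys matched by construction.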
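
As for the clean fix, this is the shape I had in mind for the
internal-entry option.  Completely untested sketch, and the 'order'
argument is new, just to make the idea concrete:

static void *get_unlocked_entry(struct xa_state *xas, unsigned int order)
{
	struct wait_exceptional_entry_queue ewait;
	wait_queue_head_t *wq;
	void *entry;

	init_wait(&ewait.wait);
	ewait.wait.func = wake_exceptional_entry_func;

	for (;;) {
		entry = xas_find_conflict(xas);
		if (!entry || WARN_ON_ONCE(!xa_is_value(entry)))
			return entry;
		/*
		 * Caller wants a PMD but a PTE entry is in the way.
		 * Don't sleep on a queue nobody will wake; tell the
		 * caller to fall back instead.
		 */
		if (order > 0 && dax_is_pte_entry(entry))
			return xa_mk_internal(VM_FAULT_FALLBACK);
		if (!dax_is_locked(entry))
			return entry;

		wq = dax_entry_waitqueue(xas, entry, &ewait.key);
		prepare_to_wait_exclusive(wq, &ewait.wait,
				TASK_UNINTERRUPTIBLE);
		xas_unlock_irq(xas);
		xas_reset(xas);
		schedule();
		finish_wait(wq, &ewait.wait);
		xas_lock_irq(xas);
	}
}

The ugly part is unchanged: every caller which can be asked for a PMD
has to check for the internal entry and fall back, and auditing which
callers those are is exactly the large-patch problem I mentioned above.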