On Tue, Sep 22, 2015 at 4:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > On Tue, Sep 22, 2015 at 02:25:19PM -0700, Dan Williams wrote: >> On Tue, Sep 22, 2015 at 2:13 PM, Andrew Morton >> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: >> > On Tue, 22 Sep 2015 13:36:22 -0600 Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> wrote: >> > >> >> The following commit: >> >> >> >> commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for >> >> DAX") >> >> >> >> moved some code in __dax_pmd_fault() that was responsible for zeroing >> >> newly allocated PMD pages. The new location didn't properly set up >> >> 'kaddr', though, so when run this code resulted in a NULL pointer BUG. >> >> >> >> Fix this by getting the correct 'kaddr' via bdev_direct_access(). >> > >> > Why the heck didn't gcc warn? >> > >> > I had a fiddle: >> > >> > --- a/fs/dax.c~a >> > +++ a/fs/dax.c >> > @@ -529,15 +529,18 @@ int __dax_pmd_fault(struct vm_area_struc >> > unsigned long pmd_addr = address & PMD_MASK; >> > bool write = flags & FAULT_FLAG_WRITE; >> > long length; >> > - void __pmem *kaddr; >> > + void *kaddr; >> > pgoff_t size, pgoff; >> > sector_t block, sector; >> > unsigned long pfn; >> > int result = 0; >> > >> > +// printk("%p\n", kaddr); >> > + >> > /* Fall back to PTEs if we're going to COW */ >> > if (write && !(vma->vm_flags & VM_SHARED)) >> > return VM_FAULT_FALLBACK; >> > + printk("%p\n", kaddr); >> > /* If the PMD would extend outside the VMA */ >> > if (pmd_addr < vma->vm_start) >> > return VM_FAULT_FALLBACK; >> > >> > gcc warns about the first printk, but not about the second. So that >> > "if (...) return ..." seems to have defeated gcc uninitialized-var >> > detection. wtf? >> > >> >> --- a/fs/dax.c >> >> +++ b/fs/dax.c >> >> @@ -569,8 +569,20 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, >> >> if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) >> >> goto fallback; >> >> >> >> + sector = bh.b_blocknr << (blkbits - 9); >> >> + >> >> if (buffer_unwritten(&bh) || buffer_new(&bh)) { >> >> int i; >> >> + >> >> + length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn, >> >> + bh.b_size); >> >> + if (length < 0) { >> >> + result = VM_FAULT_SIGBUS; >> >> + goto out; >> >> + } >> >> + if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR)) >> >> + goto fallback; >> >> + >> >> for (i = 0; i < PTRS_PER_PMD; i++) >> >> clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE); >> >> wmb_pmem(); >> > >> > hm, that's a lot of copy-n-paste. Do we really need to run >> > bdev_direct_access() twice? Will `kaddr' and `pfn' change? >> > >> >> They shouldn't change, but I'm working on a fix for handling the race >> of unbinding the pmem device while that kaddr is in use (unbind >> invalidates kaddr). > > Exactly what does "unbinding the pmem device" mean, echo namespace0.0 > /sys/bus/nd/drivers/nd_pmem/unbind > and why can > (parts of) the pmem device "go away" when there are active > references to it? Normally we have outstanding i/o requests to hold off blk_cleanup_queue(), but in the dax case we don't have any mechanism (yet) to flag the queue as busy. I have some patches to add a percpu_refcount for this purpose. > >> The proposal is a dax_map_bh()/dax_unmap_bh() >> interface to temporarily pin the mapping around each usage. > > Which mapping? The bufferhead maps file offset to filesystem block > addresses, so I'm not sure what problem you are actually refering > to here... The kaddr is coming from the devm_memremap() in the pmem driver that gets unmapped after the device is released by the driver. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html