Re: [PATCH v2] dax: fix NULL pointer in __dax_pmd_fault()

Dan Williams <dan.j.williams@xxxxxxxxx> · Tue, 22 Sep 2015 20:00:29 -0700

On Tue, Sep 22, 2015 at 4:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Sep 22, 2015 at 02:25:19PM -0700, Dan Williams wrote:
>> On Tue, Sep 22, 2015 at 2:13 PM, Andrew Morton
>> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Tue, 22 Sep 2015 13:36:22 -0600 Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> wrote:
>> >
>> >> The following commit:
>> >>
>> >> commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for
>> >>       DAX")
>> >>
>> >> moved some code in __dax_pmd_fault() that was responsible for zeroing
>> >> newly allocated PMD pages.  The new location didn't properly set up
>> >> 'kaddr', though, so when run this code resulted in a NULL pointer BUG.
>> >>
>> >> Fix this by getting the correct 'kaddr' via bdev_direct_access().
>> >
>> > Why the heck didn't gcc warn?
>> >
>> > I had a fiddle:
>> >
>> > --- a/fs/dax.c~a
>> > +++ a/fs/dax.c
>> > @@ -529,15 +529,18 @@ int __dax_pmd_fault(struct vm_area_struc
>> >         unsigned long pmd_addr = address & PMD_MASK;
>> >         bool write = flags & FAULT_FLAG_WRITE;
>> >         long length;
>> > -       void __pmem *kaddr;
>> > +       void *kaddr;
>> >         pgoff_t size, pgoff;
>> >         sector_t block, sector;
>> >         unsigned long pfn;
>> >         int result = 0;
>> >
>> > +//     printk("%p\n", kaddr);
>> > +
>> >         /* Fall back to PTEs if we're going to COW */
>> >         if (write && !(vma->vm_flags & VM_SHARED))
>> >                 return VM_FAULT_FALLBACK;
>> > +       printk("%p\n", kaddr);
>> >         /* If the PMD would extend outside the VMA */
>> >         if (pmd_addr < vma->vm_start)
>> >                 return VM_FAULT_FALLBACK;
>> >
>> > gcc warns about the first printk, but not about the second.  So that
>> > "if (...) return ..." seems to have defeated gcc uninitialized-var
>> > detection.  wtf?
>> >
>> >> --- a/fs/dax.c
>> >> +++ b/fs/dax.c
>> >> @@ -569,8 +569,20 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>> >>       if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
>> >>               goto fallback;
>> >>
>> >> +     sector = bh.b_blocknr << (blkbits - 9);
>> >> +
>> >>       if (buffer_unwritten(&bh) || buffer_new(&bh)) {
>> >>               int i;
>> >> +
>> >> +             length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
>> >> +                                             bh.b_size);
>> >> +             if (length < 0) {
>> >> +                     result = VM_FAULT_SIGBUS;
>> >> +                     goto out;
>> >> +             }
>> >> +             if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
>> >> +                     goto fallback;
>> >> +
>> >>               for (i = 0; i < PTRS_PER_PMD; i++)
>> >>                       clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
>> >>               wmb_pmem();
>> >
>> > hm, that's a lot of copy-n-paste.  Do we really need to run
>> > bdev_direct_access() twice?  Will `kaddr' and `pfn' change?
>> >
>>
>> They shouldn't change, but I'm working on a fix for handling the race
>> of unbinding the pmem device while that kaddr is in use (unbind
>> invalidates kaddr).
>
> Exactly what does "unbinding the pmem device" mean,

echo namespace0.0 > /sys/bus/nd/drivers/nd_pmem/unbind

> and why can
> (parts of) the pmem device "go away" when there are active
> references to it?

Normally we have outstanding i/o requests to hold off
blk_cleanup_queue(), but in the dax case we don't have any mechanism
(yet) to flag the queue as busy.  I have some patches to add a
percpu_refcount for this purpose.

>
>> The proposal is a dax_map_bh()/dax_unmap_bh()
>> interface to temporarily pin the mapping around each usage.
>
> Which mapping? The bufferhead maps file offset to filesystem block
> addresses, so I'm not sure what problem you are actually refering
> to here...

The kaddr is coming from the devm_memremap() in the pmem driver that
gets unmapped after the device is released by the driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html