Re: [PATCH v3 11/12] mm, memory_failure: Teach memory_failure() about dev_pagemap pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 7, 2018 at 9:30 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Mon 04-06-18 16:12:37, Dan Williams wrote:
>>     mce: Uncorrected hardware memory error in user-access at af34214200
>>     {1}[Hardware Error]: It has been corrected by h/w and requires no further action
>>     mce: [Hardware Error]: Machine check events logged
>>     {1}[Hardware Error]: event severity: corrected
>>     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
>>     [..]
>>     Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
>>     mce: Memory error not recovered
>>
>> In contrast to typical memory, dev_pagemap pages may be dax mapped. With
>> dax there is no possibility to map in another page dynamically since dax
>> establishes 1:1 physical address to file offset associations. Also
>> dev_pagemap pages associated with NVDIMM / persistent memory devices can
>> internal remap/repair addresses with poison. While memory_failure()
>> assumes that it can discard typical poisoned pages and keep them
>> unmapped indefinitely, dev_pagemap pages may be returned to service
>> after the error is cleared.
>>
>> Teach memory_failure() to detect and handle MEMORY_DEVICE_HOST
>> dev_pagemap pages that have poison consumed by userspace. Mark the
>> memory as UC instead of unmapping it completely to allow ongoing access
>> via the device driver (nd_pmem). Later, nd_pmem will grow support for
>> marking the page back to WB when the error is cleared.
>
> ...
>
>> +static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>> +             struct dev_pagemap *pgmap)
>> +{
>> +     struct page *page = pfn_to_page(pfn);
>> +     const bool unmap_success = true;
>> +     struct address_space *mapping;
>> +     unsigned long size;
>> +     LIST_HEAD(tokill);
>> +     int rc = -EBUSY;
>> +     loff_t start;
>> +
>> +     /*
>> +      * Prevent the inode from being freed while we are interrogating
>> +      * the address_space, typically this would be handled by
>> +      * lock_page(), but dax pages do not use the page lock.
>> +      */
>> +     rcu_read_lock();
>> +     mapping = page->mapping;
>> +     if (!mapping) {
>> +             rcu_read_unlock();
>> +             goto out;
>> +     }
>> +     if (!igrab(mapping->host)) {
>> +             mapping = NULL;
>> +             rcu_read_unlock();
>> +             goto out;
>> +     }
>> +     rcu_read_unlock();
>
> Why don't you use radix tree entry lock here instead? That is a direct
> replacement of the page lock and you don't have to play games with pinning
> the inode and verifying the mapping afterwards.

I was hoping to avoid teaching device-dax about radix entries...

However, now that I look again, we're already protected against
device-dax inode teardown by the dev_pagemap reference.

I'll switch over.

>> +out:
>> +     if (mapping)
>> +             iput(mapping->host);
>
> BTW, this would have to be prepared to do full inode deletion which I'm not
> quite sure is safe from this context.

Ah, I had not considered that, now I know for future reference.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux