> -----Original Message-----
> From: ruansy.fnst@xxxxxxxxxxx <ruansy.fnst@xxxxxxxxxxx>
> Subject: RE: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
>
> > > > > > After the conversation with Dave I don't see the point of this.
> > > > > > If there is a memory_failure() on a page, why not just call
> > > > > > memory_failure()? That already knows how to find the inode and
> > > > > > the filesystem can be notified from there.
> > > > >
> > > > > We want memory_failure() to support reflinked files. In this
> > > > > case, we are not able to track multiple files from a page (this
> > > > > broken page) because page->mapping and page->index can only
> > > > > track one file. Thus, I introduce this ->memory_failure(),
> > > > > implemented in the pmem driver, to call ->corrupted_range() on
> > > > > each upper level in turn, and finally find out which files are
> > > > > using (mmapping) this page.
> > > > >
> > > >
> > > > I know the motivation, but this implementation seems backwards.
> > > > It's already the case that memory_failure() looks up the
> > > > address_space associated with a mapping. From there I would expect
> > > > a new 'struct address_space_operations' op to let the fs handle
> > > > the case when there are multiple address_spaces associated with a
> > > > given file.
> > > >
> > >
> > > Let me think about it. In this way, we
> > > 1. associate the file mapping with the dax page in the dax page fault;
> >
> > I think this needs to be a new type of association that proxies the
> > representation of the reflink across all involved address_spaces.
> >
> > > 2. iterate over the reflinked files to send the `kill processes`
> > > signal through the new address_space_operation;
> > > 3. re-associate the page with another reflinked file's mapping when
> > > unmapping (an rmap query in the filesystem finds the other file).
> >
> > Perhaps the proxy object is reference counted per-ref-link.
> > It seems error prone to keep changing the association of the pfn
> > while the reflink is intact.
>
> Hi, Dan
>
> I think my early RFC patchset was implemented in this way:
> - Create a per-page 'dax-rmap tree' to store each reflinked file's
>   (mapping, offset) when a dax page fault occurs.
> - Mount this tree on page->zone_device_data, which is not used in
>   fsdax, so that we can iterate the reflinked file mappings in
>   memory_failure() easily.
> In my understanding, the dax-rmap tree is the proxy object you
> mentioned. If so, I have to say this method was rejected, because it
> causes huge overhead in cases where every dax page has its own
> dax-rmap tree.
>

Hi, Dan

What do you think about this? I am still confused. Could you give me
some advice?

--
Thanks,
Ruan Shiyang.

>
> --
> Thanks,
> Ruan Shiyang.
>
> > > It does not handle the dax pages that are not in use, because their
> > > ->mapping is not associated with any file. I didn't think it through
> > > until reading your conversation. Here is my understanding: this
> > > case should be handled by the badblock mechanism in the pmem driver.
> > > This badblock mechanism will call ->corrupted_range() to tell the
> > > filesystem to repair the data if possible.
> >
> > There are 2 types of notifications. There are badblocks discovered by
> > the driver (see notify_pmem()) and there are memory_failures()
> > signalled by the CPU machine-check handler, or the platform BIOS. In
> > the case of badblocks, that information needs to be considered by the
> > fs block allocator to avoid / try-to-repair badblocks on allocate, and
> > to allow listing damaged files that need repair. The memory_failure()
> > notification needs immediate handling to tear down mappings to that
> > pfn and signal processes that have consumed it with
> > SIGBUS-action-required. Processes that have the poison mapped, but
> > have not consumed it, receive SIGBUS-action-optional.
> >
> > > So, we split it into two parts.
> > > And the dax device and the block device won't be mixed up again.
> > > Is my understanding right?
> >
> > Right, it's only the filesystem that knows that the block_device and
> > the dax_device alias data at the same logical offset. The requirements
> > for sector error handling and page error handling are separate, like
> > block_device_operations and dax_operations.
> >
> > > But the solution above is to solve the hwpoison on one or a couple
> > > of pages, which happens rarely (I think). Does the 'pmem remove'
> > > operation cause hwpoison too? Would that call memory_failure() so
> > > many times? I haven't understood this yet.
> >
> > I'm working on a patch here to call memory_failure() on a wide range
> > for the surprise remove of a dax_device while a filesystem might be
> > mounted. It won't be efficient, but there is no other way to notify
> > the kernel that it needs to immediately stop referencing a page.
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@xxxxxxxxxxxx
> To unsubscribe send an email to linux-nvdimm-leave@xxxxxxxxxxxx
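For illustration only, here is a minimal userspace sketch of the "proxy object reference counted per-ref-link" idea discussed in the thread: one object records every (mapping, offset) that shares a reflinked extent, a memory_failure()-style path iterates all of them, and the object is freed when the last reflink is dropped. Every name here (struct reflink_proxy, struct rmap_owner, proxy_add_owner(), proxy_remove_owner(), proxy_for_each_owner(), count_notified()) is hypothetical and is not a kernel API; mapping_id merely stands in for a struct address_space pointer. It models only the bookkeeping, not locking, page attachment, or the per-page overhead concern raised about the dax-rmap tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: one file sharing the extent, i.e. one (mapping, offset)
 * pair. mapping_id stands in for a struct address_space pointer. */
struct rmap_owner {
	unsigned long mapping_id;
	unsigned long pgoff;
};

/* Hypothetical per-reflink proxy object. The owner count doubles as the
 * reference count: each reflinked file holds one reference and the object
 * is freed when the last one is dropped, so the page can keep pointing at
 * this one object instead of being moved from file to file on unmap. */
struct reflink_proxy {
	size_t nr_owners;
	size_t cap;
	struct rmap_owner *owners;
};

typedef void (*owner_fn)(const struct rmap_owner *owner, void *ctx);

static struct reflink_proxy *proxy_alloc(void)
{
	return calloc(1, sizeof(struct reflink_proxy));
}

/* Record one more file mapping the shared extent (e.g. at dax fault time).
 * Returns 0 on success, -1 if growing the owner array failed. */
static int proxy_add_owner(struct reflink_proxy *p, unsigned long mapping_id,
			   unsigned long pgoff)
{
	assert(p != NULL);
	if (p->nr_owners == p->cap) {
		size_t ncap = p->cap ? p->cap * 2 : 4;
		struct rmap_owner *n = realloc(p->owners, ncap * sizeof(*n));

		if (!n)
			return -1;
		p->owners = n;
		p->cap = ncap;
	}
	p->owners[p->nr_owners].mapping_id = mapping_id;
	p->owners[p->nr_owners].pgoff = pgoff;
	p->nr_owners++;
	return 0;
}

/* Drop the reference held by mapping_id. Returns 1 if that was the last
 * owner and the proxy was freed, 0 if owners remain, -1 if not found. */
static int proxy_remove_owner(struct reflink_proxy *p, unsigned long mapping_id)
{
	for (size_t i = 0; i < p->nr_owners; i++) {
		if (p->owners[i].mapping_id != mapping_id)
			continue;
		/* Swap-remove: notification order does not matter. */
		p->owners[i] = p->owners[--p->nr_owners];
		if (p->nr_owners == 0) {
			free(p->owners);
			free(p);
			return 1;
		}
		return 0;
	}
	return -1;
}

/* What a memory_failure()-style path could do with the proxy: visit every
 * file sharing the poisoned page so each can unmap it and signal users. */
static void proxy_for_each_owner(const struct reflink_proxy *p, owner_fn fn,
				 void *ctx)
{
	for (size_t i = 0; i < p->nr_owners; i++)
		fn(&p->owners[i], ctx);
}

/* Example callback: count how many files would be notified. */
static void count_notified(const struct rmap_owner *owner, void *ctx)
{
	(void)owner;
	++*(size_t *)ctx;
}
```

Swap-remove keeps removal cheap once the entry is found, since the order in which sharing files are notified does not matter. A real design would still have to decide where such an object hangs off the page (the RFC used page->zone_device_data), which is exactly where the overhead objection comes from.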