> -----Original Message----- > From: Jane Chu <jane.chu@xxxxxxxxxx> > Subject: Re: [PATCH RESEND v6 1/9] pagemap: Introduce ->memory_failure() > > Hi, ShiYang, > > So I applied the v6 patch series to my 5.14-rc3 as it's what you indicated is what > v6 was based at, and injected a hardware poison. > > I'm seeing the same problem that was reported a while ago after the poison > was consumed - in the SIGBUS payload, the si_addr is missing: > > ** SIGBUS(7): canjmp=1, whichstep=0, ** > ** si_addr(0x(nil)), si_lsb(0xC), si_code(0x4, BUS_MCEERR_AR) ** > > The si_addr ought to be 0x7f6568000000 - the vaddr of the first page in this > case. > > Something is not right... Hi Jane, Sorry for late reply. Thanks for testing. This address should have been reported in my code. I'll check why it's finally nil. -- Thanks. Ruan. > > thanks, > -jane > > > On 8/5/2021 6:17 PM, Jane Chu wrote: > > The filesystem part of the pmem failure handling is at minimum built > > on PAGE_SIZE granularity - an inheritance from general memory_failure > > handling. However, with Intel's DCPMEM technology, the error blast > > radius is no more than 256bytes, and might get smaller with future > > hardware generation, also advanced atomic 64B write to clear the poison. > > But I don't see any of that could be incorporated in, given that the > > filesystem is notified a corruption with pfn, rather than an exact > > address. > > > > So I guess this question is also for Dan: how to avoid unnecessarily > > repairing a PMD range for a 256B corrupt range going forward? > > > > thanks, > > -jane > > > > > > On 7/30/2021 3:01 AM, Shiyang Ruan wrote: > >> When memory-failure occurs, we call this function which is > >> implemented by each kind of devices. For the fsdax case, pmem device > >> driver implements it. Pmem device driver will find out the > >> filesystem in which the corrupted page located in. And finally call > >> filesystem handler to deal with this error. > >> > >> The filesystem will try to recover the corrupted data if necessary. > >