On Mon, Jun 11, 2018 at 08:19:54AM -0700, Dan Williams wrote:
> On Mon, Jun 11, 2018 at 7:56 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Mon 11-06-18 07:44:39, Dan Williams wrote:
> > [...]
> >> I'm still trying to understand the next level of detail on where you
> >> think the design should go next? Is it just the HWPoison page flag?
> >> Are you concerned about supporting greater than PAGE_SIZE poison?
> >
> > I simply do not want to check for HWPoison at zillion of places and have
> > each type of page to have some special handling which can get wrong very
> > easily. I am not clear on details here, this is something for users of
> > hwpoison to define what is the reasonable scenarios when the feature is
> > useful and turn that into a feature list that can be actually turned
> > into a design document. See the different from let's put some more on
> > top approach...
> >
>
> So you want me to pay the toll of writing a design document justifying
> all the existing use cases of HWPoison before we fix the DAX bugs, and
> the design document may or may not result in any substantive change to
> these patches?
>
> Naoya or Andi, can you chime in here?

A new document doesn't make any sense. We have the commit messages
and the code comments as design documents, and as usual the
ultimate authority is what the code does.

The guiding light for new memory recovery code is just these sentences
(taken from the beginning of the main file):

 * In general any code for handling new cases should only be added iff:
 * - You know how to test it.
 * - You have a test that can be added to mce-test
 *   https://git.kernel.org/cgit/utils/cpu/mce/mce-test.git/
 * - The case actually shows up as a frequent (top 10) page state in
 *   tools/vm/page-types when running a real workload.

Since persistent memory is so big, it makes sense to add support
for it in common code paths. That is usually just kernel copies
and user space execution.

-Andi