On Thu, 2016-05-05 at 08:15 -0700, Dan Williams wrote:
> On Thu, May 5, 2016 at 7:24 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, May 02, 2016 at 06:41:51PM +0300, Boaz Harrosh wrote:
> > >
> > > > All IO in a dax filesystem used to go through dax_do_io, which cannot
> > > > handle media errors, and thus cannot provide a recovery path that can
> > > > send a write through the driver to clear errors.
> > > >
> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
> > > > path for DAX filesystems, use the same direct_IO path for both DAX and
> > > > direct_io iocbs, but use the flags to identify when we are in O_DIRECT
> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
> > > > direct_IO path instead of DAX.
> > > >
> > > Really? What is your thinking here?
> > >
> > > What about all the current users of O_DIRECT, you have just made them
> > > 4 times slower and "less concurrent*" than "buffered io" users. Since
> > > direct_IO path will queue an IO request and all.
> > > (And if it is not so slow then why do we need dax_do_io at all?
> > > [Rhetorical])
> > >
> > > I hate it that you overload the semantics of a known and expected
> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
> > > and unrelated overload of the semantics of O_DIRECT.
> > Agreed - making O_DIRECT less direct than not having it is plain stupid,
> > and I somehow missed this initially.
> Of course I disagree because like Dave argues in the msync case we
> should do the correct thing first and make it fast later, but also
> like Dave this arguing in circles is getting tiresome.
>
> > This whole DAX story turns into a major nightmare, and I fear all our
> > hodge podge tweaks to the semantics aren't helping it.
> >
> > It seems like we simply need an explicit O_DAX for the read/write
> > bypass if we can't sort out the semantics (error, writer synchronization)
> > just as we need a special flag for MMAP.
> I don't see how O_DAX makes this situation better if the goal is to
> accelerate unmodified applications...
>
> Vishal, at least the "delete a file with a badblock" model will still
> work for implicitly clearing errors with your changes to stop doing
> block clearing in fs/dax.c. This combined with a new -EBADBLOCK (as
> Dave suggests) and explicit logging of I/Os that fail for this reason
> at least gives a chance to communicate errors in files to suitably
> aware applications / environments.

Agreed - I'll send out a series that has just the zeroing changes, and
drop the dax_io fallback/O_DIRECT tweak for now while we figure out the
right thing to do. That should get us to a place where we still have dax
in the presence of errors, and have _a_ path for recovery.

> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@xxxxxxxxxxxx
> https://lists.01.org/mailman/listinfo/linux-nvdimm