On Thu, Aug 31, 2017 at 7:49 PM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote: > > > On Thu, 31 Aug 2017, Dan Williams wrote: > >> On Thu, Aug 31, 2017 at 6:47 PM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote: >> > The patch abebfbe2f731 ("dm: add ->flush() dax operation support") is >> > buggy. A dm device may be composed of multiple underlying devices and all >> > of them need to be flushed. The patch just routes the flush request to the >> > first device and ignores the other devices. >> > >> > It could be fixed by adding more complex logic to the device mapper. But >> > there is only one implementation of the method pmem_dax_ops->flush - that >> > is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we >> > don't need the pmem_dax_ops->flush abstraction at all, we can call >> > arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush >> > couldn't ever reach anything different from arch_wb_cache_pmem(). >> >> Unfortunately, this is not true, see usage of DAXDEV_WRITE_CACHE. This >> is for platforms that arrange for cpu caches to be flushed on >> power-fail, like standalone storage appliances, where it would be a >> waste for the kernel to track and flush dirty cachelines for dax. >> Theoretically this could be done on a per-address range basis (think >> battery backing a subset of the system memory). I think we need to fix >> the routing to flush the same dax_device that ->direct_access() was >> invoked. > > With my patch, dax_flush checks DAXDEV_WRITE_CACHE and doesn't do anything > if it's not set - it just doesn't go through device mapper. True, but there's no guarantee that arch_wb_pmem() will be sufficient going forward and the reason we routed this to the driver in the first place was to allow driver / platform specific flush implementations. > >> > It should be also pointed out that for some uses of persistent memory it >> > is needed to flush only very small amount of data (such as 1 cacheline), >> > and it would be overkill if we go through that device mapper machinery for >> > a single flushed cache line. >> >> I think this is more an argument to not enable DAX on that >> device-mapper topology if operation routing impacts performance. DAX >> is meant to get the kernel out of the way most of the time. > > I don't understand what do you mean? I am writing a device mapper target > that uses persistent memory as a cache. When flushing the metadata, > sometimes it is needed to flush single cache lines. Ideally, I would use > the clwb instruction for this purpose, but that clwb is hidden behind a > complicated pmem_dax_ops->flush abstraction (and that abstraction will get > even more complicated if implemented correctly). How do you think I should > write that target without DAX? I don't think you want DAX for this, I think you want direct control of a persistent-memory address range. For example the brd driver supports DAX but I wouldn't recommend using it as the basis for a directly managed cache. So I think we need to dive deeper into what an in kernel interface to access a contiguous persistent memory range looks like. DAX was not designed with that use case in mind. -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel