Re: [PATCH] dax: remove the pmem_dax_ops->flush abstraction

Dan Williams <dan.j.williams@xxxxxxxxx> · Wed, 6 Sep 2017 07:22:27 -0700

On Tue, Sep 5, 2017 at 11:29 PM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

On Thu, 31 Aug 2017, Dan Williams wrote:

> On Thu, Aug 31, 2017 at 8:04 PM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

> >

> >

> > On Thu, 31 Aug 2017, Dan Williams wrote:

> >

> >> On Thu, Aug 31, 2017 at 6:47 PM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

> >> > The patch abebfbe2f731 ("dm: add ->flush() dax operation support") is

> >> > buggy. A dm device may be composed of multiple underlying devices and all

> >> > of them need to be flushed. The patch just routes the flush request to the

> >> > first device and ignores the other devices.

> >> >

> >> > It could be fixed by adding more complex logic to the device mapper. But

> >> > there is only one implementation of the method pmem_dax_ops->flush - that

> >> > is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we

> >> > don't need the pmem_dax_ops->flush abstraction at all, we can call

> >> > arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush

> >> > couldn't ever reach anything different from arch_wb_cache_pmem().

> >>

> >> Unfortunately, this is not true, see usage of DAXDEV_WRITE_CACHE. This

> >> is for platforms that arrange for cpu caches to be flushed on

> >> power-fail, like standalone storage appliances, where it would be a

> >

> > BTW. if the system is capable of flushing caches on power failure, it is

> > system-wide feature, it is not per-device feature. The CPU caches may

> > store data for any persistent memory device.

> >

> > You either have enough remaining power to flush the CPU caches or not.

>

> Whether you need to flush caches or not can be a per-address range

> attribute and can change over time.

There is only one way to flush the cache (the clwb instruction), you don't

need a method that abstracts over this instruction. A simple flag or

static key that skips clwb() is sufficient.

Linux kernel API is changeable, so you don't need to overdesign the API

for future. If it ever happens that we have a system with two types of

persistent memory that need different instructions to flush them, we can

design a proper abstraction. But there is no reason to design the

abstraction in advance for a situation that will perhaps never happen.

True, it is over designed and we can cross that complexity bridge in the future when / if it arises.

Since dm is already checking dax_write_cache_enabled() to set its own flush enabled bit I think your patch looks good. You can add:

Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel