On Sun, Oct 30, 2022 at 05:31:43PM +0800, Shiyang Ruan wrote:
>
>
> On 2022/10/28 9:37, Dan Williams wrote:
> > Darrick J. Wong wrote:
> > > [add tytso to cc since he asked about "How do you actually /get/ fsdax
> > > mode these days?" this morning]
> > >
> > > On Tue, Oct 25, 2022 at 10:56:19AM -0700, Darrick J. Wong wrote:
> > > > On Tue, Oct 25, 2022 at 02:26:50PM +0000, ruansy.fnst@xxxxxxxxxxx wrote:
> > ...skip...
> > > >
> > > > Nope.  Since the announcement of pmem as a product, I have had 15
> > > > minutes of access to one preproduction prototype server with actual
> > > > Optane DIMMs in them.
> > > >
> > > > I have /never/ had access to real hardware to test any of this, so it's
> > > > all configured via libvirt to simulate pmem in qemu:
> > > > https://lore.kernel.org/linux-xfs/YzXsavOWMSuwTBEC@magnolia/
> > > >
> > > > /run/mtrdisk/[gh].mem are both regular files on a tmpfs filesystem:
> > > >
> > > > $ grep mtrdisk /proc/mounts
> > > > none /run/mtrdisk tmpfs rw,relatime,size=82894848k,inode64 0 0
> > > >
> > > > $ ls -la /run/mtrdisk/[gh].mem
> > > > -rw-r--r-- 1 libvirt-qemu kvm 10739515392 Oct 24 18:09 /run/mtrdisk/g.mem
> > > > -rw-r--r-- 1 libvirt-qemu kvm 10739515392 Oct 24 19:28 /run/mtrdisk/h.mem
> > >
> > > Also forgot to mention that the VM with the fake pmem attached has a
> > > script to do:
> > >
> > > ndctl create-namespace --mode fsdax --map dev -e namespace0.0 -f
> > > ndctl create-namespace --mode fsdax --map dev -e namespace1.0 -f
> > >
> > > every time the pmem device gets recreated, because apparently that's the
> > > only way to get S_DAX mode nowadays?
> >
> > If you have noticed a change here, it is due to VM configuration, not
> > anything in the driver.
> >
> > If you are interested, there are two ways to get pmem declared: the
> > legacy way that predates any of the DAX work (the kernel calls it
> > E820_PRAM), and the modern way, by platform firmware tables like ACPI
> > NFIT.
> > The assumption with E820_PRAM is that it is dealing with battery-backed
> > NVDIMMs of small capacity. In that case the /dev/pmem device can support
> > DAX operation by default, because the memory needed for the 'struct
> > page' array covering it is likely small.
> >
> > PMEM defined by platform firmware can be terabytes. So the driver does
> > not enable DAX by default, because the user needs to make a policy
> > choice about burning gigabytes of DRAM for that metadata, or placing it
> > in PMEM, which is abundant but slower. So what I suspect might be
> > happening is that your configuration changed from something that
> > auto-allocated the 'struct page' array to something that needed those
> > commands you list above to explicitly opt in to reserving some PMEM
> > capacity for the page metadata.
>
> I am using the same simulation environment as Darrick and Dave and have
> tested many times, but I still cannot reproduce the failures they
> mentioned (I am currently focusing on dax+non_reflink mode). Only a few
> cases failed randomly, with "target is busy". But IIRC, the failures you
> mentioned came with dmesg warnings around the functions
> "dax_associate_entry()" or "dax_disassociate_entry()". Since I cannot
> reproduce the failure, it is hard for me to continue solving the problem.

FWIW things have calmed down as of 6.1-rc3 -- if I disable reflink,
fstests runs without complaint.  Now it only seems to be affecting
reflink=1 filesystems.

> And how are your recent tests going? Do they still fail with those dmesg
> warnings? If so, could you zip the test results and send them to me?

https://djwong.org/docs/kernel/daxbad.zip

--D

>
> --
> Thanks,
> Ruan
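To put rough numbers on Dan's point about the 'struct page' array, here is a minimal back-of-the-envelope sketch. It assumes 4 KiB base pages and a 64-byte struct page (the usual x86_64 layout; neither figure appears in the thread) and ignores any alignment padding. ndctl's --map option chooses where this array lives: "mem" puts it in system DRAM, while "dev" reserves it from the namespace itself, as in the commands Darrick quoted. The helper name memmap_bytes is hypothetical.

```python
# Back-of-the-envelope estimate of the per-page metadata ('struct page'
# array) needed to enable DAX on a pmem namespace.
# Assumptions (not from the thread): 4 KiB base pages, a 64-byte
# struct page (typical x86_64), and no alignment padding.

PAGE_SIZE = 4096          # bytes per base page (assumed)
STRUCT_PAGE_SIZE = 64     # bytes per struct page (assumed)

MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40


def memmap_bytes(pmem_bytes: int) -> int:
    """Hypothetical helper: bytes of struct page metadata for pmem_bytes."""
    pages = pmem_bytes // PAGE_SIZE
    return pages * STRUCT_PAGE_SIZE


if __name__ == "__main__":
    # 10 GiB: roughly the size of the simulated g.mem/h.mem devices.
    # 1 TiB and 3 TiB: the "terabytes" of firmware-defined PMEM.
    for size in (10 * GiB, 1 * TiB, 3 * TiB):
        meta = memmap_bytes(size)
        print(f"{size / GiB:>6.0f} GiB pmem -> {meta / GiB:7.3f} GiB of "
              f"struct page ({100 * meta / size:.4f}% overhead)")
```

Under those assumptions the metadata is 1/64 of the pmem capacity (about 1.56%): roughly 160 MiB for a 10 GiB device like the simulated ones, but 16 GiB per TiB of pmem, which is the DRAM cost Dan describes for terabyte-scale firmware-defined PMEM.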