On 2022/11/2 8:45, Darrick J. Wong wrote:
> On Sun, Oct 30, 2022 at 05:31:43PM +0800, Shiyang Ruan wrote:
>>
>>
>> On 2022/10/28 9:37, Dan Williams wrote:
>>> Darrick J. Wong wrote:
>>>> [add tytso to cc since he asked about "How do you actually /get/ fsdax
>>>> mode these days?" this morning]
>>>>
>>>> On Tue, Oct 25, 2022 at 10:56:19AM -0700, Darrick J. Wong wrote:
>>>>> On Tue, Oct 25, 2022 at 02:26:50PM +0000, ruansy.fnst@xxxxxxxxxxx wrote:
>>
>> ...skip...
>>
>>>>>
>>>>> Nope. Since the announcement of pmem as a product, I have had 15
>>>>> minutes of access to one preproduction prototype server with actual
>>>>> optane DIMMs in them.
>>>>>
>>>>> I have /never/ had access to real hardware to test any of this, so it's
>>>>> all configured via libvirt to simulate pmem in qemu:
>>>>> https://lore.kernel.org/linux-xfs/YzXsavOWMSuwTBEC@magnolia/
>>>>>
>>>>> /run/mtrdisk/[gh].mem are both regular files on a tmpfs filesystem:
>>>>>
>>>>> $ grep mtrdisk /proc/mounts
>>>>> none /run/mtrdisk tmpfs rw,relatime,size=82894848k,inode64 0 0
>>>>>
>>>>> $ ls -la /run/mtrdisk/[gh].mem
>>>>> -rw-r--r-- 1 libvirt-qemu kvm 10739515392 Oct 24 18:09 /run/mtrdisk/g.mem
>>>>> -rw-r--r-- 1 libvirt-qemu kvm 10739515392 Oct 24 19:28 /run/mtrdisk/h.mem
>>>>
>>>> Also forgot to mention that the VM with the fake pmem attached has a
>>>> script to do:
>>>>
>>>> ndctl create-namespace --mode fsdax --map dev -e namespace0.0 -f
>>>> ndctl create-namespace --mode fsdax --map dev -e namespace1.0 -f
>>>>
>>>> Every time the pmem device gets recreated, because apparently that's the
>>>> only way to get S_DAX mode nowadays?
>>>
>>> If you have noticed a change here it is due to VM configuration not
>>> anything in the driver.
>>>
>>> If you are interested there are two ways to get pmem declared: the legacy
>>> way that predates any of the DAX work, the kernel calls it E820_PRAM,
>>> and the modern way by platform firmware tables like ACPI NFIT.
>>> The assumption with E820_PRAM is that it is dealing with battery backed
>>> NVDIMMs of small capacity. In that case the /dev/pmem device can support
>>> DAX operation by default because the necessary memory for the 'struct
>>> page' array for that memory is likely small.
>>>
>>> Platform firmware defined PMEM can be terabytes. So the driver does not
>>> enable DAX by default because the user needs to make a policy choice about
>>> burning gigabytes of DRAM for that metadata, or placing it in PMEM which
>>> is abundant, but slower. So what I suspect might be happening is your
>>> configuration changed from something that auto-allocated the 'struct
>>> page' array, to something that needed those commands you list above to
>>> explicitly opt-in to reserving some PMEM capacity for the page metadata.
>>
>> I am using the same simulation environment as Darrick's and Dave's and have
>> tested many times, but so far I still cannot reproduce the failed cases they
>> mentioned (dax+non_reflink mode, my current focus). Only a few
>> cases randomly failed because of "target is busy". But IIRC, the failed
>> cases you mentioned failed with dmesg warnings around the function
>> "dax_associate_entry()" or "dax_disassociate_entry()". Since I cannot
>> reproduce the failure, it is hard for me to continue solving the problem.
>
> FWIW things have calmed down as of 6.1-rc3 -- if I disable reflink,
> fstests runs without complaint. Now it only seems to be affecting
> reflink=1 filesystems.
>
>> And how is your recent test? Still failed with those dmesg warnings? If so,
>> could you zip the test result and send it to me?
>
> https://djwong.org/docs/kernel/daxbad.zip

Thanks for your info!

(To Dave) I need your recent test result too. If the cases no longer fail
when reflink is disabled, I'll focus on solving the warning when reflink is
enabled.

--
Thanks,
Ruan.

>
> --D
>
>>
>>
>> --
>> Thanks,
>> Ruan
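
For a rough sense of the 'struct page' cost Dan describes, here is a minimal sketch of the arithmetic. It assumes 4 KiB base pages and a 64-byte 'struct page' (the usual x86_64 values, not stated anywhere in this thread), and the helper name struct_page_overhead_bytes is made up for illustration.

```shell
#!/bin/sh
# Assumptions (not from the thread): 4 KiB base pages and a 64-byte
# 'struct page', which is the common x86_64 configuration.
PAGE_SIZE=4096
STRUCT_PAGE_SIZE=64

# Hypothetical helper: print how many bytes of 'struct page' metadata
# are needed to map a pmem region of the given size in bytes.
struct_page_overhead_bytes() {
    echo $(( $1 / PAGE_SIZE * STRUCT_PAGE_SIZE ))
}

# One of the 10739515392-byte backing files from the ls output above:
struct_page_overhead_bytes 10739515392

# A hypothetical 1 TiB firmware-described region:
struct_page_overhead_bytes $(( 1024 * 1024 * 1024 * 1024 ))
```

Under those assumptions the metadata is about 1.6% of the region: roughly 160 MiB for each of the simulated devices, but 16 GiB for a 1 TiB region, which is the scale at which keeping it in DRAM versus on the device (--map mem versus --map dev) becomes a real policy choice.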