On Fri, Jan 29, 2021 at 8:19 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 29.01.21 03:06, Pavel Tatashin wrote:
> >>> Might be related to the broken custom pfn_valid() implementation for
> >>> ZONE_DEVICE.
> >>>
> >>> https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khandual@xxxxxxx
> >>>
> >>> And essentially ignoring sub-section data in there for now as well (but
> >>> might not be that relevant yet). In addition, this might also be related to
> >>>
> >>> https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> >>
> >> I will check it, and see what I find. I saw that panic almost a year
> >> ago, things might have changed since then.
> >
> > Hi David,
> >
> > There is no panic anymore, but I also can't offset by 2M anymore, the
> > minimum that works now is 16M, and if alignment is less than 16M
> > creating devdax device fails.
>
> I wonder why we get such different namespace sizes? Where do the
> differences come from? This looks very weird.
> >
> > So, I tried the new ARM64 patch that reduces section sizes, and two
> > alignments for pmem: regular 2G alignment, and 2G+16M alignment.
> > (subtracted 16M from the bottom)
> >
> > ***** 4K page, 6G RAM, 2G PRAM *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c2200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> > 1c8000000-23fffffff : System RAM (kmem) 128M Wasted (Expected)
>
> The namespace spans 34MB??
>
> >
> > ***** 4K page, 6G-16M RAM, 2G+16M PRAM *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c1200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> > 1c8000000-23fffffff : System RAM (kmem) 144M Wasted (????)
>
> The namespace spans 34MB??

Right, this seems like a bug.

>
> >
> > ***** 64K page, 6G RAM, 2G PRAM *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
> > 1e0000000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
>
> The namespace spans 512MB ?!? What?

This is because section size is 512M with 64K pages.

>
> > 1e0000000-23fffffff : dax0.0
> > 1e0000000-23fffffff : System RAM (kmem) 512M Wasted (Expected)
> >
> > ***** 64K page, 6G-16M RAM, 2G+16M PRAM *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
> > 1bf400000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
>
> The namespace now consumes 4MB ?!?
>
> > 1c0000000-23fffffff : dax0.0
> > 1c0000000-23fffffff : System RAM (kmem) 16M Wasted (Optimal)
>
> Good :) I guess more optimal would be 2MB/0MB :)

Agree, but for the offset 16M this is optimal, because 16M is smaller
than section size.

>
> >
> > In all three cases only System RAM, namespace0.0, and dax0.0 were
> > printed from /proc/iomem.
> > BOOT content of iomem right after boot
> > DEVDAX content of iomem after devdax is created
> > ndctl create-namespace --mode devdax -e namespace0.0
> > HOTPLUG content of iomem after dax0.0 is hotplugged:
> > echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> > echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >
> >
> > The most surprising part is why with 4K pages and 16M offset 144M is
> > wasted? For whatever reason, when devdax is created 34M goes wasted to
> > the label? Something is wrong here.. However, I am happy with 64K
> > pages result, and that only 16M is wasted, of course optimally, we
> > should not be wasting any memory here, but it is still much better than
> > what we have now.
>
> Definitely, but we should try figuring out what's going on here. I
> assume on x86-64 it behaves differently?

Yes, we should root-cause this. I strongly suspect that an alignment
miscalculation somewhere causes this memory waste with the 16M offset.
I am also not sure why the 2M label size was increased, and why 16M is
now an alignment requirement.

I tested on x86 and got pretty much the same results as on ARM64: a 2M
offset is no longer allowed (16M is the minimum), and even with a 16M
offset, 144M is wasted.
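
As a starting point for the root-causing, here is a minimal Python sketch of one possible reading of the ARM64 numbers quoted above. It is not something confirmed in this thread. It assumes that dax0.0 starts right after the namespace metadata and that the kmem hotplug can only begin on a section-sized boundary. The 512M granularity for 64K pages is the section size mentioned above; the 128M value for 4K pages is an assumption inferred from dax0.0 moving to 1c8000000. The helper names are hypothetical.

```python
MIB = 1 << 20


def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def wasted_mib(pmem_start: int, namespace_size: int, block_size: int) -> int:
    """MiB lost between the start of pmem and the first hotplugged address.

    Assumption of this sketch: hotplug starts at the first block_size-aligned
    address at or after the end of the namespace metadata.
    """
    dax_start = pmem_start + namespace_size
    return (align_up(dax_start, block_size) - pmem_start) // MIB


# (label, pmem start, namespace size from DEVDAX iomem, assumed hotplug granularity)
cases = [
    ("4K/2G", 0x1C0000000, 34 * MIB, 128 * MIB),
    ("4K/2G+16M", 0x1BF000000, 34 * MIB, 128 * MIB),
    ("64K/2G", 0x1C0000000, 512 * MIB, 512 * MIB),
    ("64K/2G+16M", 0x1BF000000, 4 * MIB, 512 * MIB),
]

results = {label: wasted_mib(start, ns, blk) for label, start, ns, blk in cases}

for label, mib in results.items():
    print(f"{label}: {mib}M wasted")
```

Under these assumptions the sketch gives 128M, 144M, 512M and 16M, matching the HOTPLUG output. If that reading is right, the 144M is the 34M namespace plus rounding up to the next 128M boundary, which would leave the 34M namespace size and the 16M minimum alignment as the parts that still need explaining.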

Here is the full QEMU command if anyone wants to reproduce it:

KERNEL_PARAM='console=ttyS0 ip=dhcp'
KERNEL_PARAM+=' memmap=2G!8G'
#KERNEL_PARAM+=' memmap=2064M!8176M'

qemu-system-x86_64 \
    -m 8G -smp 1 \
    -machine q35 \
    -nographic \
    -enable-kvm \
    -kernel pmem/native/arch/x86/boot/bzImage \
    -initrd ../poky/build/tmp/deploy/images/qemux86-64/core-image-minimal-qemux86-64.cpio.gz \
    -chardev stdio,id=console,signal=off,mux=on \
    -mon chardev=console \
    -serial chardev:console \
    -netdev user,hostfwd=tcp::5000-:22,id=netdev0 \
    -device virtio-net-pci,netdev=netdev0 \
    -append "$KERNEL_PARAM"

Also, I am using the current master branch tip for the ndctl command:

root@qemux86-64:~# ndctl --version
71.2.gea014c0

***** 4K page, 6G RAM, 2G PRAM: kernel parameter memmap=2G!8G *****
BOOT:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-27fffffff : namespace0.0
DEVDAX:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-2021fffff : namespace0.0
202200000-27fffffff : dax0.0
HOTPLUG:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-2021fffff : namespace0.0
208000000-27fffffff : dax0.0
208000000-27fffffff : System RAM (kmem) (128M Wasted)

***** 4K page, 6G-16M RAM, 2G+16M PRAM: kernel parameter memmap=2064M!8176M *****
BOOT:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-27fffffff : namespace0.0
DEVDAX:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-2011fffff : namespace0.0
201200000-27fffffff : dax0.0
HOTPLUG:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-2011fffff : namespace0.0
208000000-27fffffff : dax0.0
208000000-27fffffff : System RAM (kmem) (144M Wasted)

The least wasted memory I can get on x86 in this experiment is with an
offset that is larger than 34M and 16M-aligned, i.e. 48M:

memmap=2096M!8144M

root@qemux86-64:~# cat /proc/iomem
| grep 'dax\|namespace\|System\|Pers'
100000000-1fcffffff : System RAM
1fd000000-27fffffff : Persistent Memory (legacy)
1fd000000-1ff1fffff : namespace0.0
200000000-27fffffff : dax0.0
200000000-27fffffff : System RAM (kmem) (48M Wasted)

Pasha

>
> Thanks
>
>
> --
> Thanks,
>
> David / dhildenb
>