Gregory Price wrote:
> On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> > Summary:
> > --------
> >
> > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > regions, and more routinely, assembling a region from an existing
> > configuration established by platform-firmware. The latter is motivated
> > by CXL memory RAS (Reliability, Availability and Serviceability)
> > support, that requires associating device events with System Physical
> > Address ranges and vice versa.
> >
>
> Ok, I simplified down my tests and reverted a bunch of stuff; figured I
> should report this before I dive further in.
>
> Earlier I was carrying the DOE patches and others; I've dropped most of
> that to make sure I could replicate on the base kernel and QEMU images.
>
> QEMU branch:
> https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
> This is a little out of date at this point, I think, but it shouldn't
> matter; the results are the same regardless of what else I pull in.
>
> Kernel branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region

Note that I acted on this feedback from Greg to break out a fix and
merge it for v6.2-final:

http://lore.kernel.org/r/Y+CSOeHVLKudN0A6@xxxxxxxxx

...i.e. you are missing at least the passthrough decoder fix, but that
would show up as a region creation failure, not a QEMU crash. So I
would move to testing cxl/next.

[..]
> Let's attempt to use the memory:
>
> [root@fedora ~]# numactl --membind=1 python
> KVM internal error.
> Suberror: 3
> extra data[0]: 0x0000000080000b0e
> extra data[1]: 0x0000000000000031
> extra data[2]: 0x0000000000000d81
> extra data[3]: 0x0000000390074ac0
> extra data[4]: 0x0000000000000010
> RAX=0000000080000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000001
> RSI=0000000000000000 RDI=0000000390074000 RBP=ffffac1c4067bca0 RSP=ffffac1c4067bc88
> R8 =0000000000000000 R9 =0000000000000001 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=ffff99eed0074000 R14=0000000000000000 R15=0000000000000000
> RIP=ffffffff812b3d62 RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 0000000000000000 ffffffff 00c00000
> CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0000 0000000000000000 ffffffff 00c00000
> FS =0000 0000000000000000 ffffffff 00c00000
> GS =0000 ffff99ec3bc00000 ffffffff 00c00000
> LDT=0000 0000000000000000 ffffffff 00c00000
> TR =0040 fffffe1d13135000 00004087 00008b00 DPL=0 TSS64-busy
> GDT=     fffffe1d13133000 0000007f
> IDT=     fffffe0000000000 00000fff
> CR0=80050033 CR2=ffffffff812b3d62 CR3=0000000390074000 CR4=000006f0
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000fffe0ff0 DR7=0000000000000400
> EFER=0000000000000d01
> Code=5d 9c 01 0f b7 db 48 09 df 48 0f ba ef 3f 0f 22 df 0f 1f 00 <5b> 41 5c 41 5d 5d c3 cc cc cc cc 48 c7 c0 00 00 00 80 48 2b 05 cd 0d 76 01

At first glance that looks like a QEMU issue, but I would capture a
"cxl list -vvv" before attempting to use the memory just to verify the
decoder setup looks sane.

> I also tested lowering the RAM sizes (2GB RAM, 1GB "CXL") to see if
> there's something going on with the PCI hole or something, but no,
> same results.
>
> I double-checked whether there was an issue using a single root port,
> so I registered a second one - same results.
>
> In prior tests I accessed the memory directly via devmem2.
>
> This still works when mapping the memory manually:
>
> [root@fedora map] ./map_memory.sh
> echo ram > /sys/bus/cxl/devices/decoder2.0/mode
> echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
> echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
> echo 4096 > /sys/bus/cxl/devices/region0/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
> echo 0x40000000 > /sys/bus/cxl/devices/region0/size
> echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
> echo 1 > /sys/bus/cxl/devices/region0/commit
>
> [root@fedora devmem]# ./devmem2 0x290000000 w 0x12345678
> /dev/mem opened.
> Memory mapped at address 0x7fb4d4ed3000.
> Value at address 0x290000000 (0x7fb4d4ed3000): 0x0
> Written 0x12345678; readback 0x12345678

Likely it is sensitive to crossing an interleave threshold.

> This kind of implies there's a disagreement about the state of memory
> between Linux and QEMU.
>
> But even just onlining a region produces memory usage:
>
> [root@fedora ~]# cat /sys/bus/node/devices/node1/meminfo
> Node 1 MemTotal:  1048576 kB
> Node 1 MemFree:   1048112 kB
> Node 1 MemUsed:       464 kB
>
> Which I would expect to set off some fireworks.
>
> Maybe an issue at the NUMA level? I just... I have no idea.
>
> I will need to dig through the email chains to figure out what others
> have been doing that I'm missing. Everything *looks* nominal, but the
> reactors are exploding so... ¯\_(ツ)_/¯
>
> I'm not sure where to start here, but I'll bash my face on the keyboard
> for a bit until I have some ideas.

Not ruling out the driver yet, but Fan's tests with hardware have me
leaning more towards QEMU.
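As an aside on the "crossing an interleave threshold" theory: the CXL
spec's simple HDM decoder math for power-of-two interleave can be
sketched in a few lines. This is my own illustration, not driver code
(the function name is mine, and the XOR-based interleave variants are
not modeled): for a region with a given interleave_ways and
interleave_granularity, an HPA offset decodes to a target position and
a device-local DPA offset. With interleave_ways=1, as in the region
configured above, the decode is the identity, which is why a single
devmem2 poke can succeed even when broader use of the region falls
over at a ways/granularity boundary.

```python
def interleave_decode(hpa_offset, ways, granularity):
    """Decode an HPA offset within a region into
    (target position, device-local DPA offset).

    A sketch of the simple modulo decode for power-of-two
    interleave_ways/interleave_granularity; CXL's XOR-based
    interleave variants are not modeled here.
    """
    assert ways > 0 and granularity > 0
    # Which target in the interleave set services this offset.
    position = (hpa_offset // granularity) % ways
    # Offset within that target's contribution: whole rounds of the
    # interleave set collapse to one granule per target, plus the
    # remainder within the current granule.
    dpa_offset = (hpa_offset // (granularity * ways)) * granularity \
                 + hpa_offset % granularity
    return position, dpa_offset

# x1 region (ways=1): identity mapping, every offset hits target 0.
print(interleave_decode(0x1000, ways=1, granularity=4096))  # -> (0, 4096)
# x2 region: targets alternate at every 4KiB granularity boundary.
print(interleave_decode(0x0000, ways=2, granularity=4096))  # -> (0, 0)
print(interleave_decode(0x1000, ways=2, granularity=4096))  # -> (1, 0)
print(interleave_decode(0x2000, ways=2, granularity=4096))  # -> (0, 4096)
```

A write that lands entirely within one granule (like the devmem2 test)
only exercises one target; an access pattern that streams across
granule boundaries is what would expose a decoder/QEMU disagreement.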