On Tue, Feb 14, 2023 at 09:18:24PM +0000, Jonathan Cameron wrote:
> On Tue, 14 Feb 2023 14:01:23 -0500
> Gregory Price <gregory.price@xxxxxxxxxxxx> wrote:
> 
> > On Tue, Feb 14, 2023 at 10:39:42AM -0800, Dan Williams wrote:
> > > Gregory Price wrote:
> > > > On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> > > > > Summary:
> > > > > --------
> > > > > 
> > > > > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > > > > regions, and more routinely, assembling a region from an existing
> > > > > configuration established by platform-firmware. The latter is motivated
> > > > > by CXL memory RAS (Reliability, Availability and Serviceability)
> > > > > support, that requires associating device events with System Physical
> > > > > Address ranges and vice versa.
> > > > > 
> > > > 
> > > > Ok, I simplified down my tests and reverted a bunch of stuff; figured I
> > > > should report this before I dive further in.
> > > > 
> > > > Earlier I was carrying the DOE patches and others. I've dropped most of
> > > > that to make sure I could replicate on the base kernel and QEMU images.
> > > > 
> > > > QEMU branch:
> > > > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
> > > > This is a little out of date at this point, I think, but it shouldn't
> > > > matter; the results are the same regardless of what else I pull in.
> > > > 
> > > > Kernel branch:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
> > > 
> > > Note that I acted on this feedback from Greg to break out a fix and
> > > merge it for v6.2-final:
> > > 
> > > http://lore.kernel.org/r/Y+CSOeHVLKudN0A6@xxxxxxxxx
> > > 
> > > ...i.e. you are missing at least the passthrough decoder fix, but that
> > > would show up as a region creation failure, not a QEMU crash.
> > > 
> > > So I would move to testing cxl/next.
> > 
> > I just noticed this, already spinning a new kernel. Will report back.
> > 
> > > Not ruling out the driver yet, but Fan's tests with hardware have me
> > > leaning more towards QEMU.
> > 
> > Same, not much has changed, and I haven't tested with hardware yet. I was
> > planning to install it on our local boxes sometime later this week.
> > 
> > I was just so close to setting up a virtual memory pool in the lab, and was
> > getting antsy :]
> 
> Could you test it with TCG (just drop --enable-kvm)? We have a known
> limitation with x86 instructions running out of CXL emulated memory
> (a side effect of emulating the interleave). You'll need a fix even on TCG
> for the corner case of an instruction bridging from normal RAM to CXL memory:
> https://lore.kernel.org/qemu-devel/20230206193809.1153124-1-richard.henderson@xxxxxxxxxx/
> 
> Performance will be bad, but so far this is the only way we can do it correctly.
> 
> Jonathan

Siiiggghh... I had this patch and dropped --enable-kvm, but forgot to
drop "accel=kvm" from the -machine line.

This was the issue.

And let me tell you, if you numactl --membind=1 python, it is
IMPRESSIVELY slow. I wonder if it's even hitting a few 100k
instructions a second.

This appears to be the issue. When I get a bit more time, I'll try to
dive into the deep dark depths of QEMU memory regions to see how
difficult a non-mmio fork might be, unless someone else is already
looking at it.

~Gregory
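P.S. For anyone else tripping over the same thing: dropping --enable-kvm
is not enough if the -machine line also selects KVM, because accel=kvm
there re-enables it. Roughly the change that mattered for me (the rest
of the command line here is a trimmed-down sketch, not my full
invocation):

```shell
# Before: --enable-kvm removed, but KVM still selected via -machine, so
# the guest still ran under KVM and hit the crash when executing out of
# emulated CXL memory.
qemu-system-x86_64 -machine q35,cxl=on,accel=kvm ...

# After: no --enable-kvm and no accel=kvm anywhere, so QEMU falls back
# to TCG and the emulated interleave is handled correctly
# (slowly, but correctly).
qemu-system-x86_64 -machine q35,cxl=on ...
```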
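And the numactl test above, spelled out. Node 1 is the CXL-backed,
memory-only node in my topology; check yours first, since node numbering
depends on the setup:

```shell
# Show the NUMA topology; the CXL region appears as a CPU-less,
# memory-only node.
numactl -H

# Force all of the process's memory allocations onto node 1 so that
# python executes against the emulated CXL interleave path.
numactl --membind=1 python3
```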