On Tue, Feb 14, 2023 at 09:18:24PM +0000, Jonathan Cameron wrote:
> On Tue, 14 Feb 2023 14:01:23 -0500
> Gregory Price <gregory.price@xxxxxxxxxxxx> wrote:
> 
> > On Tue, Feb 14, 2023 at 10:39:42AM -0800, Dan Williams wrote:
> > > Gregory Price wrote:
> > > > On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
> > > > > Summary:
> > > > > --------
> > > > > 
> > > > > CXL RAM support allows for the dynamic provisioning of new CXL RAM
> > > > > regions, and more routinely, assembling a region from an existing
> > > > > configuration established by platform-firmware. The latter is motivated
> > > > > by CXL memory RAS (Reliability, Availability and Serviceability)
> > > > > support, that requires associating device events with System Physical
> > > > > Address ranges and vice versa.
> > > > > 
> > > > 
> > > > Ok, I simplified down my tests and reverted a bunch of stuff; figured I
> > > > should report this before I dive further in.
> > > > 
> > > > Earlier I was carrying the DOE patches and others. I've dropped most of
> > > > that to make sure I could replicate on the base kernel and QEMU images.
> > > > 
> > > > QEMU branch:
> > > > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-26
> > > > This is a little out of date at this point, I think, but it shouldn't
> > > > matter; the results are the same regardless of what else I pull in.
> > > > 
> > > > Kernel branch:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
> > > 
> > > Note that I acted on this feedback from Greg to break out a fix and
> > > merge it for v6.2-final:
> > > 
> > > http://lore.kernel.org/r/Y+CSOeHVLKudN0A6@xxxxxxxxx
> > > 
> > > ...i.e. you are missing at least the passthrough decoder fix, but that
> > > would show up as a region creation failure, not a QEMU crash.
> > > 
> > > So I would move to testing cxl/next.
> > 
> > I just noticed this, already spinning a new kernel. Will report back.
> > 
> > > Not ruling out the driver yet, but Fan's tests with hardware have me
> > > leaning more towards QEMU.
> > 
> > Same, not much has changed, and I haven't tested with hardware yet. I was
> > planning to install it on our local boxes sometime later this week.
> > 
> > I was just so close to setting up a virtual memory pool in the lab, and was
> > getting antsy :]
> 
> Could you test it with TCG (just drop --enable-kvm)? We have a known
> limitation with x86 instructions running out of CXL emulated memory
> (a side effect of emulating the interleave). You'll need a fix even on TCG
> for the corner case of an instruction bridging from normal RAM to CXL memory:
> https://lore.kernel.org/qemu-devel/20230206193809.1153124-1-richard.henderson@xxxxxxxxxx/
> 
> Performance will be bad, but so far this is the only way we can do it correctly.
> 
> Jonathan

Siiiggghh... I had this patch and dropped --enable-kvm, but forgot to
drop "accel=kvm" from the -machine line.

This was the issue.

And let me tell you, if you numactl --membind=1 python, it is
IMPRESSIVELY slow. I wonder if it's even hitting a few 100k
instructions a second.

This appears to be the issue. When I get a bit more time, I'll try to
dive into the deep dark depths of QEMU memory regions to see how
difficult a non-mmio fork might be, unless someone else is already
looking at it.

~Gregory
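P.S. For anyone else tripping over the same thing: dropping --enable-kvm
is not enough if the -machine line also selects KVM, because accel=kvm
there re-enables it. Roughly the change that mattered for me (the rest
of the command line here is a trimmed-down sketch, not my full
invocation):

```shell
# Before: --enable-kvm removed, but KVM still selected via -machine, so
# the guest still ran under KVM and hit the crash when executing out of
# emulated CXL memory.
qemu-system-x86_64 -machine q35,cxl=on,accel=kvm ...

# After: no --enable-kvm and no accel=kvm anywhere, so QEMU falls back
# to TCG and the emulated interleave is handled correctly
# (slowly, but correctly).
qemu-system-x86_64 -machine q35,cxl=on ...
```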
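And the numactl test above, spelled out. Node 1 is the CXL-backed,
memory-only node in my topology; check yours first, since node numbering
depends on the setup:

```shell
# Show the NUMA topology; the CXL region appears as a CPU-less,
# memory-only node.
numactl -H

# Force all of the process's memory allocations onto node 1 so that
# python executes against the emulated CXL interleave path.
numactl --membind=1 python3
```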