Re: Accessing emulated CXL memory is unstable

On Wed, Oct 11, 2023 at 12:54 AM Gregory Price
<gregory.price@xxxxxxxxxxxx> wrote:
>
> On Tue, Oct 10, 2023 at 10:35:03AM +0900, Hyeonggon Yoo wrote:
> > Hello folks,
> >
> > I experienced strange application crashes/internal KVM errors
> > while playing with emulated type 3 CXL memory. I would like to know
> > if this is a real issue or I missed something during setup.
> >
> > TL;DR: applications crash when accessing emulated CXL memory,
> > and stressing the VM subsystem causes a KVM internal error
> > (stressing via stress-ng --bigheap)
> >
> ...
> >
> > Hmm... it crashed, and it's 'invalid opcode'.
> > Is this because the fetched instruction is different from what's
> > written to memory during exec()?
> >
>
> This is a known issue, and the working theory is that there are 2 issues:

Okay, at least it's a known issue. Thank you for confirming that!

>
> 1) CXL devices are implemented on top of an MMIO-style dispatch system
>    and as a result memory from CXL is non-cacheable.  We think there
>    may be an issue with this in KVM but it hasn't been investigated
>    fully.
>
> 2) When we originally got CXL memory support, we discovered an edge case
>    where code pages hosted on CXL memory would cause a crash whenever an
>    instruction spanned across a page boundary.  A similar issue could
>    affect KVM.
>
> We haven't done much research into the problem beyond this.  For now, we
> all just turn KVM off while we continue development.
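As a rough illustration of the condition in point 2, the shell function below checks whether an instruction of a given length, starting at a given address, straddles a page boundary. It is a hypothetical helper (`crosses_page` is not part of QEMU or any existing tool). It assumes 4 KiB pages by default and addresses small enough for the shell's signed integer arithmetic, so user-space addresses work but kernel addresses such as ffffffff810743e6 do not.

```shell
# Hypothetical helper, not part of QEMU: prints "yes" if an instruction of
# LEN bytes starting at address ADDR crosses a page boundary, else "no".
# Assumes 4 KiB pages unless a third argument overrides the page size.
# ADDR must fit in the shell's signed integer arithmetic (user-space
# addresses are fine; kernel addresses like 0xffffffff8... are not).
crosses_page() {
  addr=$1
  len=$2
  page=${3:-4096}
  # Offset of the first byte within its page; the instruction runs into the
  # next page if that offset plus the instruction length exceeds the page size.
  if [ $(( addr % page + len )) -gt "$page" ]; then
    echo yes
  else
    echo no
  fi
}

crosses_page 0x1ffe 3    # -> yes (last byte is at 0x2000, in the next page)
crosses_page 0xff1 15    # -> no  (a 15-byte instruction ending exactly at 0xfff)
```

x86 instructions can be up to 15 bytes long, so any instruction that starts within the last 14 bytes of a page can hit this case, regardless of which memory backs the page.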

Thank you for summarizing the current state of the issue.
Hope it will be resolved! ;)

But I'm not sure that turning off KVM solves the problem.
`numactl --membind=1 --show` works fine, but other basic UNIX commands such as ls
crash QEMU when they are bound to the CXL NUMA node.

[root@localhost ~]# numactl --membind=1 --show
policy: bind
preferred node: 1
physcpubind: 0
cpubind: 0
nodebind: 0
membind: 1
[root@localhost ~]# numactl --membind=1 ls

qemu: fatal: cpu_io_recompile: could not find TB for pc=(nil)
RAX=0000777f80000000 RBX=0000000000000000 RCX=0000000000000028 RDX=0000000000000000
RSI=0000000000000354 RDI=0000000000000000 RBP=ffff88810628af40 RSP=ffffc900008cfd20
R8 =ffff88810628af40 R9 =ffffc900008cfcc4 R10=000000000000000d R11=0000000000000000
R12=0000000390440000 R13=ffff888107a464c0 R14=0000000000000000 R15=ffff88810a49cd18
RIP=ffffffff810743e6 RFL=00000007 [-----PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 00000000 00000000
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffff88817bc00000 00000000 00000000
LDT=0000 0000000000000000 00000000 00008200 DPL=0 LDT
TR =0040 fffffe0000003000 00004087 00008900 DPL=0 TSS64-avl
GDT=     fffffe0000001000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=00007fcb2504641c CR3=0000000390440000 CR4=007506f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000777f80000000 CCD=0000000390440000 CCO=ADDQ
EFER=0000000000000d01
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
YMM00=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM01=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM02=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM03=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM04=0000000000000000 0000000000000000 00006968705f6e6f 657800006c6c6577
YMM05=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM06=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM07=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM08=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM09=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM10=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM11=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM12=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM13=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM14=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM15=0000000000000000 0000000000000000 0000000000000000 0000000000000000
cxl2.sh: line 24:  5386 Aborted                 (core dumped) $QEMU -cpu Cascadelake-Server -smp 1 -M q35,cxl=on -m 4G,maxmem=8G,slots=4 -object memory-backend-ram,id=vmem0,share=on,size=4G -device pxb-cc
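When the node number of the CXL memory is not known in advance, a small helper can locate it before running a `numactl --membind` reproduction like the one above. This is a sketch under stated assumptions: `cpuless_nodes` is a hypothetical name, and it assumes the CXL memory is exposed in the guest as a memory-only NUMA node, i.e. one whose sysfs `cpulist` is empty.

```shell
# Hypothetical helper: print the IDs of NUMA nodes that have no CPUs.
# Emulated CXL type 3 memory typically appears in the guest as such a
# memory-only node. Reads the given directory (default: the sysfs node path).
cpuless_nodes() {
  node_dir=${1:-/sys/devices/system/node}
  for n in "$node_dir"/node[0-9]*; do
    [ -r "$n/cpulist" ] || continue
    # cpulist holds something like "0-3" for nodes with CPUs and just a
    # newline for a CPU-less node.
    if [ -z "$(tr -d '[:space:]' < "$n/cpulist")" ]; then
      echo "${n##*/node}"
    fi
  done
}
```

Assuming the CXL node is the only CPU-less node, `numactl --membind="$(cpuless_nodes | head -n 1)" ls` would then be equivalent to the `--membind=1` invocation in the transcript.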

--
Cheers,
Hyeonggon



