Re: [GIT PULL v2 09/20] KVM: s390: move pv gmap functions into kvm

David Hildenbrand <david@xxxxxxxxxx> · Wed, 12 Feb 2025 19:14:52 +0100

On 12.02.25 18:45, Claudio Imbrenda wrote:
On Wed, 12 Feb 2025 17:55:18 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

On 31.01.25 12:24, Claudio Imbrenda wrote:
Move gmap related functions from kernel/uv into kvm.

Create a new file to collect gmap-related functions.

Reviewed-by: Janosch Frank <frankja@xxxxxxxxxxxxx>
Reviewed-by: Christoph Schlameuss <schlameuss@xxxxxxxxxxxxx>
[fixed unpack_one(), thanks mhartmay@xxxxxxxxxxxxx]
Link: https://lore.kernel.org/r/20250123144627.312456-6-imbrenda@xxxxxxxxxxxxx
Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
Message-ID: <20250123144627.312456-6-imbrenda@xxxxxxxxxxxxx>
---

This patch breaks large folio splitting because you end up un-refing
the wrong folios after a split; I tried to make it work but, whether
because of other changes in this patch or in others, I
cannot get it to work and have to give up for today.

Yes, I had also noticed that and I already have a fix ready. In fact my
fix was exactly like yours, except that I no longer pass the struct folio
to kvm_s390_wiggle_split_folio(); instead I pass only a
page and use page_folio() at the beginning, and I use
split_huge_page_to_list_to_order() directly instead of split_folio().
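
To make the "un-refing the wrong folio" problem concrete, here is a minimal
userspace sketch. It is a toy model, not kernel code and not the actual fix:
struct folio, struct page, and every model_*() name below are hypothetical
stand-ins, and the model simply assumes that after a split the caller's
reference follows the page that was handed to the split.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4 /* toy large folio spanning 4 pages */

/* Toy stand-ins, NOT the kernel's struct folio / struct page. */
struct folio { int refcount; };
struct page { struct folio *folio; };

static struct folio folios[NPAGES];
static struct page pages[NPAGES];

/* Model of page_folio(): the folio a page currently belongs to. */
static struct folio *model_page_folio(const struct page *page)
{
	return page->folio;
}

/* One large folio covers all pages: 1 base reference + 1 caller reference. */
static void model_init_large_folio(void)
{
	for (size_t i = 0; i < NPAGES; i++) {
		folios[i].refcount = 0;
		pages[i].folio = &folios[0];
	}
	folios[0].refcount = 2;
}

/*
 * Model of a split to order 0: every page becomes its own folio with one
 * base reference, and the caller's extra reference follows @keep
 * (an assumption of this model).
 */
static void model_split(struct page *keep)
{
	assert(keep >= pages && keep < pages + NPAGES);
	for (size_t i = 0; i < NPAGES; i++) {
		pages[i].folio = &folios[i];
		folios[i].refcount = 1;
	}
	model_page_folio(keep)->refcount++;
}

/* Buggy pattern: the folio pointer is looked up before the split. */
static void model_split_and_put_stale(struct page *page)
{
	struct folio *folio = model_page_folio(page);

	model_split(page);
	folio->refcount--; /* still the old head, i.e. the folio of page 0 */
}

/* Safer pattern: derive the folio from the page again after the split. */
static void model_split_and_put_fresh(struct page *page)
{
	model_split(page);
	model_page_folio(page)->refcount--;
}
```

In this model, calling model_split_and_put_stale(&pages[2]) drops the
reference of page 0's folio while page 2's folio keeps the extra one;
model_split_and_put_fresh(&pages[2]) leaves every folio with exactly its
base reference.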

Unfortunately, the fix does not fix the issue I'm seeing....

But putting printks everywhere seems to solve the issue, so it seems to
be a race somewhere.

It also doesn't work with a single vCPU for me. The VM is stuck in

With two vCPUs (so one can report the lockup), I get:

[   62.645168] rcu: INFO: rcu_sched self-detected stall on CPU
[   62.645181] rcu:     0-....: (5999 ticks this GP) idle=0104/1/0x4000000000000002 softirq=2/2 fqs=2997
[   62.645186] rcu:     (t=6000 jiffies g=-1199 q=62 ncpus=2)
[   62.645191] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-427.33.1.el9_4.s390x #1
[   62.645194] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[   62.645195] Krnl PSW : 0704c00180000000 0000000024b3e776 (set_memory_decrypted+0x66/0xa0)
[   62.645206]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   62.645208] Krnl GPRS: 00000000ca004000 0000037f00000001 000000008092f000 0000000000000000
[   62.645210]            0000037fffb1bbc0 0000000000000001 0000000025e75208 000000008092f000
[   62.645211]            0000000080873808 0000037fffb1bcd8 0000000000001000 0000000025e75220
[   62.645213]            0000000080281500 00000000258aa480 0000000024c0b17a 0000037fffb1bb20
[   62.645220] Krnl Code: 0000000024b3e76a: a784000f            brc     8,0000000024b3e788
[   62.645220]            0000000024b3e76e: a7210fff            tmll    %r2,4095
[   62.645220]           #0000000024b3e772: a7740017            brc     7,0000000024b3e7a0
[   62.645220]           >0000000024b3e776: b9a40034            uvc     %r3,%r4,0
[   62.645220]            0000000024b3e77a: b2220010            ipm     %r1
[   62.645220]            0000000024b3e77e: 8810001c            srl     %r1,28
[   62.645220]            0000000024b3e782: ec12fffa017e        cij     %r1,1,2,0000000024b3e776
[   62.645220]            0000000024b3e788: a72b1000            aghi    %r2,4096
[   62.645232] Call Trace:
[   62.645234]  [<0000000024b3e776>] set_memory_decrypted+0x66/0xa0
[   62.645238]  [<0000000024c0b17a>] dma_direct_alloc+0x16a/0x2d0
[   62.645242]  [<0000000024c09b92>] dma_alloc_attrs+0x62/0x80
[   62.645243]  [<000000002546c950>] cio_gp_dma_create+0x60/0xa0
[   62.645248]  [<0000000025ebb712>] css_bus_init+0x102/0x1b8
[   62.645252]  [<0000000025ebb7ea>] channel_subsystem_init+0x22/0xf8
[   62.645254]  [<0000000024b149ac>] do_one_initcall+0x3c/0x200
[   62.645256]  [<0000000025e777be>] do_initcalls+0x11e/0x148
[   62.645260]  [<0000000025e77a34>] kernel_init_freeable+0x1cc/0x208
[   62.645262]  [<00000000254ad01e>] kernel_init+0x2e/0x170
[   62.645264]  [<0000000024b16fdc>] __ret_from_fork+0x3c/0x60
[   62.645266]  [<00000000254bb07a>] ret_from_fork+0xa/0x40
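
As an aid for reading the trace: the PSW points at the uvc instruction, and
the ipm / srl 28 / cij sequence after it appears to extract the condition
code and branch back to the uvc while cc > 1, so the guest CPU keeps spinning
as long as the call is not completed. A minimal userspace sketch of such a
retry loop follows. The callback type, the model_*() names, the retry bound,
and the use of cc 2 as "try again" are assumptions of this sketch, not kernel
API.

```c
#include <assert.h>

/*
 * Hypothetical callback standing in for one execution of the uvc
 * instruction; returns a condition code in the range 0..3.
 */
typedef int (*model_uvc_fn)(void *ctx);

/*
 * Retry while cc > 1, the condition the "cij %r1,1,2" branch seems to test.
 * @max_tries exists only in this model so that it always terminates; the
 * loop in the disassembly has no such bound.
 * Returns the last cc and stores the number of attempts in *tries.
 */
static int model_uv_call(model_uvc_fn uvc, void *ctx,
			 unsigned long max_tries, unsigned long *tries)
{
	int cc;

	assert(max_tries > 0);
	*tries = 0;
	do {
		cc = uvc(ctx);
		(*tries)++;
	} while (cc > 1 && *tries < max_tries);
	return cc;
}

/* Example callee: answers cc 2 until a countdown reaches zero, then cc 0. */
static int model_busy_then_ok(void *ctx)
{
	unsigned long *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 2;
	}
	return 0;
}
```

With a countdown of 3 the model needs four attempts before it sees cc 0. If
the callee never stops answering cc 2, only the artificial bound ends the
loop, which is consistent with the RCU stall reported above.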

The removed PTE lock would only explain it if we had a concurrent GUP etc.
from QEMU I/O? Not sure.

To fix the wrong refcount freezing, doing exactly what folio splitting does
(migration PTEs, locking the pagecache etc., freezing->converting,
removing migration PTEs) should work, but requires a bit of work.

--
Cheers,

David / dhildenb