Re: [PATCH v1 0/5] target/arm: Handle psci calls in userspace

Gavin Shan <gshan@xxxxxxxxxx> · Thu, 13 Jul 2023 10:27:57 +1000

Hi Salil,

On 7/4/23 19:58, Salil Mehta wrote:


Latest Qemu Prototype (Pre RFC V2) (Not in the final shape of the patches)
https://github.com/salil-mehta/qemu.git ;  virt-cpuhp-armv8/rfc-v1-port11052023.dev-1


should work against below kernel changes as confirmed by James,

Latest Kernel Prototype (Pre RFC V2 = RFC V1 + Fixes)
https://git.gitlab.arm.com/linux-arm/linux-jm.git ;  virtual_cpu_hotplug/rfc/v2


I think it would be better to have these discussions on the mailing list. The threads and all
follow-up replies are archived there, so nothing gets lost. Besides, other people may
be interested in the same points and can join the discussion directly.

I got a chance to test the RFC patchsets. Not all cases work
as expected. I know the patchset is still being polished. I summarize the issues below:

(1) A coredump is triggered when the topology is out of range. It's the issue we
    discussed in private. I'm recapping it here in case other people are also blocked
    by it.

    (a) start VM with the following command lines
     /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64       \
     -accel kvm -machine virt,gic-version=host,nvdimm=on -cpu host \
     -smp cpus=1,maxcpus=2,sockets=1,clusters=1,cores=1,threads=2  \
     -m 512M,slots=16,maxmem=64G                                   \
     -object memory-backend-ram,id=mem0,size=512M                  \
     -numa node,nodeid=0,cpus=0-1,memdev=mem0                      \

    (b) hot add CPU whose topology is out of range
    (qemu) device_add driver=host-arm-cpu,id=cpu1,core-id=1


    It's actually caused by a typo in hw/arm/virt.c::virt_cpu_pre_plug(), where
    'ms->possible_cpus->len' needs to be replaced with 'ms->smp.cores'. With this
    change, the hot-added CPU object is rejected.
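
The range check described in (1) can be sketched in isolation as follows. This is
only a minimal model, not the code in hw/arm/virt.c: SmpLimits, CpuTopoIds,
id_in_range() and topo_ids_valid() are hypothetical names introduced for
illustration, and the error text is made up. The point it shows is that each
topology ID is compared against its own -smp limit rather than against the total
number of possible CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the -smp limits; not QEMU's real types. */
typedef struct {
    int sockets, clusters, cores, threads;
} SmpLimits;

/* Hypothetical stand-in for the IDs passed to device_add. */
typedef struct {
    int socket_id, cluster_id, core_id, thread_id;
} CpuTopoIds;

/* Return true if id lies in [0, limit); otherwise fill in an error message. */
static bool id_in_range(const char *name, int id, int limit,
                        char *err, size_t errlen)
{
    if (id >= 0 && id < limit) {
        return true;
    }
    snprintf(err, errlen, "Invalid %s %d, valid range is [0-%d]",
             name, id, limit - 1);
    return false;
}

/* Check every topology ID against its own limit; stop at the first failure. */
bool topo_ids_valid(const SmpLimits *smp, const CpuTopoIds *ids,
                    char *err, size_t errlen)
{
    assert(smp && ids && err && errlen > 0);
    err[0] = '\0';
    return id_in_range("socket-id", ids->socket_id, smp->sockets, err, errlen) &&
           id_in_range("cluster-id", ids->cluster_id, smp->clusters, err, errlen) &&
           id_in_range("core-id", ids->core_id, smp->cores, err, errlen) &&
           id_in_range("thread-id", ids->thread_id, smp->threads, err, errlen);
}
```

With the limits from (a) (sockets=1, clusters=1, cores=1, threads=2) and core-id=1
from (b), taking the unspecified IDs as 0 (an assumption), topo_ids_valid() returns
false because core-id 1 is not below cores=1.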

(2) I don't think TCG has been tested, since it does not seem to work at all.

    (a) start VM with the following command lines
    /home/gshan/sandbox/src/qemu/main/build/qemu-system-aarch64     \
    -machine virt,gic-version=3 -cpu max -m 1024                    \
    -smp maxcpus=2,cpus=1,sockets=1,clusters=1,cores=1,threads=2    \

    (b) failure while hot-adding CPU
    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    Error: cpu(id1=0:0:0:1) with arch-id 1 exists

    The error message is printed by hw/arm/virt.c::virt_cpu_pre_plug() because the
    specific CPU is found to be already present. In the KVM case, the disabled CPUs are
    detached from 'ms->possible_cpus->cpus[1].cpu' and destroyed. I think we need to do a
    similar thing for the TCG case in hw/arm/virt.c::virt_cpu_post_init(). I'm able to add
    the CPU with the following hunk of changes.

--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2122,6 +2122,18 @@ static void virt_cpu_post_init(VirtMachineState *vms, MemoryRegion *sysmem)
                 exit(1);
             }
         }
+
+#if 1
+        for (n = 0; n < possible_cpus->len; n++) {
+            cpu = qemu_get_possible_cpu(n);
+            if (!qemu_enabled_cpu(cpu)) {
+                CPUArchId *cpu_slot;
+                cpu_slot = virt_find_cpu_slot(ms, cpu->cpu_index);
+                cpu_slot->cpu = NULL;
+                object_unref(OBJECT(cpu));
+            }
+        }
+#endif
     }
 }
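
Why clearing the slot helps can be shown with a small standalone model. It is a
sketch under stated assumptions, not QEMU code: ModelCpu, ModelSlot, slots[],
model_pre_plug() and model_detach_disabled() are hypothetical names, and the model
only captures the idea that a pre-plug check rejects a hot-add while the slot still
points at a CPU object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 2

/* Hypothetical, simplified stand-ins; not QEMU's CPUState or CPUArchId. */
typedef struct { int index; bool enabled; } ModelCpu;
typedef struct { ModelCpu *cpu; } ModelSlot;

static ModelSlot slots[MAX_CPUS];

/* Mimics the pre-plug check: a slot that still holds a CPU rejects the add. */
bool model_pre_plug(int index)
{
    assert(index >= 0 && index < MAX_CPUS);
    return slots[index].cpu == NULL;
}

/*
 * Detach every CPU that is present but not enabled, freeing its slot.
 * Returns the number of slots that were cleared.
 */
int model_detach_disabled(void)
{
    int detached = 0;

    for (int i = 0; i < MAX_CPUS; i++) {
        if (slots[i].cpu != NULL && !slots[i].cpu->enabled) {
            slots[i].cpu = NULL;
            detached++;
        }
    }
    return detached;
}
```

In this model, a disabled CPU left in slot 1 makes model_pre_plug(1) fail, which
corresponds to the "exists" error above. After model_detach_disabled() runs, the
same call succeeds, while the slot of the enabled boot CPU stays occupied.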

(3) Assertion failure on the sequence of hot-add, hot-remove and hot-add when TCG mode is enabled.

    (a) Include the hack from (2) and start VM with the following command lines
    /home/gshan/sandbox/src/qemu/main/build/qemu-system-aarch64     \
    -machine virt,gic-version=3 -cpu max -m 1024                    \
    -smp maxcpus=2,cpus=1,sockets=1,clusters=1,cores=1,threads=2    \

    (b) assertion failure on the sequence of hot-add, hot-remove and hot-add
    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    (qemu) device_del cpu1
    (qemu) device_add driver=max-arm-cpu,id=cpu1,thread-id=1
    **
    ERROR:../tcg/tcg.c:669:tcg_register_thread: assertion failed: (n < tcg_max_ctxs)
    Bail out! ERROR:../tcg/tcg.c:669:tcg_register_thread: assertion failed: (n < tcg_max_ctxs)
    Aborted (core dumped)

    I'm not sure if x86 has a similar issue. It seems the management of TCG contexts, corresponding
    to the variables @tcg_max_ctxs and @tcg_ctxs, needs some improvement so that TCG context registration
    and unregistration can accommodate CPU hotplug.
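
One possible shape for such registration and unregistration is a slot table whose
entries are released on hot-remove and reused on the next hot-add. The sketch below
is an illustration only, not the code in tcg/tcg.c. It assumes the assertion fires
because a context index is consumed on every registration and never given back.
MAX_CTXS, ctx_in_use[], ctx_register() and ctx_unregister() are hypothetical names,
and the model is single-threaded; real code would need locking or atomics.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CTXS 2   /* plays the role of tcg_max_ctxs in this sketch */

static bool ctx_in_use[MAX_CTXS];

/* Claim the first free slot; return its index, or -1 if all are in use. */
int ctx_register(void)
{
    for (int i = 0; i < MAX_CTXS; i++) {
        if (!ctx_in_use[i]) {
            ctx_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release a slot so a later hot-added vCPU thread can reuse it. */
void ctx_unregister(int slot)
{
    assert(slot >= 0 && slot < MAX_CTXS && ctx_in_use[slot]);
    ctx_in_use[slot] = false;
}
```

With two slots, the sequence boot CPU, hot-add, hot-remove, hot-add succeeds in this
model because the second hot-add reuses the slot released by the hot-remove.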


Apart from what has been found in the tests, I've started to look into the code changes. I may
reply with more specific comments. However, it would be ideal to comment on the specific changes
after the patchset is posted for review. Salil, you may have mentioned the plan somewhere.
As I understand it, the QEMU patchset will be posted after James's RFCv2 kernel series is posted.
Please let me know if my understanding is correct. Again, thanks for your efforts to get vCPU
hotplug supported :)

Thanks,
Gavin