Re: Advice sought on RCU stalls on ARM64 WSL2

On Tue Mar 5, 2024 at 12:32 AM UTC, Joel Fernandes wrote:
> FWIW, I use a Windows machine that has WSL2 (kernel version
> 5.15.133.1-microsoft-standard-WSL2) and I have never experienced any kind of
> hang. Though, this is a desktop and not a laptop or battery powered device.

Is that also an ARM64 machine? I have never seen this happen on an
x86_64 machine; there it runs like a charm. Out of curiosity, if you are
running an ARM64 desktop, may I ask which one? The Volterra Development
Kit is not available in the Netherlands.

> > 
> > It also happens when I build the kernel myself from a more recent
> > release:
> > - https://github.com/maxboone/SQ2-Linux-Kernel-Builds
> > 
> > Microsoft should have a Development Kit (Volterra) with identical hardware 
> > to mine (and other Surface Pro X, Surface Pro 9 users) that run into the 
> > same issue with WSL2.
>
> Right, so at least that's a data point, that its Surface-specific (?). Have you
> tried to disable power management and see if it occurs? Like disable suspend,
> disable cpuidle, etc.

It also happens on non-Surface (but indeed mobile) devices, such as
Lenovo ThinkPads. However, the common denominator might be the Qualcomm
8cx chip (that Microsoft uses as SQ{1,2,3} -> 8cx Gen{1,2,3} with a 
beefier GPU).

Changes to power management settings in Windows don't seem to have any
effect, other than stalls taking longer to occur when the device never
sleeps. But the stalls also happen (often) when it doesn't sleep.

Power management in WSL2 seems to be all but unavailable:

```
root@ProX2024:~# uname -r
6.7.7-WSL2-STABLE+
root@ProX2024:~# echo freeze > /sys/power/state
-bash: echo: write error: Function not implemented
root@ProX2024:~# ls /sys/devices/system/cpu/
cpu0  cpu2  cpu4  cpu6  cpufreq   kernel_max  offline  possible  present  vulnerabilities
cpu1  cpu3  cpu5  cpu7  isolated  modalias    online   power     uevent
```

However, it is available in Hyper-V:

```
root@ubuntu0:~# uname -r
6.5.0-21-generic
root@ubuntu0:~# echo freeze > /sys/power/state
root@ubuntu0:~# ls /sys/devices/system/cpu
cpu0  cpu2  cpufreq  hotplug   kernel_max  offline  possible  present  uevent
cpu1  cpu3  cpuidle  isolated  modalias    online   power     smt      vulnerabilities
```

> Have you tried to reproduce the issue with CONFIG_RSEQ=n and see if it happens?

I will build a new kernel today with CONFIG_RSEQ=n and report back.
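
To confirm that the rebuilt kernel really has the option off, something
along these lines should do. It is only a sketch: `kconfig_value` is a
hypothetical helper name, and it assumes the running kernel exposes its
configuration at /proc/config.gz (which needs CONFIG_IKCONFIG_PROC), or
that a plain .config file is passed as the second argument.

```shell
# Hypothetical helper: print the value of a kernel config option.
# $1 = option name, $2 = config file (default: /proc/config.gz).
# Prints "n" when the option is "not set" or does not appear at all.
kconfig_value() {
    opt="$1"
    cfg="${2:-/proc/config.gz}"

    # Pick a reader based on the file name: gzip for *.gz, cat otherwise.
    case "$cfg" in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac

    # Match either "CONFIG_X=value" or "# CONFIG_X is not set".
    line=$($reader "$cfg" 2>/dev/null \
        | grep -E "^(# )?${opt}(=| is not set)" | head -n 1)

    case "$line" in
        "${opt}="*) echo "${line#*=}" ;;
        *)          echo "n" ;;
    esac
}
```

On the test kernel, `kconfig_value CONFIG_RSEQ` should then print `n`.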

> Also this github thread looks awfully similar to the github thread you pointed
> and has the same clear_rseq signature leading to the RCU stall. Over there also
> it is a hang, but they say the CPU usage is at 100%:
> https://github.com/microsoft/WSL/issues/8529

Indeed, when the RCU stalls occur, usage on the stalling core ramps up
to 100%. I had thought that was an effect of the stall, but I will check
whether the 100% usage is caused by the process that is stalling.

Cheers,
Max.




