Re: Advice sought on RCU stalls on ARM64 WSL2

On Sat, Mar 02, 2024 at 08:53:54PM +0100, Max Boone wrote:
> 
> Thank you so much for the quick reply!
> 
> I haven't filed a bug with Debian specifically, as I'm running the Linux kernel built and provided by Microsoft, with Ubuntu as the OS on top. If it helps with the search I'd gladly run Debian and file a bug there, but I will still need to build my own kernel, as WSL requires some modules (such as HyperV storage and sockets) to be built into the kernel (meaning =y) instead of as loadable modules (meaning =m).

Ah, if you built your own kernel, then you are your own distro as far
as kernel issues are concerned.  ;-)

							Thanx, Paul

> I'll stick to using the rcu list from here on to avoid spam, thanks again!
>
> On Saturday, March 02, 2024 20:43 CET, "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
>  [ Adding Boqun and the rcu list on CC. ]
> 
> On Sat, Mar 02, 2024 at 07:59:08PM +0100, Max Boone wrote:
> >
> > Dear Dr. McKenney,
> >
> > For a couple of years now I've been the sometimes-frustrated owner of a Microsoft Surface Pro X ARM64 device. It has been getting progressively better as more vendors start targeting their builds at ARM64, but ever since the device was introduced there have been issues with the Windows Subsystem for Linux (not much more than an opinionated Hyper-V VM with extensive tooling) locking up and hanging.
> >
> > When this happens, traces like the following are dumped in the kernel messages:
> > https://github.com/microsoft/WSL/issues/9454#issuecomment-1942222109
> >
> > In your talk "Decoding Those Inscrutable RCU CPU Stall Warnings" you mentioned that people should feel free to reach out when they bump into such issues. Building other kernel releases, switching modules off and on, and playing with the RCU grace period times have so far not worked for me (or for others in that thread).
> >
> > Anyways, I don't really know where to start looking and the call stacks aren't very informative (to my eye) either. I'm hoping you might help me find the direction to look for the root of this problem.
> 
> I am assuming that you have filed a bug with the Debian folks, and before
> doing that, searched for similar bug reports.
> 
> At first glance, this is because things were stuck here:
> 
> [ 967.115632] clear_rseq_cs.isra.0+0x4c/0x60
> [ 967.116433] do_notify_resume+0xf8/0xeb0
> [ 967.116960] el0_svc+0x3c/0x50
> [ 967.117537] el0t_64_sync_handler+0x9c/0x120
> [ 967.118323] el0t_64_sync+0x158/0x15c
> 
> So including these function names (clear_rseq_cs() and so on) in your
> search for similar bug reports would be a good idea.
> 
> I am unfamiliar with that code.
> 
> So I added Boqun because he works with Linux on HyperV as part of his
> day job and has a great deal of experience with RCU. He will likely
> have quite a number of questions for you including exact versions,
> Debian bug number, the results of your web search, and so on. He might
> also know an ARM person to get involved in this.
> 
> Or maybe he knows the solution off the top of his head!
> 
> Thanx, Paul
> 
> 
>  



