On Tue, Jun 22, 2021 at 07:24:56PM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> On Tuesday, 22 June 2021 18:47:59 CEST Greg KH wrote:
> > On Tue, Jun 22, 2021 at 06:30:46PM +0200, Oleksandr Natalenko wrote:
> > > I'd like to nominate d583d360a6 ("psi: Fix psi state corruption when
> > > schedule() races with cgroup move") for 5.12 stable tree.
> > >
> > > Recently, I've hit this:
> > >
> > > ```
> > > kernel: psi: inconsistent task state! task=2667:clementine cpu=21
> > > psi_flags=0 clear=1 set=0
> > > ```
> > >
> > > and after that PSI IO went crazy high. That seems to match the symptoms
> > > described in the commit message.
> >
> > But this says it fixes 4117cebf1a9f ("psi: Optimize task switch inside
> > shared cgroups") which did not show up until 5.13-rc1, so how are you
> > hitting this issue?
>
> I'm not positive 4117cebf1a9f was a root cause of the race. To me it looks
> like 4117cebf1a9f just made an older issue more likely to be hit.
>
> Peter, Johannes, am I correct saying that it is still possible to hit a
> corruption described in d583d360a6 on 5.12?

I'm not aware of a previous issue, but it's possible you discovered
one that was incidentally fixed by this change. That said, there
haven't been many changes in this area prior to 5.12, and I stared at
the old code quite a bit to see if there are other possible scenarios,
so this gives me pause.

> > Did you try this patch on 5.12.y and see that it solved your problem?
>
> Yes, I've built the kernel with this patch, and so far it runs fine. It can
> take a while until the condition is hit though since it seems to be very
> unlikely on 5.12.

Is your task moving / being moved between cgroups while it's doing
work? How long does it usually take to trigger it?

Would it be possible to share a simpler reproducer, or is this part of
a more complex application?