Re: Backport d583d360a6 into 5.12 stable

Hello.

On Tuesday, 22 June 2021 20:27:51 CEST Johannes Weiner wrote:
> On Tue, Jun 22, 2021 at 07:24:56PM +0200, Oleksandr Natalenko wrote:
> > On Tuesday, 22 June 2021 18:47:59 CEST Greg KH wrote:
> > > On Tue, Jun 22, 2021 at 06:30:46PM +0200, Oleksandr Natalenko wrote:
> > > > I'd like to nominate d583d360a6 ("psi: Fix psi state corruption when
> > > > schedule() races with cgroup move") for 5.12 stable tree.
> > > > 
> > > > Recently, I've hit this:
> > > > 
> > > > ```
> > > > kernel: psi: inconsistent task state! task=2667:clementine cpu=21 psi_flags=0 clear=1 set=0
> > > > ```
> > > > 
> > > > and after that PSI IO went crazy high. That seems to match the symptoms
> > > > described in the commit message.
> > > 
> > > But this says it fixes 4117cebf1a9f ("psi: Optimize task switch inside
> > > shared cgroups") which did not show up until 5.13-rc1, so how are you
> > > hitting this issue?
> > 
> > I'm not positive 4117cebf1a9f was a root cause of the race. To me it looks
> > like 4117cebf1a9f just made an older issue more likely to be hit.
> > 
> > Peter, Johannes, am I correct saying that it is still possible to hit a
> > corruption described in d583d360a6 on 5.12?
> 
> I'm not aware of a previous issue, but it's possible you discovered
> one that was incidentally fixed by this change.
> 
> That said, there haven't been many changes in this area prior to 5.12,
> and I stared at the old code quite a bit to see if there are other
> possible scenarios, so this gives me pause.

Ack.

> > > Did you try this patch on 5.12.y and see that it solved your problem?
> > 
> > Yes, I've built the kernel with this patch, and so far it runs fine. It can
> > take a while until the condition is hit though since it seems to be very
> > unlikely on 5.12.
> 
> Is your task moving / being moved between cgroups while it's doing
> work?

Likely, yes. IIUC, KDE spawns apps in separate cgroups, so in this case
Clementine should get its own one (?):

```
$ systemd-cgls
…
│   │ │ ├─app-clementine-df516e4181f446ab869e723ea2ed6094.scope 
│   │ │ │ ├─2926 /bin/clementine -session 10de706f63000162437544200000015700012_1624379013_575845
│   │ │ │ ├─3059 /usr/bin/clementine-tagreader /tmp/clementine_735427711
│   │ │ │ ├─3060 /usr/bin/clementine-tagreader /tmp/clementine_557274898
│   │ │ │ ├─3062 /usr/bin/clementine-tagreader /tmp/clementine_1730944950
│   │ │ │ ├─3063 /usr/bin/clementine-tagreader /tmp/clementine_1509249421
│   │ │ │ ├─3065 /usr/bin/clementine-tagreader /tmp/clementine_1345386497
│   │ │ │ ├─3068 /usr/bin/clementine-tagreader /tmp/clementine_865255891
│   │ │ │ ├─3070 /usr/bin/clementine-tagreader /tmp/clementine_1782561441
│   │ │ │ ├─3072 /usr/bin/clementine-tagreader /tmp/clementine_421851305
│   │ │ │ ├─3073 /usr/bin/clementine-tagreader /tmp/clementine_175368243
│   │ │ │ ├─3075 /usr/bin/clementine-tagreader /tmp/clementine_1962830479
│   │ │ │ ├─3076 /usr/bin/clementine-tagreader /tmp/clementine_547573203
│   │ │ │ ├─3078 /usr/bin/clementine-tagreader /tmp/clementine_1819270047
│   │ │ │ ├─3079 /usr/bin/clementine-tagreader /tmp/clementine_1632862299
│   │ │ │ ├─3085 /usr/bin/clementine-tagreader /tmp/clementine_1279975869
│   │ │ │ ├─3095 /usr/bin/clementine-tagreader /tmp/clementine_1612119641
│   │ │ │ ├─3102 /usr/bin/clementine-tagreader /tmp/clementine_1789578483
│   │ │ │ ├─3103 /usr/bin/clementine-tagreader /tmp/clementine_1541442265
│   │ │ │ ├─3105 /usr/bin/clementine-tagreader /tmp/clementine_1418456770
│   │ │ │ ├─3106 /usr/bin/clementine-tagreader /tmp/clementine_1998684543
│   │ │ │ ├─3107 /usr/bin/clementine-tagreader /tmp/clementine_1349315391
│   │ │ │ ├─3108 /usr/bin/clementine-tagreader /tmp/clementine_231895572
│   │ │ │ ├─3110 /usr/bin/clementine-tagreader /tmp/clementine_492688785
│   │ │ │ ├─3111 /usr/bin/clementine-tagreader /tmp/clementine_1492630900
│   │ │ │ └─3112 /usr/bin/clementine-tagreader /tmp/clementine_2017490599
…
```
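
To check whether a task really changes cgroups after it starts running, one option is to sample `/proc/<pid>/cgroup` a few times during application startup and compare the results. The following is only a minimal sketch, assuming a cgroup v2 (unified) hierarchy where the relevant entry has the form `0::/path`; the function names `unified_cgroup` and `cgroup_of` are made up for illustration. Sampling like this cannot observe the race itself, it only shows whether a move happens at all.

```python
from pathlib import Path


def unified_cgroup(proc_cgroup_text):
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    The unified (v2) hierarchy uses ID 0 and an empty controller list.
    Returns None if no such entry is present.
    """
    for line in proc_cgroup_text.splitlines():
        if line.count(":") < 2:
            continue  # skip anything that does not look like an entry
        hierarchy_id, controllers, path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def cgroup_of(pid):
    """Read the current cgroup v2 path of a live process (hypothetical helper)."""
    return unified_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
```

Calling `cgroup_of(pid)` right after the process appears and again a moment later, and comparing the two paths, would indicate whether the task was first started in one cgroup and then moved into its `app-….scope`.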

> How long does it usually take to trigger it?

I don't know :(. I don't usually look at dmesg, and noticed this by pure
chance. Grepping the journal shows only this one occurrence, but the journal
is rotated, so some info might already be lost.
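
To make such a search repeatable, the warning can be picked out of kernel log lines and split into its fields. This is a minimal sketch based only on the message quoted earlier in this thread; the name `find_psi_warnings` is hypothetical, and parsing `psi_flags`, `clear` and `set` as hexadecimal is an assumption about how the kernel prints them.

```python
import re

# Pattern modelled on the single message seen in this thread:
#   psi: inconsistent task state! task=2667:clementine cpu=21 psi_flags=0 clear=1 set=0
PSI_WARNING = re.compile(
    r"psi: inconsistent task state! "
    r"task=(?P<pid>\d+):(?P<comm>.+?) cpu=(?P<cpu>\d+)\s+"
    r"psi_flags=(?P<psi_flags>[0-9a-f]+) "
    r"clear=(?P<clear>[0-9a-f]+) "
    r"set=(?P<set>[0-9a-f]+)"
)


def find_psi_warnings(lines):
    """Return one dict per log line that contains the PSI warning."""
    hits = []
    for line in lines:
        m = PSI_WARNING.search(line)
        if m is None:
            continue
        hits.append({
            "pid": int(m["pid"]),
            "comm": m["comm"],
            "cpu": int(m["cpu"]),
            # Assumption: the three flag fields are printed in hex.
            "psi_flags": int(m["psi_flags"], 16),
            "clear": int(m["clear"], 16),
            "set": int(m["set"], 16),
        })
    return hits
```

The function takes any iterable of strings, so it could be fed the lines of `journalctl -k -o cat` read from standard input, or a saved dmesg dump.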

> Would it be possible to share a simpler reproducer, or is this part of
> a more complex application?

This was triggered by KDE's autostart of the Clementine player, and I don't
have any specific reproducer. If I find one, I'll share it of course.

Thanks.

-- 
Oleksandr Natalenko (post-factum)





