Help deciding about backported patch (kernel bug 214767, 19f4e7cc8197 xfs: Fix CIL throttle hang when CIL space used going backwards)

Hi,

I’ve been debugging an elusive XFS issue that I could not attribute to anything other than an XFS-internal bug. I’ve recorded what I’ve seen so far in https://bugzilla.kernel.org/show_bug.cgi?id=214767, and Dave suggested that “19f4e7cc8197 xfs: Fix CIL throttle hang when CIL space used going backwards” likely addresses the issue. AFAICT this was not backported to the 5.10 branch: we’ve been updating to vanilla kernels diligently and still keep seeing the issue. Unfortunately, within a fleet of around 1k VMs it strikes about once a week, and there’s no way to predict when and where.

So, I took Dave’s pointer and applied the patch to our 5.10 series (based on 5.10.76 at that point), and it applied cleanly. The machine boots fine and I ran the XFS test suite. However, I haven’t used the test suite before, and I’m getting a number of errors that I don’t know how to interpret. Some of those seem to be due to not having the DEBUG flag set in the kernel, others … I’m not sure.

So, before rolling out this change to our fleet, I’d like to double-check the results and also ask whether you think applying that patch to the 5.10 series sounds reasonable. I’m also wondering whether there was a specific reason this wasn’t backported in the first place and whether others think it should be. Other input about the applicability of this patch to the issue I’m seeing is appreciated as well, of course.

I’m attaching the test runner output. Unfortunately, I lost the actual test outputs: the tests ran quite long, and the outputs were cleaned up by the tempfile watcher faster than I could retrieve them. I can run the tests again, though I currently estimate that a full run takes around 3-4 days.

Attachment: xfstests.out
Description: Binary data


Kind regards and stay safe,
Christian

--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

Attachment: signature.asc
Description: Message signed with OpenPGP

