Re: [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage()

On 6/6/22 10:46 AM, Johannes Weiner wrote:
> Hello,
>
> On Mon, Jun 06, 2022 at 09:32:13AM +1000, Dave Chinner wrote:
> >
> > Sure, but you've brought a problem we don't understand the root
> > cause of to my attention. I want to know what the root cause is so
> > that I can determine that there are no other unknown underlying
> > issues that are contributing to this issue.
>
> It seems to me we're just not on the same page on what the reported
> bug is. From my POV, there currently isn't a missing piece in this
> puzzle. But Chris worked closer with the prod folks on this, so I'll
> leave it to him :)

The basic description of the investigation:

* Multiple hits per hour per 100K machines, but almost impossible to catch on a single box.
* The debugging information from the long tail detector showed high IO and high CPU time (high CPU is relative here; these machines tend to be IO bound).
* Kernel stack analysis showed IO completion threads waiting for CPU.
* CPU profiling showed redirty_page_for_writepage() dominating.

From here we made a relatively simple reproduction of the redirty_page_for_writepage() part of the problem. It's a good fix in isolation, but we'll have to circle back to see how much of the long tail latency issue it solves.

We can livepatch it quickly, but filtering out the long tail latency hits for just this one bug is labor intensive, so it'll take a little bit of time to get good data.

I've got a v2 of the patch that drops the invalidate. I'm doing a load test with fsx this morning and then getting a second xfstests baseline run to see if I've added new failures.

-chris


