[Bug 202441] New: Possibly VFS cache related, reproducible XFS regression since 4.19.0 on SATA HDDs


https://bugzilla.kernel.org/show_bug.cgi?id=202441

            Bug ID: 202441
           Summary: Possibly VFS cache related, reproducible XFS
                    regression since 4.19.0 on SATA HDDs
           Product: File System
           Version: 2.5
    Kernel Version: 4.19.0 - 5.0-rc3
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: XFS
          Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx
          Reporter: rogan6710@xxxxxxxxx
        Regression: Yes

I have a filesystem related problem where a compile job on a SATA HDD almost
stops and the UI becomes unresponsive when large files are copied at the same
time, regardless of which disk they are copied to or from.

All testing has been done on "bare metal", without even md, LVM or similar.
I have done a lot of testing with many different kernel versions on two
different systems (Slackware 14.2 and "current"), and I feel confident that
this is a kernel regression.

The problem is _very_ pronounced when using XFS, and it is present in kernel
version 4.19.0 and all following versions, NOT before (I have not tested any
4.19 rc versions). I have tested many of them, including the latest 4.19.18
and 5.0-rc3 with varying configurations, plus some very limited testing on
4.20.4.

It also affects JFS, ext2, ext3 and ext4, but to a much lesser extent.
Btrfs and ReiserFS do not seem to be affected at all, at least not in the
4.19 series.

After adding another 16 GB of RAM to one of my test machines I noticed that it
took much longer before the compile job slowed down and the UI became
unresponsive, so I suspected a cache related issue.
I made a few test runs and, while watching "top", observed that as soon as
buff/cache passed ~23G (of 24G total) during the copy, the compile job slowed
to almost a halt, while the copying also slowed down significantly.
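
If it helps with reproducing, the buff/cache figure can also be sampled
outside of top; a couple of commands that show the same numbers (the
one-second interval is just an arbitrary choice):

  # print memory usage once per second; the buff/cache column from free(1)
  # is the same figure top shows
  watch -n1 free -m

  # or read the page cache and reclaimable slab counters directly
  grep -E 'Buffers|^Cached|SReclaimable' /proc/meminfo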

After echo 0 >/proc/sys/vm/vfs_cache_pressure the compilation runs without
slowdown all the way through, while the copy keeps up its steady 100+ MB/sec.
This "solution" has been tested on 4.19.17-18 with the "generic" Slackware
config and on 5.0-rc3, both on XFS.
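
For anyone who wants to try the same workaround, this is essentially all it
amounts to (needs root; 100 is the kernel default, the setting does not
survive a reboot, and per Documentation/sysctl/vm.txt a value of 0 can lead
to out-of-memory conditions):

  # check the current value (kernel default is 100)
  cat /proc/sys/vm/vfs_cache_pressure

  # never reclaim dentry/inode caches due to memory pressure, for this boot
  echo 0 > /proc/sys/vm/vfs_cache_pressure

  # the same thing via sysctl
  sysctl -w vm.vfs_cache_pressure=0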

Here's how I hit this issue every time on a pre-Zen AMD:

1. A decent amount of data to copy, probably at least 5-10 times as much as
RAM, and reasonably fast media (~100 MB/sec) to copy from and to (Gbit NFS
mount, USB3 drive, regular hard drive...).

2. A dedicated XFS formatted regular rotating hard drive for the compile job
(I suppose any io-latency sensitive parallelizable job will do). This problem
is probably present for SSDs as well, but because they are so fast, cache
becomes less of an issue and you may not notice much; at least I don't.

Compile job: defconfig Linux kernel compile (parallelizable, easy to redo).
Now open a few terminals with "top" in one of them and start copying in
another (use mc, easy to start and stop). Watch buff/cache grow in top; as it
reaches 70-80% of your RAM, start the compilation in another terminal. I use
"time make -j16" on my eight-core 9590 AMD (a rough end-to-end sequence is
sketched below).

Under these circumstances a defconfig kernel compile (of 4.19.17 sources)
takes about 3 min 35 s on 4.18.20 (XFS) and sometimes more than an hour on
any version after it. On Slackware "current" I use gcc 8.2.0 multilib; on
14.2, regular gcc 5.5.0, which seemed to produce slightly better results.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


