Re: Regression in NFS probably due to very large amounts of readahead

On 2024-11-26 02:48, Philippe Troin wrote:
> On Sat, 2024-11-23 at 23:32 +0100, Anders Blomdell wrote:
>> When we (re)started one of our servers with 6.11.3-200.fc40.x86_64,
>> we got terrible performance (lots of nfs: server x.x.x.x not
>> responding).
>> What triggered this problem was virtual machines with NFS-mounted
>> qcow2 disks that often triggered large readaheads that generate
>> long streaks of disk I/O of 150-600 MB/s (4 ordinary HDDs) that
>> filled up the buffer/cache area of the machine.
>>
>> A git bisect gave the following suspect:
>>
>> git bisect start
>>
>> 8< snip >8
>>
>> # first bad commit: [7c877586da3178974a8a94577b6045a48377ff25]
>> readahead: properly shorten readahead when falling back to
>> do_page_cache_ra()
>
> Thank you for taking the time to bisect; this issue has been bugging
> me, but it's been non-deterministic, and hence hard to bisect.
>
> I'm seeing the same problem on 6.11.10 (and earlier 6.11.x kernels) in
> slightly different setups:
>
> (1) On machines mounting NFSv3 shared drives. The symptom here is a
> "nfs server XXX not responding, still trying" that never recovers
> (while the server remains pingable and other NFSv3 volumes from the
> hanging server can be mounted).
>
> (2) On VMs running over qemu-kvm, I see very long stalls (can be up to
> several minutes) on random I/O. These stalls eventually recover.
>
> I've built a 6.11.10 kernel with
> 7c877586da3178974a8a94577b6045a48377ff25 reverted and I'm back to
> normal (no more NFS hangs, no more VM stalls).
>
> Phil.
Some printk debugging seems to indicate that the problem is that the
expression 'ra->size - (index - start)' goes negative, which then gets
cast to a very large unsigned 'nr_to_read' when calling
'do_page_cache_ra'. Where the true bug is still eludes me, though.
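
To make the wraparound concrete, here is a minimal userspace sketch (not
kernel code, and not a proposed fix). The struct 'ra_state' and both helper
functions are hypothetical stand-ins; the field types are assumed to be an
unsigned long page index and an unsigned int window size. It only shows how
the expression above turns into a huge page count once 'index - start'
exceeds 'ra->size', and how a clamped variant would behave for comparison.

```c
#include <limits.h>

/* Hypothetical, simplified stand-in for the readahead state (assumption:
 * page indexes are unsigned long, the window size is unsigned int). */
struct ra_state {
    unsigned long start; /* first page index of the readahead window */
    unsigned int size;   /* window size in pages */
};

/* Mirrors the expression from the printk output:
 * ra->size - (index - start).
 * All operands are unsigned, so a "negative" result wraps around to a
 * value close to ULONG_MAX instead of going below zero. */
static unsigned long nr_to_read_unchecked(const struct ra_state *ra,
                                          unsigned long index)
{
    return ra->size - (index - ra->start);
}

/* Illustrative variant only: returns 0 when index has already moved
 * past the end of the window, instead of wrapping. */
static unsigned long nr_to_read_clamped(const struct ra_state *ra,
                                        unsigned long index)
{
    unsigned long done = index - ra->start;

    return done < ra->size ? ra->size - done : 0;
}
```

With start = 100 and size = 32, an index of 110 gives 22 pages from both
helpers. An index of 140 lies 8 pages past the end of the window, so the
unchecked helper returns ULONG_MAX - 7 while the clamped one returns 0. A
request of that size would be consistent with the long streaks of disk I/O
described earlier in the thread, although it does not say where 'index'
and the window get out of step.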

/Anders



