Re: xfsaild in D state seems to be blocking all other i/o sporadically

On 2017-04-21 15:43, Michael Weissenbacher wrote:
Hi Dave!
On 21.04.2017 01:16, Dave Chinner wrote:
On Thu, Apr 20, 2017 at 09:11:22AM +0200, Michael Weissenbacher wrote:
On 20.04.2017 01:48, Dave Chinner wrote:
The problem is that the backing buffers that are used for flushing
inodes have been reclaimed due to memory pressure, but the inodes in
cache are still dirty. Hence to write the dirty inodes, we first
have to read the inode buffer back into memory.

Interesting find. Is there a way to prevent those buffers from getting
reclaimed?
Not really. It's simply a side effect of memory reclaim not being
able to reclaim inodes or the page cache because they are dirty, and
so it goes and puts lots more pressure on clean caches. The working
set in those other caches gets trashed, and so it's a downward
spiral, because it means dirty inodes and pages take longer and
require blocking IO to refill on demand...

Yesterday I found this patch for rsync:
http://insights.oetiker.ch/linux/fadvise/
It adds the option "--drop-cache" to rsync which sets
POSIX_FADV_DONTNEED to prevent caching. Been running with this option
since yesterday. Unfortunately, even this hasn't changed my problem.

In fact the best thing would be to disable file
content caching completely. Because of the use-case (backup server) it's
worthless to cache file content.
My primary objective is to avoid those stalls and reduce latency, at the
expense of throughput.
Set up dirty page cache writeback thresholds to be low (a couple of
hundred MB instead of 10/20% of memory) so that data writeback
starts early and throttles dirty pages to a small amount of memory.
This will help keep the page cache clean and immediately
reclaimable, hence it shouldn't put as much pressure on other caches
when memory reclaim is required.
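As a concrete example, thresholds of that order can be set with the byte-based sysctls (the values below match the 1024MiB/512MiB mentioned later in this thread and are illustrative, not a tuning recommendation; setting the *_bytes sysctls automatically zeroes their *_ratio counterparts):

```
# /etc/sysctl.d/99-writeback.conf (illustrative filename and values)
# start background writeback once 512 MiB of pages are dirty
vm.dirty_background_bytes = 536870912
# throttle writers once 1024 MiB of pages are dirty
vm.dirty_bytes = 1073741824
```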

In fact I already turned those down to 1024MiB/512MiB - not much change
here. I also set xfssyncd_centisecs to 100 as you advised. Also not
much change after that.

I also noticed that unmounting the file system takes a really long time
after the problem occurred (up to around 5 minutes!). Even when there was
nothing at all going on before the unmount. Would it help to capture the
unmount with trace-cmd?

A huge number of negative dcache entries can cause a slow umount;
please check it with 'cat /proc/sys/fs/dentry-state'. The first column
is the used (active) entries and the second is the unused (negative) entries.

They can be dropped with 'echo 2 > /proc/sys/vm/drop_caches'.

Thanks
Shan Hai

Here is another theory. Could it be that not the rsync's, but the rm's
issued by rsnapshot are causing the problem? Would it help to serialize
all "rm -Rf" calls? Those always delete the oldest backups, which can't
possibly be in the cache, so all of those inodes need to be
read into memory during deletion. Maybe those rm's are filling up the
XFS log?

If that doesn't work out either i guess my only chance would be to
partition the device with LVM and create a separate XFS for every
rsnapshot instance. In that scenario every file system should get its
own xfsaild, allowing them to run in parallel and not blocking each other?

cheers,
Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
