Hi,
A recent patch from Linus limiting global_dirtyable_memory to 1GB (see
the "Disabling in-memory write cache for x86-64 in Linux" thread) drew
attention to a long-standing problem: on a node with a huge amount of
RAM installed, the global dirty threshold is high, and the existing
behaviour of balance_dirty_pages() is to skip throttling until that
global limit is reached. So, by the time balance_dirty_pages() starts
throttling, you can easily end up with a huge amount of dirty pages
backed by some slow device (e.g. a USB stick).
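For instance, with the default vm.dirty_ratio of 20% on a box with,
say, 256 GB of dirtyable memory (the figure is just an illustration),
roughly 51 GB of dirty pages may pile up behind that device before the
writer gets throttled.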
A lot of ideas were proposed, but no conclusion was reached. In
particular, one of the suggested approaches is to develop per-BDI
time-based limits and enable them for all BDIs: don't allow a BDI's
dirty cache to grow beyond 5 seconds' worth of its measured writeback
speed. The approach looks pretty straightforward, but in practice it
may be tricky to implement: you cannot discover how fast a device is
until you load it heavily enough, and conversely, you must go far
beyond the current per-BDI limit to load the device heavily. Other
approaches have their own caveats, as usual.
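To make the time-based idea concrete, here is a minimal userspace
sketch (not actual kernel code; the bdi_stats structure, its field
names and the 16 MB floor are made up for illustration). It shows how
a per-BDI limit could be derived from a measured writeback bandwidth,
and why an unmeasured device needs some fallback floor, which is
exactly the chicken-and-egg problem above:

/* Sketch only: derive a dirty-page limit from measured writeback
 * bandwidth, limit = bandwidth * target window (5 s). */
#include <stdio.h>

#define TARGET_WINDOW_SECS	5
#define MIN_DIRTY_PAGES		(16UL << (20 - 12))	/* 16 MB floor, in 4 KB pages */

struct bdi_stats {
	unsigned long avg_write_bw;	/* measured bandwidth, pages/s (0 = unknown) */
};

/* Fall back to a fixed floor while the bandwidth is still unknown:
 * the device has to be loaded past the current limit before a
 * reliable bandwidth estimate exists at all. */
static unsigned long bdi_time_based_limit(const struct bdi_stats *bdi)
{
	unsigned long limit = bdi->avg_write_bw * TARGET_WINDOW_SECS;

	return limit > MIN_DIRTY_PAGES ? limit : MIN_DIRTY_PAGES;
}

int main(void)
{
	struct bdi_stats slow_usb = { .avg_write_bw = 2560 };	/* ~10 MB/s */
	struct bdi_stats unknown  = { .avg_write_bw = 0 };

	printf("slow USB limit: %lu pages\n", bdi_time_based_limit(&slow_usb));
	printf("unmeasured BDI: %lu pages\n", bdi_time_based_limit(&unknown));
	return 0;
}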
I'm interested in attending the upcoming LSF/MM summit to discuss the
topic above, as well as two other, unrelated ones:
* future improvements of FUSE. With the "write-back cache policy"
patch set almost adopted and patches for synchronous close(2) and
umount(2) in the queue, I'd like to keep my efforts in sync with other
FUSE developers.
* reboot-less kernel updates. Since resetting memory can be avoided by
booting the new kernel via kexec, and almost any application can be
checkpointed and later restored by CRIU, the downtime can be reduced
significantly by keeping userspace processes' working sets in memory
while the system is updated. The questions to discuss are how to
prevent the kernel from using certain memory regions on boot, what
interface can be reused or introduced for managing those regions, and
how the regions can be re-installed into processes' address spaces on
restore.
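For the first question, one possible starting point might be the
existing memmap=nn[KMG]$ss[KMG] boot parameter, which marks a physical
range as reserved so the kernel won't touch it; whether that (or some
new, kexec-aware interface) is the right fit is part of what I'd like
to discuss.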
Thanks,
Maxim