On 05/10/2011 09:22 AM, Jan Kara wrote:
On Wed 11-05-11 01:12:13, OGAWA Hirofumi wrote:
Jan Kara <jack@xxxxxxx> writes:
Did you already consider copying only if the page is under writeback (like
copy-on-write)? I.e., if the page is under I/O, copy it, then switch to the
copy for writing the new data.
Yes, that was considered as well. We'd have to essentially migrate the
page that is under writeback and should be written to. You are going to pay
the cost of page allocation, the copy, and increased memory & cache pressure.
Depending on your backing storage and workload this may or may not be better
than waiting for IO...
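To make the copy-on-writeback idea concrete, here is a toy user-space model, not kernel code. `struct model_page`, `model_write` and `MODEL_PAGE_SIZE` are hypothetical names invented for illustration. A write to a page that is under writeback leaves the old buffer to the in-flight I/O and continues on a fresh copy; the `malloc` and full-page `memcpy` in that branch are the allocation and copy costs mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096  /* hypothetical page size for the model */

/* Hypothetical stand-in for a page-cache page; not a real kernel struct. */
struct model_page {
    unsigned char *data;   /* buffer that new writes go to */
    bool under_writeback;  /* true while "I/O" is reading from data */
};

/*
 * Write len bytes at off. If the page is under writeback, detach the old
 * buffer (the in-flight I/O keeps it stable) and write into a copy instead.
 * On return, *io_buf is the detached buffer the I/O still owns (the caller
 * frees it when the I/O completes), or NULL if no copy was made.
 * Returns 0 on success, -1 on a bad range or allocation failure.
 */
static int model_write(struct model_page *pg, size_t off, const void *src,
                       size_t len, unsigned char **io_buf)
{
    assert(pg != NULL && src != NULL && io_buf != NULL);
    *io_buf = NULL;
    if (off > MODEL_PAGE_SIZE || len > MODEL_PAGE_SIZE - off)
        return -1;
    if (pg->under_writeback) {
        unsigned char *copy = malloc(MODEL_PAGE_SIZE);  /* cost: allocation */
        if (copy == NULL)
            return -1;  /* a real design would fall back to waiting */
        memcpy(copy, pg->data, MODEL_PAGE_SIZE);         /* cost: page copy */
        *io_buf = pg->data;          /* I/O keeps the old, stable contents */
        pg->data = copy;             /* new data lands in the copy */
        pg->under_writeback = false; /* the copy itself is not under I/O */
    }
    memcpy(pg->data + off, src, len);
    return 0;
}
```

The writer never blocks here, but every write that races with I/O pays for a page allocation and a full copy, which is the trade-off against simply waiting for the I/O.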
Maybe possible, but do you really think that in the usual case just
blocking is better?
Define usual case... As Christoph noted, we don't currently have a real
practical case where blocking would matter (since frequent rewrites are
rather rare). So defining what is usual when we don't have a single real
case is kind of tough ;)
I'm a bit late to the party, but I have such a use case. I have a
real-time program that generates logs. There's a thread that makes sure
that there are always mlocked, MAP_SHARED, writable pages for the logs,
and under normal (or even very heavy) load, the mlocked pages always
stay far ahead of the logs. On 2.6.39, it works great [1]. On 3.0,
it's unusable -- latencies of 30-100 ms are very common.
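A minimal sketch of the preparation step described above, assuming a POSIX system: map the next stretch of the log file `MAP_SHARED`, `mlock` it, and write-touch each page so the real-time thread does not take the first fault itself. `prepare_log_window` and `open_scratch_log` are hypothetical names, not taken from the original program. This only shows the setup; as discussed in this thread, once writeback cleans a page, a later store can still fault and wait for the I/O.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make [off, off + len) of the log file mapped,
 * locked and writable ahead of the logger. off must be page-aligned.
 * Returns the mapping or MAP_FAILED; *locked reports whether mlock worked
 * (it can fail if RLIMIT_MEMLOCK is too small).
 */
static void *prepare_log_window(int fd, off_t off, size_t len, int *locked)
{
    assert(locked != NULL);
    long pg = sysconf(_SC_PAGESIZE);
    *locked = 0;
    if (pg <= 0 || off % pg != 0) {
        errno = EINVAL;
        return MAP_FAILED;
    }
    /* Grow (never shrink) the file: stores past EOF in a shared mapping
       raise SIGBUS. */
    struct stat st;
    if (fstat(fd, &st) != 0)
        return MAP_FAILED;
    if (st.st_size < off + (off_t)len && ftruncate(fd, off + (off_t)len) != 0)
        return MAP_FAILED;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                            fd, off);
    if (p == MAP_FAILED)
        return MAP_FAILED;
    /* Pin the pages so they are not reclaimed. */
    if (mlock(p, len) == 0)
        *locked = 1;
    /* Write-touch each page (same value back) so the write fault happens
       here rather than in the real-time thread. */
    volatile unsigned char *vp = p;
    for (size_t i = 0; i < len; i += (size_t)pg)
        vp[i] = vp[i];
    return p;
}

/* Hypothetical demo helper: an unlinked scratch file standing in for the log. */
static int open_scratch_log(void)
{
    char path[] = "/tmp/rt-log-XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}
```

In the design described above, a housekeeping thread would call something like `prepare_log_window` for the region just past the logger's current position, so the mapped, locked window always stays ahead of the writes.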
In this case, neither throughput nor available memory matter at all --
I'm not stressing either. So copying the pages (especially if they're
mlocked) would be more than a small percentage win -- it would be the
difference between great performance and unusability.
I wonder if we want a stronger version of mlock that says "this page
must not be swapped out and, in addition, ptes must always be mapped
with all appropriate permission bits set". (This is only possible with
hardware dirty and accessed bits, but we could come close even without
them.)
[1] file_update_time is a problem. Patches coming.
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html