Re: [PATCH 10/10] mm: Account for WRITEBACK_TEMP in balance_dirty_pages

On 07/17/2012 11:11 PM, Miklos Szeredi wrote:
> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:

Miklos, sorry for the late response. Please find the answers inline.

>> On 07/13/2012 08:57 PM, Miklos Szeredi wrote:
>>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:
>>>
>>>> Make balance_dirty_pages start the throttling when the WRITEBACK_TEMP
>>>> counter is high enough. This prevents us from having too many dirty
>>>> pages on fuse, thus giving the userspace part of it a chance to write
>>>> stuff properly.
>>>>
>>>> Note, that the existing balance logic is per-bdi, i.e. if the fuse
>>>> user task gets stuck in the function this means, that it either
>>>> writes to the mountpoint it serves (but it can deadlock even without
>>>> the writeback) or it is writing to some _other_ dirty bdi and in the
>>>> latter case someone else will free the memory for it.
>>>
>>> This is not just about deadlocking.  Unprivileged fuse filesystems
>>> should not impact the operation of other filesystems.  I.e. a fuse
>>> filesystem which is not making progress writing out pages shouldn't cause
>>> a write on an unrelated filesystem to block.
>>>
>>> I believe this patch breaks that promise.
>>
>> Hm... I believe it does not, and here's why.
>>
>> When a task writes to some bdi the balance_dirty_pages will evaluate the
>> amount of time to block this task for, based on this bdi's dirty set counters.
>> The global stats are only used to a) check whether this decision should be
>> made at all
> 
> Okay, maybe I'm blind but if this is true, then how is
> balance_dirty_pages() supposed to ensure that the per-bdi limit is not
> exceeded?

The balance_dirty_pages logic is _very_ roughly the following:

Let this_bdi be a bdi the current task is writing to
Let D be the total amount of dirty and writeback memory (and writeback_tmp after this patch)
Let L be the limit of dirty memory (L = ram_size * ratio)
Let d be the amount of dirty and writeback on this_bdi
And let l be the limit of dirty memory on this_bdi

With that, the balancer logic looks like this:

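while (1) {
	/* Globally under the limit: nobody is throttled */
	if (D < L)
		return;

	start_background_writeback(this_bdi);

	/* This bdi is under its own limit: let the writer go */
	if (d < l)
		return;

	/* Over both limits: sleep, then re-check */
	timeout = get_sleep_timeout(d, l, D, L);
	schedule_timeout(timeout);
}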

d and l are derived from D and L using the proportion of this_bdi's IO completions to the
global IO completions (the real code is more complex, but that's the idea).
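
As a rough illustration of that proportion, here is a minimal C sketch. It is not the actual
kernel code: the function name and the "no history" fallback are assumptions made for the
example, and the real calculation has more inputs.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): give one bdi a share of the
 * global dirty limit L proportional to its share of recent IO completions.
 * Values are page counts, so the multiplication is assumed not to overflow.
 */
uint64_t bdi_limit_sketch(uint64_t global_limit,
			  uint64_t bdi_completions,
			  uint64_t total_completions)
{
	/* Assumption: with no completion history, fall back to the full limit */
	if (total_completions == 0)
		return global_limit;

	return global_limit * bdi_completions / total_completions;
}
```

Under this simplification a bdi that completes no IO at all (e.g. a stopped fuse daemon) ends
up with l close to zero, while the bdis that do complete IO share the limit among themselves.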

Thus, since we throttle tasks by looking at d and l only, we cannot affect all the bdis in
the system by live-locking a single one of them.

Accounting for writeback_tmp is required since D should become high when there are
lots of pages in-flight in FUSE. Otherwise, balance_dirty_pages will not limit the
task writing on a fuse mount.
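
To make this concrete, here is a small C sketch of the two early-return checks from the loop
above, with and without writeback_tmp counted in D. The struct and function names are made up
for the example and the numbers below are arbitrary; this is a toy model, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global page counters (names invented for this sketch) */
struct dirty_counts {
	uint64_t dirty;
	uint64_t writeback;
	uint64_t writeback_temp;
};

/* D: total dirty + writeback, optionally including writeback_temp */
uint64_t global_dirty_total(const struct dirty_counts *g, bool count_temp)
{
	uint64_t total = g->dirty + g->writeback;

	if (count_temp)
		total += g->writeback_temp;
	return total;
}

/* Mirrors the two early returns: throttle only if D >= L and d >= l */
bool must_throttle(uint64_t D, uint64_t L, uint64_t d, uint64_t l)
{
	if (D < L)
		return false;
	if (d < l)
		return false;
	return true;
}
```

Take L = 100 with 10 dirty, 10 writeback and 500 writeback_tmp pages. Without writeback_tmp
D is 20, so the first check returns early and even the fuse writer is never throttled. With
it D is 520: a fuse writer with d = 500 and l = 0 is throttled, while a disk writer with
d = 20 and l = 100 still passes the second check.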

>> and b) evaluate the dirty "fraction" of a bdi. That said, even
>> if we stop the fuse daemon (I actually did this) other filesystems won't
>> lock. The global counter would be high, yes, but the dirty set fraction of 
>> non-fuse bdi would be low thus allowing others to progress.
> 
> That makes some sense, but it looks to me that FUSE, NFS and friends
> want a stricter dirty balancing logic that looks at the bdi thresholds
> even if the global limits are not exceeded.

Probably, but I did a very straightforward test -- I just stopped the fuse daemon and started
writing to a fuse file. After some time the writing task was locked in balance_dirty_pages,
since the fuse daemon didn't ack the writeback. At the same time I tried to write to other bdis
(disks and nfs) and none of them was locked, all the writes succeeded. After I let the fuse
daemon run again the fuse-writer unlocked and went on writing.

Do you have some trickier scenario in mind?

> Thanks,
> Miklos
> .
> 

Thanks,
Pavel