Re: [patch 15/15] mm: add strictlimit knob

On Thu, Dec 7, 2017 at 11:15 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
>>
>> On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx>
>> wrote:
>>>
>>> CC fuse maintainer, too.
>>>
>>> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
>>>>
>>>>
>>>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@xxxxxxx> wrote:
>>>>
>>>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
>>>>> > From: Maxim Patlasov <MPatlasov@xxxxxxxxxxxxx>
>>>>> > Subject: mm: add strictlimit knob
>>>>> >
>>>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty
>>>>> > limits
>>>>> > for FUSE which sets bdi max_ratio to 1% by default:
>>>>> >
>>>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
>>>>> >
>>>>> > However the feature can be useful for other relatively slow or
>>>>> > untrusted
>>>>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to
>>>>> > enable
>>>>> > the feature:
>>>>> >
>>>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
>>>>> >
>>>>> > Being enabled, the feature enforces bdi max_ratio limit even if
>>>>> > global
>>>>> > (10%) dirty limit is not reached.  Of course, the effect is not
>>>>> > visible
>>>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable
>>>>> > value.
>>>>>
>>>>> In principle I have nothing against this and the usecase sounds
>>>>> reasonable
>>>>> (in fact I believe the lack of a feature like this is one of reasons
>>>>> why
>>>>> desktop automounters usually mount USB devices with 'sync' mount
>>>>> option).
>>>>> So feel free to add:
>>>>>
>>>>> Reviewed-by: Jan Kara <jack@xxxxxxx>
>>>>>
>>>>
>>>> Cc Jens, who may be vaguely interested in plans to finally merge this
>>>> three-year-old patch?
>>>>
>>>>
>>>>
>>>> From: Maxim Patlasov <MPatlasov@xxxxxxxxxxxxx>
>>>> Subject: mm: add strictlimit knob
>>>>
>>>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>>>> for FUSE which sets bdi max_ratio to 1% by default:
>>>>
>>>> http://article.gmane.org/gmane.linux.kernel.mm/105809
>>>
>>>
>>>
>>> That link is invalid for now, possibly due to the gmane site rebuild.
>>> I find an email thread here which looks relevant:
>>>
>>> https://sourceforge.net/p/fuse/mailman/message/35254883/
>>>
>>> Where Maxim has an interesting point:
>>>
>>>        > Did any one try increasing the limit and did see any
>>>        > better/worse performance ?
>>>
>>>        We've used 20% as default value in OpenVZ kernel for a long while
>>>        (1% was not enough to saturate our distributed parallel storage).
>>>
>>> So the knob will also enable people to _disable_ the 1% fuse limit to
>>> increase performance.
>>>
>>> So people can use the exposed knob in 2 ways to fit their needs, which
>>> is in general a good thing.
>>>
>>> However the comment in wb_position_ratio() says
>>>
>>>          * Without strictlimit feature, fuse writeback may
>>>          * consume arbitrary amount of RAM because it is accounted in
>>>          * NR_WRITEBACK_TEMP which is not involved in calculating
>>>          * "nr_dirty".
>>>
>>> How dangerous would that be if some user disabled the 1% fuse limit
>>> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
>>> beyond the user's expectation (20% max dirty limit)?
>>>
>>> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
>>> WB_WRITEBACK, which should be throttled when bdi_write_congested().
>>> The congested flag will be set on
>>>
>>>        fuse_conn.num_background >= fuse_conn.congestion_threshold
>>>
>>> So it looks like NR_WRITEBACK_TEMP will somehow be throttled. Just that
>>> it's not included in the 20% dirty limit.
>>
>>
>> Only balance_dirty_pages_ratelimited() is going to limit the
>> generation of dirty pages, I don't think congestion flags will do
>> that.
>
>
> Right. However my concern is something to limit the generation of
> fuse's _writeback_ pages.
>
> The normal writeback pages are limited in 2 ways:
>
> - balance_dirty_pages_ratelimited()'s dirty throttling:
>
>  nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit
>
> - block layer's nr_requests queue limit
>
> However fuse's NR_WRITEBACK_TEMP looks special and has neither of these
> limits. The congested bit merely affects the vmscan pageout path.
>
>        pageout
>          may_write_to_inode
>            inode_write_congested
>              wb_congested
>
> I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
> Either explicitly or implicitly, there has to be some hard limit.
>
>> And (AFAICS) for fuse, only BDI_CAP_STRICTLIMIT allows accounting
>> temp writeback pages when throttling dirty page generation.
>> So without BDI_CAP_STRICTLIMIT, kernel memory use of fuse may explode.
>> So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
>> permit disabling it for fuse).
>
>
> So fuse relies on small nr_dirty. Does fuse impose any explicit or
> implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)?
> Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed.
>
> For example, is it possible for some process (eg. dd) to dirty pages
> as fast as possible while some other kernel logic converts PG_dirty
> to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi
> strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP)
> cannot stop all memory being eaten up by ever growing NR_WRITEBACK_TEMP?

Hmm, temp pages are still accounted as WB_WRITEBACK until writeback
finishes.  Does that not count towards the dirty limit?
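
To make the question concrete, here is a toy model in Python (not kernel
code) of the accounting being asked about. It assumes that, under
strictlimit, the per-bdi count compared against the bdi threshold is the
sum of reclaimable (dirty) pages and WB_WRITEBACK pages, and that fuse
temp pages stay in WB_WRITEBACK until writeback finishes. The names
WbCounters, wb_dirty and should_throttle are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WbCounters:
    """Toy per-bdi page counters (hypothetical, not kernel structures)."""
    reclaimable: int  # dirty pages not yet under writeback
    writeback: int    # pages counted as WB_WRITEBACK, incl. fuse temp pages


def wb_dirty(c: WbCounters) -> int:
    # Assumption: with strictlimit, both counters feed the per-bdi total.
    return c.reclaimable + c.writeback


def should_throttle(c: WbCounters, wb_thresh: int) -> bool:
    # The dirtier is throttled once the per-bdi total exceeds the threshold.
    return wb_dirty(c) > wb_thresh


# Few PG_dirty pages, but many temp pages still under writeback.
counters = WbCounters(reclaimable=10, writeback=100)
print(should_throttle(counters, wb_thresh=50))
```

If that assumption holds, a pile of temp pages alone would push the bdi
over its threshold even when very few pages are still PG_dirty.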

Thanks,
Miklos


