Re: [patch 15/15] mm: add strictlimit knob

On Thu, 7 Dec 2017 11:32:43 +0100 Miklos Szeredi <miklos@xxxxxxxxxx> wrote:

> On Thu, Dec 7, 2017 at 11:15 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> > On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
> >>
> >> On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx>
> >> wrote:
> >>>
> >>> CC fuse maintainer, too.
> >>>
> >>> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
> >>>>
> >>>>
> >>>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@xxxxxxx> wrote:
> >>>>
> >>>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
> >>>>> > From: Maxim Patlasov <MPatlasov@xxxxxxxxxxxxx>
> >>>>> > Subject: mm: add strictlimit knob
> >>>>> >
> >>>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty limits
> >>>>> > for FUSE which sets bdi max_ratio to 1% by default:
> >>>>> >
> >>>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
> >>>>> >
> >>>>> > However the feature can be useful for other relatively slow or untrusted
> >>>>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
> >>>>> > the feature:
> >>>>> >
> >>>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
> >>>>> >
> >>>>> > Being enabled, the feature enforces bdi max_ratio limit even if global
> >>>>> > (10%) dirty limit is not reached.  Of course, the effect is not visible
> >>>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.
> >>>>>
> >>>>> In principle I have nothing against this and the usecase sounds reasonable
> >>>>> (in fact I believe the lack of a feature like this is one of reasons why
> >>>>> desktop automounters usually mount USB devices with 'sync' mount option).
> >>>>> So feel free to add:
> >>>>>
> >>>>> Reviewed-by: Jan Kara <jack@xxxxxxx>
> >>>>>
> >>>>
> >>>> Cc Jens, who may be vaguely interested in plans to finally merge this
> >>>> three-year-old patch?
> >>>>
> >>>>
> >>>>
> >>>> From: Maxim Patlasov <MPatlasov@xxxxxxxxxxxxx>
> >>>> Subject: mm: add strictlimit knob
> >>>>
> >>>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
> >>>> for FUSE which sets bdi max_ratio to 1% by default:
> >>>>
> >>>> http://article.gmane.org/gmane.linux.kernel.mm/105809
> >>>
> >>>
> >>>
> >>> That link is invalid for now, possibly due to the gmane site rebuild.
> >>> I find an email thread here which looks relevant:
> >>>
> >>> https://sourceforge.net/p/fuse/mailman/message/35254883/
> >>>
> >>> Where Maxim has an interesting point:
> >>>
> >>>        > Did any one try increasing the limit and did see any
> >>>        > better/worse performance ?
> >>>
> >>>
> >>>        We've used 20% as default value in OpenVZ kernel for a long while
> >>>        (1% was not enough to saturate our distributed parallel storage).
> >>>
> >>> So the knob will also enable people to _disable_ the 1% fuse limit to
> >>> increase performance.
> >>>
> >>> So people can use the exposed knob in 2 ways to fit their needs, which
> >>> is in general a good thing.
> >>>
> >>> However the comment in wb_position_ratio() says
> >>>
> >>>                        Without strictlimit feature, fuse writeback may
> >>>          * consume arbitrary amount of RAM because it is accounted in
> >>>          * NR_WRITEBACK_TEMP which is not involved in calculating "nr_dirty".
> >>>
> >>> How dangerous would that be if some user disabled the 1% fuse limit
> >>> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
> >>> beyond the user's expectation (20% max dirty limit)?
> >>>
> >>> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
> >>> WB_WRITEBACK, which should be throttled when bdi_write_congested().
> >>> The congested flag will be set on
> >>>
> >>>        fuse_conn.num_background >= fuse_conn.congestion_threshold
> >>>
> >>> So it looks like NR_WRITEBACK_TEMP will somehow be throttled. Just that
> >>> it's not included in the 20% dirty limit.
> >>
> >>
> >> Only balance_dirty_pages_ratelimited() is going to limit the
> >> generation of dirty pages, I don't think congestion flags will do
> >> that.
> >
> >
> > Right. However my concern is something to limit the generation of
> > fuse's _writeback_ pages.
> >
> > The normal writeback pages are limited in 2 ways:
> >
> > - balance_dirty_pages_ratelimited()'s dirty throttling:
> >
> >  nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit
> >
> > - block layer's nr_requests queue limit
> >
> > However fuse's NR_WRITEBACK_TEMP looks special and has none of such
> > limits. The congested bit merely affects the vmscan pageout path.
> >
> >        pageout
> >          may_write_to_inode
> >            inode_write_congested
> >              wb_congested
> >
> > I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
> > Either explicitly or implicitly, there has to be some hard limit.
> >
> >> And (AFAICS) for fuse only  BDI_CAP_STRICTLIMIT will allow
> >> accounting temp writeback pages when throttling dirty page generation.
> >> So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode.
> >> So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
> >> permit disabling it for fuse).
> >
> >
> > So fuse relies on small nr_dirty. Does fuse impose any explicit or
> > implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)?
> > Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed.
> >
> > For example, is it possible for some process (eg. dd) to dirty pages
> > as fast as possible while some other kernel logic converts PG_dirty
> > to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi
> > strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP)
> > cannot stop all memory being eaten up by ever growing NR_WRITEBACK_TEMP?
> 
> Hmm,  temp pages are still accounted as WB_WRITEBACK until writeback
> finishes.  Does that not count towards the dirty limit?
> 

This discussion died out and the patch is still "stuck" :(
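
For anyone picking this back up, the open question above can be put as a
toy model. This is only a sketch under loud assumptions, not kernel code:
the names PageCounts, over_limit() and simulate() are made up for
illustration, the fuse server is assumed never to complete a request, and
every dirty page is assumed to be converted to a temp writeback page
immediately. Whether the temp pages are in fact counted by the throttling
check is exactly what was left unanswered, so it is a parameter here.

```python
from dataclasses import dataclass


@dataclass
class PageCounts:
    """Toy page counters, in pages (hypothetical model, not kernel structs)."""
    nr_dirty: int = 0
    nr_writeback: int = 0
    nr_unstable: int = 0
    nr_writeback_temp: int = 0  # fuse temp writeback pages


def over_limit(c: PageCounts, limit: int, count_temp: bool) -> bool:
    """Mimic the check 'nr_dirty + nr_writeback + nr_unstable < limit'.

    count_temp is an assumption switch: True models temp pages being
    visible to the throttling check, False models them being ignored.
    """
    total = c.nr_dirty + c.nr_writeback + c.nr_unstable
    if count_temp:
        total += c.nr_writeback_temp
    return total >= limit


def simulate(steps: int, limit: int, count_temp: bool) -> PageCounts:
    """Writer dirties one page per step unless throttled; all dirty pages
    are then moved to nr_writeback_temp and never complete."""
    c = PageCounts()
    for _ in range(steps):
        if not over_limit(c, limit, count_temp):
            c.nr_dirty += 1
        # Assumed worst case: instant conversion, stalled userspace server.
        c.nr_writeback_temp += c.nr_dirty
        c.nr_dirty = 0
    return c


if __name__ == "__main__":
    print("temp ignored:", simulate(1000, 10, False).nr_writeback_temp)
    print("temp counted:", simulate(1000, 10, True).nr_writeback_temp)
```

In this model the temp count grows with the number of steps when the check
ignores it, and stops at the limit when the check counts it. It says
nothing about which of the two the kernel actually does for fuse.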


