2012/8/21, Fengguang Wu <fengguang.wu@xxxxxxxxx>:
> On Tue, Aug 21, 2012 at 03:00:13PM +0900, Namjae Jeon wrote:
>> 2012/8/20, Fengguang Wu <fengguang.wu@xxxxxxxxx>:
>> > On Mon, Aug 20, 2012 at 09:48:42AM +0900, Namjae Jeon wrote:
>> >> 2012/8/19, Fengguang Wu <fengguang.wu@xxxxxxxxx>:
>> >> > On Sat, Aug 18, 2012 at 05:50:02AM -0400, Namjae Jeon wrote:
>> >> >> From: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
>> >> >>
>> >> >> This patch is based on a suggestion by Wu Fengguang:
>> >> >> https://lkml.org/lkml/2011/8/19/19
>> >> >>
>> >> >> The kernel has a mechanism to do writeback as per the dirty_ratio
>> >> >> and dirty_background_ratio tunables. It also maintains a per-task
>> >> >> dirty rate limit to keep dirty pages balanced at any given instant
>> >> >> by doing bdi bandwidth estimation.
>> >> >>
>> >> >> The kernel also has max_ratio/min_ratio tunables to specify the
>> >> >> percentage of write cache that controls per-bdi dirty limits and
>> >> >> task throttling.
>> >> >>
>> >> >> However, there might be a use case where the user wants a writeback
>> >> >> tuning parameter to flush dirty data at a desired/tuned time
>> >> >> interval.
>> >> >>
>> >> >> dirty_background_time provides an interface where the user can tune
>> >> >> the background writeback start time using
>> >> >> /sys/block/sda/bdi/dirty_background_time
>> >> >>
>> >> >> dirty_background_time is used along with the average bdi write
>> >> >> bandwidth estimation to decide when to start background writeback.
>> >> >
>> >> > Here lies my major concern about dirty_background_time: the write
>> >> > bandwidth estimation is an _estimation_ and will surely become
>> >> > wildly wrong in some cases. So the dirty_background_time
>> >> > implementation based on it will not always work to the user's
>> >> > expectations.
>> >> >
>> >> > One important case is that some users (e.g. Dave Chinner) explicitly
>> >> > take advantage of the existing behavior to quickly create & delete
>> >> > a big 1GB temp file without worrying about triggering unnecessary
>> >> > IOs.
>> >> >
>> >> Hi, Wu.
>> >> Okay, I have a question.
>> >>
>> >> If we make dirty_writeback_interval per-bdi to tune a short interval,
>> >> instead of adding background_time, we can get a similar performance
>> >> improvement:
>> >> /sys/block/<device>/bdi/dirty_writeback_interval
>> >> /sys/block/<device>/bdi/dirty_expire_interval
>> >>
>> >> The NFS write performance improvement is just one use case.
>> >>
>> >> If we can set the interval/time per bdi, other use cases will follow.
>> >
>> > Per-bdi interval/time tunables, if such a need arises, will
>> > essentially be about data caching and safety. If they are turned into
>> > a requirement for better performance, users will potentially be
>> > stretched choosing the "right" value to balance data cache, safety
>> > and performance. Hmm, not a comfortable prospect.
>> Hi Wu.
>> First, thanks for the shared information.
>>
>> I change the writeback interval on the NFS server only.
>
> OK.. sorry for missing that part!
>
>> I think this does not affect the data cache/page (caching) behaviour
>> on the NFS client. The NFS client will start sending write requests as
>> per the default NFS/writeback logic, so there is no change in NFS
>> client data caching behaviour.
>>
>> Also, on the NFS server it does not change the system-wide caching
>> behaviour. It only modifies the caching/writeback behaviour of a
>> particular "bdi" on the NFS server so that the NFS client can see
>> better WRITE speed.
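To make the proposed trigger concrete, here is a minimal standalone
model of the check (an illustrative sketch, not the actual patch code:
the struct, its field names, the pages/sec bandwidth unit and the
millisecond unit are all assumptions). Background writeback starts once
the bdi's dirty pages exceed what the estimated write bandwidth could
clean within dirty_background_time:

#include <stdio.h>

/*
 * Model of the proposed per-bdi trigger: with dirty_background_time
 * set, background writeback starts once the bdi's dirty pages exceed
 * what the estimated bandwidth can clean within that window.
 */
struct bdi_model {
	unsigned long avg_write_bandwidth;   /* estimated, pages/sec */
	unsigned long dirty_background_time; /* ms; 0 = original behavior */
	unsigned long dirty_pages;           /* dirty pages on this bdi */
	unsigned long default_bg_thresh;     /* existing global threshold */
};

static unsigned long bg_thresh(const struct bdi_model *bdi)
{
	if (bdi->dirty_background_time == 0) /* 0 keeps the old behavior */
		return bdi->default_bg_thresh;

	/* pages the device is estimated to clean within the window */
	return bdi->avg_write_bandwidth * bdi->dirty_background_time / 1000;
}

int main(void)
{
	/* ~25 MB/s estimated (6400 pages/sec at 4 KiB), 500 ms window */
	struct bdi_model bdi = { 6400, 500, 4000, 25600 };

	printf("start background writeback: %s\n",
	       bdi.dirty_pages > bg_thresh(&bdi) ? "yes" : "no");
	return 0;
}

Note that an avg_write_bandwidth estimate that is wildly wrong shifts
the effective threshold by the same factor, which is exactly the
concern raised above.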
>
> But would you default to dirty_background_time=0, where the special
> value 0 means no change to the original behavior? That will address
> David's very reasonable concern. Otherwise quite a few users are going
> to be surprised by the new behavior after upgrading the kernel.

Hi, Wu.
Okay, I will resend a v2 patch that includes your comment
(dirty_background_time=0 by default); a usage sketch follows at the
end of this mail.
Thanks a lot.

>
>> I will share several performance test results, as per Dave's opinion.
>>
>> >> >The numbers are impressive! FYI, I tried another NFS-specific
>> >> >approach to avoid big NFS COMMITs, which achieved similar
>> >> >performance gains:
>> >> >
>> >> >nfs: writeback pages wait queue
>> >> >https://lkml.org/lkml/2011/10/20/235
>>
>> This patch looks like a client-side optimization to me. (need to
>> check more)
>
> Yes.
>
>> Do we need the server-side optimization as well, per Bruce's opinion?
>
> Sure.
>
> Thanks,
> Fengguang
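For reference, here is a sketch of how the knob might be set from
userspace once such a v2 patch is applied (assumptions: the attribute
path mirrors the existing per-bdi tunables quoted above, the value is
in milliseconds, and sda is the device being tuned):

#include <stdio.h>

/* Set the proposed per-bdi knob via sysfs; writing 0 would restore
 * the original background writeback behavior. */
int main(void)
{
	const char *path = "/sys/block/sda/bdi/dirty_background_time";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path); /* kernel without the patch, or no such device */
		return 1;
	}
	fprintf(f, "%d\n", 500); /* assumed ms; start writeback ~500 ms out */
	fclose(f);
	return 0;
}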