On Tue, Aug 21, 2012 at 03:00:13PM +0900, Namjae Jeon wrote:
> 2012/8/20, Fengguang Wu <fengguang.wu@xxxxxxxxx>:
> > On Mon, Aug 20, 2012 at 09:48:42AM +0900, Namjae Jeon wrote:
> >> 2012/8/19, Fengguang Wu <fengguang.wu@xxxxxxxxx>:
> >> > On Sat, Aug 18, 2012 at 05:50:02AM -0400, Namjae Jeon wrote:
> >> >> From: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
> >> >>
> >> >> This patch is based on suggestion by Wu Fengguang:
> >> >> https://lkml.org/lkml/2011/8/19/19
> >> >>
> >> >> kernel has mechanism to do writeback as per dirty_ratio and
> >> >> dirty_background ratio. It also maintains per task dirty rate limit
> >> >> to keep balance of dirty pages at any given instance by doing bdi
> >> >> bandwidth estimation.
> >> >>
> >> >> Kernel also has max_ratio/min_ratio tunables to specify percentage of
> >> >> writecache to control per bdi dirty limits and task throttling.
> >> >>
> >> >> However, there might be a usecase where user wants a writeback tuning
> >> >> parameter to flush dirty data at desired/tuned time interval.
> >> >>
> >> >> dirty_background_time provides an interface where user can tune
> >> >> background writeback start time using
> >> >> /sys/block/sda/bdi/dirty_background_time
> >> >>
> >> >> dirty_background_time is used along with average bdi write bandwidth
> >> >> estimation to start background writeback.
> >> >
> >> > Here lies my major concern about dirty_background_time: the write
> >> > bandwidth estimation is an _estimation_ and will sure become wildly
> >> > wrong in some cases. So the dirty_background_time implementation based
> >> > on it will not always work to the user expectations.
> >> >
> >> > One important case is, some users (eg. Dave Chinner) explicitly take
> >> > advantage of the existing behavior to quickly create & delete a big
> >> > 1GB temp file without worrying about triggering unnecessary IOs.
> >> >
> >> Hi. Wu.
> >> Okay, I have a question.
> >>
> >> If making dirty_writeback_interval per bdi to tune short interval
> >> instead of background_time, We can get similar performance
> >> improvement.
> >> /sys/block/<device>/bdi/dirty_writeback_interval
> >> /sys/block/<device>/bdi/dirty_expire_interval
> >>
> >> NFS write performance improvement is just one usecase.
> >>
> >> If we can set interval/time per bdi, other usecases will be created
> >> by applying.
> >
> > Per-bdi interval/time tunables, if there comes such a need, will in
> > essence be for data caching and safety. If turning them into some
> > requirement for better performance, the users will potentially be
> > stretched on choosing the "right" value for balanced data cache,
> > safety and performance. Hmm, not a comfortable prospect.
> Hi Wu.
> First, Thanks for shared information.
>
> I change writeback interval on NFS server only.

OK..sorry for missing that part!

> I think that this does not affect data cache/page behaviour(caching)
> change on NFS client. NFS client will start sending write requests as
> per default NFS/writeback logic. So, no change in NFS client data
> caching behaviour.
>
> Also, on NFS server it does not make change in system-wide caching
> behaviour. It only modifies caching/writeback behaviour of a
> particular "bdi" on NFS server so that NFS client could see better
> WRITE speed.

But would you default to dirty_background_time=0, where the special
value 0 means no change of the original behavior? That will address
David's very reasonable concern. Otherwise quite a few users are going
to be surprised by the new behavior after upgrading kernel.

> I will share several performance test results as Dave's opinion.
>
> >
> >> >The numbers are impressive! FYI, I tried another NFS specific approach
> >> >to avoid big NFS COMMITs, which achieved similar performance gains:
> >>
> >> >nfs: writeback pages wait queue
> >> >https://lkml.org/lkml/2011/10/20/235

> This patch looks client side optimization to me.(need to check more)

Yes.

> Do we need the optimization of server side as Bruce's opinion ?

Sure.

Thanks,
Fengguang
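
To make the "0 means no change" default concrete, here is a minimal sketch of how a time-based background threshold could be combined with the existing one. It is not the patch's code: the function name, the parameter names and the use of seconds and bytes are all assumptions made for illustration, and taking the smaller of the two thresholds is just one plausible policy.

```python
def background_threshold(global_bg_thresh: int,
                         avg_write_bw: int,
                         dirty_background_time: int) -> int:
    """Return the dirty-byte level at which background writeback starts.

    All names and units are hypothetical:
      global_bg_thresh      - existing ratio-based threshold, in bytes
      avg_write_bw          - estimated bdi write bandwidth, in bytes/second
      dirty_background_time - per-bdi tunable, in seconds; 0 disables it
    """
    # Special value 0: keep the original behavior untouched.
    if dirty_background_time == 0:
        return global_bg_thresh

    # Amount of dirty data the device is estimated to write out
    # within the configured time.
    time_based_thresh = avg_write_bw * dirty_background_time

    # Assumed policy: the time-based value can only start writeback
    # earlier than the ratio-based threshold, never later.
    return min(global_bg_thresh, time_based_thresh)
```

With these assumptions, a bdi estimated at 100 bytes/s and dirty_background_time=2 would start background writeback at 200 dirty bytes instead of a 1000-byte global threshold, while dirty_background_time=0 leaves the 1000-byte threshold in place. Because the result scales with avg_write_bw, a wrong bandwidth estimate moves the threshold by the same factor, which is the concern raised earlier in the thread.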