Re: [PATCH v3 1/2] writeback: add dirty_background_centisecs per bdi variable

Namjae Jeon <linkinjeon@xxxxxxxxx> · Tue, 25 Sep 2012 15:46:25 +0900

2012/9/25, Dave Chinner <david@xxxxxxxxxxxxx>:
> On Thu, Sep 20, 2012 at 04:44:22PM +0800, Fengguang Wu wrote:
>> [ CC FS and MM lists ]
>>
>> Patch looks good to me, however we need to be careful because it's
>> introducing a new interface. So it's desirable to get some acks from
>> the FS/MM developers.
>>
>> Thanks,
>> Fengguang
>>
>> On Sun, Sep 16, 2012 at 08:25:42AM -0400, Namjae Jeon wrote:
>> > From: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
>> >
>> > This patch is based on suggestion by Wu Fengguang:
>> > https://lkml.org/lkml/2011/8/19/19
>> >
>> > kernel has mechanism to do writeback as per dirty_ratio and
>> > dirty_background
>> > ratio. It also maintains per task dirty rate limit to keep balance of
>> > dirty pages at any given instance by doing bdi bandwidth estimation.
>> >
>> > Kernel also has max_ratio/min_ratio tunables to specify percentage of
>> > writecache to control per bdi dirty limits and task throttling.
>> >
>> > However, there might be a usecase where user wants a per bdi writeback
>> > tuning
>> > parameter to flush dirty data once per bdi dirty data reach a threshold
>> > especially at NFS server.
>> >
>> > dirty_background_centisecs provides an interface where user can tune
>> > background writeback start threshold using
>> > /sys/block/sda/bdi/dirty_background_centisecs
>> >
>> > dirty_background_centisecs is used alongwith average bdi write
>> > bandwidth
>> > estimation to start background writeback.
>> >
>> > One of the use case to demonstrate the patch functionality can be
>> > on NFS setup:-
>> > We have a NFS setup with ethernet line of 100Mbps, while the USB
>> > disk is attached to server, which has a local speed of 25MBps. Server
>> > and client both are arm target boards.
>> >
>> > Now if we perform a write operation over NFS (client to server), as
>> > per the network speed, data can travel at max speed of 100Mbps. But
>> > if we check the default write speed of USB hdd over NFS it comes
>> > around to 8MB/sec, far below the speed of network.
>> >
>> > Reason being is as per the NFS logic, during write operation, initially
>> > pages are dirtied on NFS client side, then after reaching the dirty
>> > threshold/writeback limit (or in case of sync) data is actually sent
>> > to NFS server (so now again pages are dirtied on server side). This
>> > will be done in COMMIT call from client to server i.e if 100MB of data
>> > is dirtied and sent then it will take minimum 100MB/10Mbps ~ 8-9
>> > seconds.
>> >
>> > After the data is received, now it will take approx 100/25 ~4 Seconds
>> > to
>> > write the data to USB Hdd on server side. Hence making the overall time
>> > to write this much of data ~12 seconds, which in practically comes out
>> > to
>> > be near 7 to 8MB/second. After this a COMMIT response will be sent to
>> > NFS
>> > client.
>> >
>> > However we may improve this write performace by making the use of NFS
>> > server idle time i.e while data is being received from the client,
>> > simultaneously initiate the writeback thread on server side. So instead
>> > of waiting for the complete data to come and then start the writeback,
>> > we can work in parallel while the network is still busy in receiving
>> > the
>> > data. Hence in this way overall performace will be improved.
>> >
>> > If we tune dirty_background_centisecs, we can see there
>> > is increase in the performace and it comes out to be ~ 11MB/seconds.
>> > Results are:-
>> >
>> > Write test(create a 1 GB file) result at 'NFS client' after changing
>> > /sys/block/sda/bdi/dirty_background_centisecs
>> > on  *** NFS Server only - not on NFS Client ****
>

Hi. Dave.

> What is the configuration of the client and server? How much RAM,
> what their dirty_* parameters are set to, network speed, server disk
> speed for local sequential IO, etc?
these results are on ARM, 512MB RAM and XFS over NFS with default
writeback settings(only our writeback setting - dirty_background_cen
tisecs changed at nfs server only). Network speed is ~100MB/sec and
local disk speed is ~25MB/sec.

>
>> > ---------------------------------------------------------------------
>> > |WRITE Test with various 'dirty_background_centisecs' at NFS Server |
>> > ---------------------------------------------------------------------
>> > |          | default = 0 | 300 centisec| 200 centisec| 100 centisec |
>> > ---------------------------------------------------------------------
>> > |RecSize   |  WriteSpeed |  WriteSpeed |  WriteSpeed |  WriteSpeed  |
>> > ---------------------------------------------------------------------
>> > |10485760  |  8.44MB/sec |  8.60MB/sec |  9.30MB/sec |  10.27MB/sec |
>> > | 1048576  |  8.48MB/sec |  8.87MB/sec |  9.31MB/sec |  10.34MB/sec |
>> > |  524288  |  8.37MB/sec |  8.42MB/sec |  9.84MB/sec |  10.47MB/sec |
>> > |  262144  |  8.16MB/sec |  8.51MB/sec |  9.52MB/sec |  10.62MB/sec |
>> > |  131072  |  8.48MB/sec |  8.81MB/sec |  9.42MB/sec |  10.55MB/sec |
>> > |   65536  |  8.38MB/sec |  9.09MB/sec |  9.76MB/sec |  10.53MB/sec |
>> > |   32768  |  8.65MB/sec |  9.00MB/sec |  9.57MB/sec |  10.54MB/sec |
>> > |   16384  |  8.27MB/sec |  8.80MB/sec |  9.39MB/sec |  10.43MB/sec |
>> > |    8192  |  8.52MB/sec |  8.70MB/sec |  9.40MB/sec |  10.50MB/sec |
>> > |    4096  |  8.20MB/sec |  8.63MB/sec |  9.80MB/sec |  10.35MB/sec |
>> > ---------------------------------------------------------------------
>
> While this set of numbers looks good, it's a very limited in scope.
> I can't evaluate whether the change is worthwhile or not from this
> test. If I was writing this patch, the questions I'd be seeking to
> answer before proposing it for inclusion are as follows....
>
> 1. what's the comparison in performance to typical NFS
> server writeback parameter tuning? i.e. dirty_background_ratio=5,
> dirty_ratio=10, dirty_expire_centiseconds=1000,
> dirty_writeback_centisecs=1? i.e. does this give change give any
> benefit over the current common practice for configuring NFS
> servers?
>
> 2. what happens when you have 10 clients all writing to the server
> at once? Or a 100? NFS servers rarely have a single writer to a
> single file at a time, so what impact does this change have on
> multiple concurrent file write performance from multiple clients?
>
> 3. Following on from the multiple client test, what difference does it
> make to file fragmentation rates? Writing more frequently means
> smaller allocations and writes, and that tends to lead to higher
> fragmentation rates, especially when multiple files are being
> written concurrently. Higher fragmentation also means lower
> performance over time as fragmentation accelerates filesystem aging
> effects on performance.  IOWs, it may be faster when new, but it
> will be slower 3 months down the track and that's a bad tradeoff to
> make.
>
> 4. What happens for higher bandwidth network links? e.g. gigE or
> 10gigE? Are the improvements still there? Or does it cause
> regressions at higher speeds? I'm especially interested in what
> happens to multiple writers at higher network speeds, because that's
> a key performance metric used to measure enterprise level NFS
> servers.
>
> 5. Are the improvements consistent across different filesystem
> types?  We've had writeback changes in the past cause improvements
> on one filesystem but significant regressions on others.  I'd
> suggest that you need to present results for ext4, XFS and btrfs so
> that we have a decent idea of what we can expect from the change to
> the generic code.
>
> Yeah, I'm asking a lot of questions. That's because the generic
> writeback code is extremely important to performance and the impact
> of a change cannot be evaluated from a single test.
Yes, I agree.
I will share patch behavior in gigabit Ethernet, different
filesystems(e.g. ext4, xfs and btrfs) and multiple NFS clients setup.

Thanks.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html