2012/9/25, Namjae Jeon <linkinjeon@xxxxxxxxx>:
> 2012/9/25, Dave Chinner <david@xxxxxxxxxxxxx>:
>> On Thu, Sep 20, 2012 at 04:44:22PM +0800, Fengguang Wu wrote:
>>> [ CC FS and MM lists ]
>>>
>>> Patch looks good to me, however we need to be careful because it's
>>> introducing a new interface. So it's desirable to get some acks from
>>> the FS/MM developers.
>>>
>>> Thanks,
>>> Fengguang
>>>
>>> On Sun, Sep 16, 2012 at 08:25:42AM -0400, Namjae Jeon wrote:
>>> > From: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
>>> >
>>> > This patch is based on a suggestion by Wu Fengguang:
>>> > https://lkml.org/lkml/2011/8/19/19
>>> >
>>> > The kernel has a mechanism to do writeback according to the
>>> > dirty_ratio and dirty_background_ratio tunables. It also maintains a
>>> > per-task dirty rate limit to keep dirty pages balanced at any given
>>> > instant by doing bdi bandwidth estimation.
>>> >
>>> > The kernel also has max_ratio/min_ratio tunables to specify the
>>> > percentage of writecache that controls per-bdi dirty limits and task
>>> > throttling.
>>> >
>>> > However, there may be use cases where the user wants a per-bdi
>>> > writeback tuning parameter to flush dirty data once the bdi's dirty
>>> > data reaches a threshold, especially on an NFS server.
>>> >
>>> > dirty_background_centisecs provides an interface where the user can
>>> > tune the background writeback start threshold via
>>> > /sys/block/sda/bdi/dirty_background_centisecs
>>> >
>>> > dirty_background_centisecs is used together with the average bdi
>>> > write bandwidth estimation to decide when to start background
>>> > writeback.
>>> >
>>> > One use case that demonstrates the patch's functionality is an NFS
>>> > setup: a 100Mbps ethernet link between client and server, with a USB
>>> > disk attached to the server that has a local write speed of 25MB/s.
>>> > Server and client are both ARM target boards.
>>> >
>>> > If we perform a write over NFS (client to server), the data can
>>> > travel at a maximum of 100Mbps as dictated by the network. But the
>>> > default write speed of the USB hdd over NFS comes out to around
>>> > 8MB/s, far below the speed of the network.
>>> >
>>> > The reason is that, per the NFS logic, pages are initially dirtied
>>> > on the NFS client side; only after the dirty threshold/writeback
>>> > limit is reached (or on sync) is the data actually sent to the NFS
>>> > server, where the pages are dirtied again on the server side. This
>>> > happens in the COMMIT call from client to server, i.e. if 100MB of
>>> > data is dirtied and sent, it takes a minimum of 100MB at ~10MB/s
>>> > (the 100Mbps link) ~ 8-9 seconds.
>>> >
>>> > After the data is received, it takes approximately 100/25 ~ 4
>>> > seconds to write the data to the USB hdd on the server side. That
>>> > makes the overall time to write this much data ~12 seconds, which in
>>> > practice comes out to roughly 7 to 8MB/s. Only after this is a
>>> > COMMIT response sent to the NFS client.
>>> >
>>> > However, we can improve this write performance by making use of the
>>> > NFS server's idle time, i.e. while data is still being received from
>>> > the client, simultaneously initiate the writeback thread on the
>>> > server side. So instead of waiting for the complete data to arrive
>>> > and only then starting writeback, we work in parallel while the
>>> > network is still busy receiving the data. This improves overall
>>> > performance.
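The trigger described in the changelog above can be pictured as a simple comparison between the bdi's dirty data and the amount it could write back within the configured interval at its estimated bandwidth. Below is a minimal userspace-style sketch of that check, assuming the semantics described in the text; it is not the patch code, and the names (over_bground_centisecs_thresh, bdi_dirty_bytes, avg_write_bandwidth_bps) are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the bandwidth-based background writeback
 * trigger described in the changelog: start background writeback once
 * the bdi has accumulated more dirty data than it is expected to write
 * back within dirty_background_centisecs, based on the estimated
 * average write bandwidth. Names and units are illustrative only.
 */
static bool over_bground_centisecs_thresh(uint64_t bdi_dirty_bytes,
					  uint64_t avg_write_bandwidth_bps,
					  unsigned int dirty_background_centisecs)
{
	uint64_t thresh;

	if (dirty_background_centisecs == 0)	/* 0 == feature disabled */
		return false;

	/* bytes the device can write back in the configured interval */
	thresh = avg_write_bandwidth_bps * dirty_background_centisecs / 100;

	return bdi_dirty_bytes > thresh;
}
```

With the example figures from the changelog (25MB/s local disk speed, 100 centisecs), background writeback would kick in once roughly 25MB of dirty data has accumulated on the bdi, rather than waiting for the global dirty thresholds.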
>>> >
>>> > If we tune dirty_background_centisecs, we see an increase in
>>> > performance, which comes out to ~11MB/s. Results are:
>>> >
>>> > Write test (create a 1GB file, record sizes in bytes) result at the
>>> > 'NFS client' after changing
>>> > /sys/block/sda/bdi/dirty_background_centisecs
>>> > on *** NFS Server only - not on NFS Client ****
>>
>
> Hi. Dave.
>
>> What is the configuration of the client and server? How much RAM,
>> what their dirty_* parameters are set to, network speed, server disk
>> speed for local sequential IO, etc?
> These results are on ARM with 512MB RAM, XFS over NFS, and default
> writeback settings (only our writeback setting -
> dirty_background_centisecs - changed, at the NFS server only).
> Network speed is ~100MB/sec and
Sorry, there is a typo :) ^^ 100Mb/sec
> local disk speed is ~25MB/sec.
>
>>
>>> > ---------------------------------------------------------------------
>>> > | WRITE Test with various 'dirty_background_centisecs' at NFS Server |
>>> > ---------------------------------------------------------------------
>>> > |          | default = 0 | 300 centisec | 200 centisec | 100 centisec |
>>> > ---------------------------------------------------------------------
>>> > | RecSize  | WriteSpeed  | WriteSpeed   | WriteSpeed   | WriteSpeed   |
>>> > ---------------------------------------------------------------------
>>> > | 10485760 | 8.44MB/sec  | 8.60MB/sec   | 9.30MB/sec   | 10.27MB/sec  |
>>> > |  1048576 | 8.48MB/sec  | 8.87MB/sec   | 9.31MB/sec   | 10.34MB/sec  |
>>> > |   524288 | 8.37MB/sec  | 8.42MB/sec   | 9.84MB/sec   | 10.47MB/sec  |
>>> > |   262144 | 8.16MB/sec  | 8.51MB/sec   | 9.52MB/sec   | 10.62MB/sec  |
>>> > |   131072 | 8.48MB/sec  | 8.81MB/sec   | 9.42MB/sec   | 10.55MB/sec  |
>>> > |    65536 | 8.38MB/sec  | 9.09MB/sec   | 9.76MB/sec   | 10.53MB/sec  |
>>> > |    32768 | 8.65MB/sec  | 9.00MB/sec   | 9.57MB/sec   | 10.54MB/sec  |
>>> > |    16384 | 8.27MB/sec  | 8.80MB/sec   | 9.39MB/sec   | 10.43MB/sec  |
>>> > |     8192 | 8.52MB/sec  | 8.70MB/sec   | 9.40MB/sec   | 10.50MB/sec  |
>>> > |     4096 | 8.20MB/sec  | 8.63MB/sec   | 9.80MB/sec   | 10.35MB/sec  |
>>> > ---------------------------------------------------------------------
>>
>> While this set of numbers looks good, it's very limited in scope.
>> I can't evaluate whether the change is worthwhile or not from this
>> test. If I was writing this patch, the questions I'd be seeking to
>> answer before proposing it for inclusion are as follows....
>>
>> 1. What's the comparison in performance to typical NFS
>> server writeback parameter tuning? i.e. dirty_background_ratio=5,
>> dirty_ratio=10, dirty_expire_centisecs=1000,
>> dirty_writeback_centisecs=1? i.e. does this change give any
>> benefit over the current common practice for configuring NFS
>> servers?
>>
>> 2. What happens when you have 10 clients all writing to the server
>> at once? Or 100? NFS servers rarely have a single writer to a
>> single file at a time, so what impact does this change have on
>> multiple concurrent file write performance from multiple clients?
>>
>> 3. Following on from the multiple client test, what difference does it
>> make to file fragmentation rates? Writing more frequently means
>> smaller allocations and writes, and that tends to lead to higher
>> fragmentation rates, especially when multiple files are being
>> written concurrently. Higher fragmentation also means lower
>> performance over time, as fragmentation accelerates filesystem aging
>> effects on performance. IOWs, it may be faster when new, but it
>> will be slower 3 months down the track, and that's a bad tradeoff to
>> make.
>>
>> 4. What happens for higher bandwidth network links? e.g. gigE or
>> 10gigE? Are the improvements still there? Or does it cause
>> regressions at higher speeds? I'm especially interested in what
>> happens to multiple writers at higher network speeds, because that's
>> a key performance metric used to measure enterprise-level NFS
>> servers.
>>
>> 5. Are the improvements consistent across different filesystem
>> types? We've had writeback changes in the past cause improvements
>> on one filesystem but significant regressions on others. I'd
>> suggest that you need to present results for ext4, XFS and btrfs so
>> that we have a decent idea of what we can expect from the change to
>> the generic code.
>>
>> Yeah, I'm asking a lot of questions. That's because the generic
>> writeback code is extremely important to performance and the impact
>> of a change cannot be evaluated from a single test.
> Yes, I agree.
> I will share the patch's behavior on gigabit Ethernet, with different
> filesystems (e.g. ext4, xfs and btrfs) and with multiple NFS clients.
>
> Thanks.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@xxxxxxxxxxxxx
>>
>
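For reference on Dave's question 1 above, the "typical NFS server writeback parameter tuning" he mentions maps onto the standard /proc/sys/vm knobs. Below is a minimal sketch of applying those values from a small C program, assuming root privileges on the server; the helper name write_knob is hypothetical, and the same effect is normally achieved with sysctl or /etc/sysctl.conf.

```c
#include <stdio.h>

/*
 * Illustrative only: applies the baseline NFS-server writeback tuning
 * Dave refers to in question 1, i.e. dirty_background_ratio=5,
 * dirty_ratio=10, dirty_expire_centisecs=1000 and
 * dirty_writeback_centisecs=1, by writing them to the usual procfs
 * files. Run as root.
 */
static int write_knob(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fputs(value, f);
	fclose(f);
	return 0;
}

int main(void)
{
	write_knob("/proc/sys/vm/dirty_background_ratio", "5");
	write_knob("/proc/sys/vm/dirty_ratio", "10");
	write_knob("/proc/sys/vm/dirty_expire_centisecs", "1000");
	write_knob("/proc/sys/vm/dirty_writeback_centisecs", "1");
	return 0;
}
```

Comparing the patch's dirty_background_centisecs results against a server tuned this way would answer whether the new per-bdi knob improves on what administrators can already achieve with the existing global tunables.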