Hi Chuck,

On Wed, 2012-02-08 at 10:40 -0500, Chuck Lever wrote:
> On Feb 8, 2012, at 2:43 AM, Harshula wrote:
> > Could you please expand on the subtleties involved that require an
> > application to be rewritten if forcedirectio mount option was available?
> >
> > A scenario where forcedirectio would be useful is when an application
> > reads nearly a TB of data from local disks, processes that data and then
> > dumps it to an NFS mount. All that happens while other processes are
> > reading/writing to the local disks. The application does not have an
> > O_DIRECT option nor is the source code available.
> >
> > With paged I/O the problem we see is that the NFS client system reaches
> > dirty_bytes/dirty_ratio threshold and then blocks/forces all the
> > processes to flush dirty pages. This effectively 'locks' up the NFS
> > client system while the NFS dirty pages are pushed slowly over the wire
> > to the NFS server. Some of the processes that have nothing to do with
> > writing to the NFS mount are badly impacted. A forcedirectio mount
> > option would be very helpful in this scenario. Do you have any advice on
> > alleviating such problems on the NFS client by only using existing
> > tunables?
>
> Using direct I/O would be a work-around. The fundamental problem is
> the architecture of the VM system, and over time we have been making
> improvements there.
>
> Instead of a mount option, you can fix your application to use direct
> I/O. Or you can change it to provide the kernel with (better) hints
> about the disposition of the data it is generating (madvise and
> fadvise system calls). (On Linux we assume you have source code and
> can make such changes. I realize this is not true for proprietary
> applications).
>
> You could try using the "sync" mount option to cause the NFS client to
> push writes to the server immediately rather than delaying them. This
> would also slow down applications that aggressively dirties pages on
> the client.
>
> Meanwhile, you can dial down the dirty_ratio and especially the
> dirty_background_ratio settings to trigger earlier writeback. We've
> also found increasing min_free_bytes has positive effects. The exact
> settings depend on how much memory your client has. Experimenting
> yourself is pretty harmless, so I won't give exact settings here.

Thanks for the reply. Unfortunately, not all vendors provide the source
code, so using O_DIRECT or fsync is not always an option.

Lowering dirty_bytes/dirty_ratio and
dirty_background_bytes/dirty_background_ratio did help: it smoothed out
the data transfer over the wire by pushing data out to the NFS server
sooner. Without that change, the wire sat idle while the processes
dirtied more than 10 GiB of pages, and then became congested as soon as
dirty_ratio was reached and the dirty pages were frantically flushed to
the NFS server. However, modifying the dirty_* tunables has a
system-wide impact, so that approach was not accepted.

The "sync" option, depending on the NFS server, may impact the NFS
server's performance when serving many NFS clients. It is still worth a
try, though.

The other hack that seems to work is periodically triggering an
nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
NFS server. Not exactly elegant ...

Thanks,
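
For illustration, the periodic "ls -l" hack could be scripted roughly as
below. This is only a minimal sketch, assuming (as observed above) that a
stat() on a file in the NFS mount is what triggers nfs_getattr() and the
resulting flush of that file's dirty pages. The names poke_files and
poke_loop, the interval, and the example path are hypothetical and not
part of any existing tool.

```python
import os
import time
from typing import Callable, Iterable, List, Optional


def poke_files(paths: Iterable[str]) -> List[str]:
    """stat() each path once and return the paths that could be stat'ed.

    Assumption: on an NFS mount the stat() reaches nfs_getattr(), which
    is what "ls -l" was relying on in the message above.
    """
    poked = []
    for path in paths:
        try:
            os.stat(path)
        except OSError:
            # The file may not exist yet, or was removed; skip it.
            continue
        poked.append(path)
    return poked


def poke_loop(
    paths: Iterable[str],
    interval_s: float = 5.0,
    rounds: Optional[int] = None,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> int:
    """Call poke_files() every interval_s seconds.

    rounds=None loops forever; a number limits the iterations.
    Returns the number of rounds completed.
    """
    path_list = list(paths)  # allow any iterable, reuse it each round
    done = 0
    while rounds is None or done < rounds:
        poke_files(path_list)
        done += 1
        sleep_fn(interval_s)
    return done


# Hypothetical usage (runs until interrupted):
#   poke_loop(["/mnt/nfs/output.dat"], interval_s=5.0)
```

Unlike lowering the dirty_* tunables, this only touches the files it is
pointed at, so it has no system-wide effect, but it is still a
workaround and the right interval would need to be found by
experiment.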