readahead for strided IO

Hi,

 

I hope you can help me out. We are currently investigating a performance issue involving an NFSv3 server (our appliance) and a Linux application doing IO against it.

 

The IO pattern is strictly sequential, but strided: the application reads 4k, skips 4k, reads 4k, skips 4k, … at monotonically increasing offsets, apparently using blocking read() calls. Unfortunately, I don’t know exactly whether the file handle was created using O_RDONLY or O_RDWR, or whether O_DIRECT or O_SYNC was specified.
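
For concreteness, the pattern as I understand it boils down to something like the following (file name, extent and the exact syscall sequence are my guesses, since we mostly see the wire traffic):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("/mnt/nfs/datafile", O_RDONLY);	/* guessed path/flags */
	off_t off;

	/* read 4k, skip 4k, read 4k, skip 4k, ... at monotonically
	 * increasing offsets, one blocking syscall at a time */
	for (off = 0; off < (off_t)1 << 30; off += 2 * 4096) {
		lseek(fd, off, SEEK_SET);
		read(fd, buf, sizeof(buf));
	}

	close(fd);
	return 0;
}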

 

As you can imagine, the RTT overhead (tens of usec per IO) of individual 4k NFS reads, which the NFS client only issues once the application actually requests them, is a severe limitation in terms of IOPS (bandwidth is around 25-30 MB/s, IOPS around 7000), even though the storage system / NFS server detects the strided reads and serves them directly from its prefetch cache (a few usec of latency there).
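
(Spelling out the arithmetic behind those numbers: 7000 IOPS x 4 KiB is roughly 27 MiB/s, consistent with the observed 25-30 MB/s, and 1/7000 s is roughly 140 usec per read, so the per-request round trip and client/server processing dominate, not the storage backend.)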

 

Complicating the issue, the application behaving so inefficiently is closed source. The best approach would obviously be for the application to request larger blocks of data and, once they are in application memory, discard about half of it (the strides are broken every ~20-30 IOs and interspersed with 16k reads, followed by strided reads aligned to the other odd/even 4k block offsets in the file), or to explicitly make use of the readahead() facility of Linux, as in the sketch below.
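
For illustration, what I mean by the readahead() option is roughly the following (window size, stride handling and error handling are simplified and made up; posix_fadvise(..., POSIX_FADV_WILLNEED) should work similarly):

#define _GNU_SOURCE			/* for readahead() */
#include <fcntl.h>
#include <unistd.h>

#define STRIDE	(2 * 4096)		/* 4k read + 4k skip */
#define WINDOW	(4 * 1024 * 1024)	/* prefetch 4 MiB at a time */

static void read_strided(int fd, off_t start, off_t end)
{
	char buf[4096];
	off_t off;

	for (off = start; off < end; off += STRIDE) {
		/* ask the kernel to start bringing the next window
		 * into the page cache before we walk into it */
		if ((off - start) % WINDOW == 0)
			readahead(fd, off, WINDOW);

		pread(fd, buf, sizeof(buf), off);
	}
}

With a window in the MB range, the NFS client could then issue large over-the-wire READs instead of 4k ones, at the cost of transferring the half of the data that gets skipped anyway.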

 

 

The reason I write this is my curiosity whether there is any way to configure the Linux readahead facility to be really aggressive on a particular NFS mount. We checked the /sys/class/bdi settings for the mount in question and increased read_ahead_kb, but that didn’t change anything; I guess what would be needed is a flag to have mm/readahead kick in for every read, regardless of whether it is considered a sequential read or not…
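
One user-space idea along those lines, since we cannot change the application: an LD_PRELOAD shim that tags every file the application opens with posix_fadvise(POSIX_FADV_SEQUENTIAL), which as far as I understand doubles the per-file readahead window. A rough, untested sketch (a real shim would also have to cover openat()/open64() and probably filter by path):

#define _GNU_SOURCE			/* for RTLD_NEXT */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>

int open(const char *path, int flags, ...)
{
	static int (*real_open)(const char *, int, ...);
	mode_t mode = 0;
	int fd;

	if (!real_open)
		real_open = (int (*)(const char *, int, ...))
			    dlsym(RTLD_NEXT, "open");

	if (flags & O_CREAT) {
		va_list ap;
		va_start(ap, flags);
		mode = va_arg(ap, mode_t);
		va_end(ap);
	}

	fd = real_open(path, flags, mode);

	/* hint the kernel towards a larger readahead window */
	if (fd >= 0)
		posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

	return fd;
}

Built with gcc -shared -fPIC -o fadvshim.so fadvshim.c -ldl and run via LD_PRELOAD=./fadvshim.so. Of course this still only helps if mm/readahead classifies the accesses as sequential in the first place, which is exactly my question.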

 

Finally, are there ways to extract statistical information from mm/readahead, i.e. whether it was actually called at all (and not completely bypassed to begin with due to some flags used by the application), and when/why/how it decided to do (or not do) the IO it does?

 

Thanks a lot!

 

 

 

Richard Scheffenegger

Storage Infrastructure Architect

 

NetApp Austria GmbH

+43 676 6543146 Tel

+43 1 3676811-3100 Fax

rs@xxxxxxxxxx

www.netapp.at


Unbound Cloud™ 

The new vision of cloud data management

 


