Re: Very slow O_DIRECT writes on NFS in .36

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2010-11-18 at 12:23 -0800, Ben Greear wrote:
> On 11/18/2010 12:17 PM, Chuck Lever wrote:
> >
> > On Nov 18, 2010, at 3:07 PM, Ben Greear wrote:
> >
> >> I applied the NFS O_DIRECT patch (and all others) from the pending 2.6.36 stable
> >> queue, and now I can at least use O_DIRECT w/out immediate failure.
> >>
> >> However, I notice that when writing 2k chunks with O_DIRECT on
> >> NFS, it runs extremely slowly (about 300Kbps throughput).  The
> >> server is a Fedora 13 64-bit system running 2.6.34.7-56.fc13.x86_64
> >>
> >> Here's some strace -ttT output for the writer:
> >>
> >> 07:03:42.898058 write(9, "\370'\37\345v\230\315\253\3\0\0\0\354\7\0\0\16\1\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 2048) = 2048<0.059402>
> >> 07:03:42.957649 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000266>
> >> 07:03:42.958148 write(9, "\212$s\327v\230\315\253\3\0\0\0\354\7\0\0\17\1\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 2048) = 2048<0.069295>
> >> 07:03:43.027524 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000011>
> >>
> >>
> >> Writing 64k chunks takes basically the same amount of time per system call:
> >>
> >> 07:06:13.537488 write(9, "\5\340\202\262v\230\315\253\3\0\0\0\354\377\0\0\6\0\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 65536) = 65536<0.049462>
> >> 07:06:13.587083 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000035>
> >> 07:06:13.587410 write(9, "\250\231\377cv\230\315\253\3\0\0\0\354\377\0\0\7\0\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 65536) = 65536<0.058612>
> >> 07:06:13.646233 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000095>
> >> 07:06:13.646616 write(9, "\5-@\5v\230\315\253\3\0\0\0\354\377\0\0\10\0\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 65536) = 65536<0.050282>
> >>
> >>
> >> Reading is a good deal faster..about 34Mbps with O_DIRECT, NFS and 2k reads.
> >>
> >> Any ideas about why the write performance is so bad?
> >
> > A network trace will probably show you that the per-write latency is due to the server.
> >
> 
> Looks like you are right.  I don't remember it being this slow before, but
> maybe it was.  We'll run some tests with older kernels and/or different
> servers.
> 
>    6.700193 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:96256 Len:1024 FILE_SYNC
>    6.740547 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12321 Ack=101729 Win=501 Len=0 TSV=218471603 TSER=1385525
>    6.769380 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 262) Len:1024 FILE_SYNC
>    6.769609 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:97280 Len:1024 FILE_SYNC
>    6.809777 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12461 Ack=102885 Win=501 Len=0 TSV=218471673 TSER=1385594
>    6.850373 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 265) Len:1024 FILE_SYNC
>    6.850631 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:98304 Len:1024 FILE_SYNC
>    6.890845 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12601 Ack=104041 Win=501 Len=0 TSV=218471754 TSER=1385675
>    6.930344 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 268) Len:1024 FILE_SYNC
>    6.930703 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:99328 Len:1024 FILE_SYNC
>    6.971753 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12741 Ack=105197 Win=501 Len=0 TSV=218471834 TSER=1385755
>    6.980341 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 271) Len:1024 FILE_SYNC

This is all expected. Doing direct i/o with small chunks is always going
to be slow, since the kernel is required to ensure all the data has been
synchronised to disk before the syscall can complete.

You can significantly speed things up by grouping several writes into a
single syscall using the vectored write interface (i.e. writev()). That
allows the kernel to use unstable writes to the server and then issue a
single COMMIT call to sync all the data from the vector at once.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux