Re: Flushing cached writes in nfs_getattr() and stat() delay

On Nov 6, 2008, at 2:22 PM, Alex Sidorenko wrote:
On November 6, 2008 01:49:56 pm Trond Myklebust wrote:
On Thu, 2008-11-06 at 10:34 -0500, Alex Sidorenko wrote:
I understand the reasoning behind that. From the application's point of
view, an NFS file/directory should behave the same as on a local FS. If we
have queued many writes, without this patch stat() will return incorrect
results, both for mtime and file length. Some applications may depend on
stat() results being correct.

At the same time, the fact that we have to wait forever while copying big files and doing 'ls -l' on that directory (or on the file being written)
is not very good either (two HP customers have complained about this
after migrating from RHEL4 to RHEL5).

In order to relax that requirement, we'd have to introduce some
mechanism for the application to notify the filesystem that they don't
care about strictly correct c/mtimes. As you noted above, returning
incorrect mtimes may trip up some applications (backup applications and mail readers are a couple of business-critical cases that come to mind).

The problem is still there in 2.6.27. I am not sure what can be done to both reduce the stat() delay and guarantee reasonable stat() results.

It is interesting that with 'noac' stat() returns much faster (just 1-3s
delay).

That would be because 'noac' enforces synchronous writes. If you don't
care about the degraded write performance, you can do the same thing
without all the extra getattr clutter that noac introduces, by simply
mounting with -osync.

Hi Trond,

In my experiments on 2.6.24 I saw practically no performance degradation while
doing 'cp' of a 4GB file with 'noac'; with 'sync' the performance is really
bad. And writes are still definitely ASYNC; here is what I see using a
Systemtap script on entry to rpc_execute:

There's a difference between an asynchronous RPC request, and an asynchronous write request.

An async RPC means the process doesn't wait for the request to finish; it can perform other housekeeping in the meantime.

An async write means that the client delays sending NFS writes, maintaining the dirty data in its memory. It can send the NFS write requests by means of an async RPC if it wishes. A synchronous write means that the client will block the application until the server has replied that the dirty data is on the server's disk.
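To make the distinction concrete from the application's side, here is a minimal Python sketch (not taken from the thread) that writes the same payload once with ordinary buffered writes and once with O_SYNC, and times the stat() call that follows each. The helper name `timed_write` and the temporary file path are illustrative assumptions. On a local filesystem the two stat() timings will probably look alike; the delay discussed in this thread should only show up when the target is on an NFS mount with a lot of dirty data pending.

```python
import os
import tempfile
import time


def timed_write(path, data, sync):
    """Write data to path, then stat() it while the file is still open.

    Returns (write_seconds, stat_seconds, size_reported_by_stat).
    sync=True opens the file with O_SYNC, so each write() blocks until the
    data is reported stable. sync=False leaves dirty data cached for later
    writeback, which is the "async write" case described above.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        # O_SYNC is not defined on every platform; fall back to 0 if absent.
        flags |= getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        t0 = time.monotonic()
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may be a short write
            view = view[written:]
        t1 = time.monotonic()
        st = os.stat(path)
        t2 = time.monotonic()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1, st.st_size


if __name__ == "__main__":
    # Hypothetical target: replace with a path on the NFS mount under test.
    target = os.path.join(tempfile.gettempdir(), "stat_delay_demo.bin")
    payload = b"\0" * (8 * 1024 * 1024)
    for sync in (False, True):
        w, s, size = timed_write(target, payload, sync)
        print(f"sync={sync}: write {w:.3f}s, stat {s:.3f}s, size {size}")
    os.remove(target)
```

In both modes stat() should report the full length; what differs is where the waiting happens, inside write() for the synchronous case or (on NFS) potentially inside stat() for the cached case.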

From /etc/mtab:

cats:/data /mnt nfs rw,udp,noac,hard,intr,addr=192.168.0.33 0 0
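When comparing runs it can help to confirm which of the options discussed here ('noac', 'sync') a mount actually carries. The following is a small sketch, assuming the usual whitespace-separated mtab layout (device, mount point, type, comma-separated options); `parse_mount_entry` is a hypothetical helper name, and the sample line is the entry quoted above.

```python
def parse_mount_entry(line):
    """Split one mtab-style line into its fields and an options dict.

    Options without a value (e.g. 'noac') map to True; 'key=value'
    options map to the value string.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fields: %r" % line)
    device, mountpoint, fstype, optstr = fields[:4]
    options = {}
    for opt in optstr.split(","):
        key, sep, value = opt.partition("=")
        options[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
    }


if __name__ == "__main__":
    entry = parse_mount_entry(
        "cats:/data /mnt nfs rw,udp,noac,hard,intr,addr=192.168.0.33 0 0"
    )
    opts = entry["options"]
    print("noac set:", "noac" in opts)
    print("sync set:", "sync" in opts)
```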

$ dd if=/dev/zero of=/mnt/win/big bs=100m count=1

From stap output:
rpc_execute p_proc=7 WRITE qlen=0 prio=1 flags=0x1
--ts=4
rpc_execute p_proc=7 WRITE qlen=0 prio=1 flags=0x1
...

So we still have RPC_TASK_ASYNC set.
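For reference, the flag check above can be reproduced with a short Python sketch that parses trace lines of the shape shown in this message. The line format is specific to the Systemtap script used here (which is not included in the thread), so the regular expression is an assumption based only on the sample output; `parse_rpc_execute` is a hypothetical helper. RPC_TASK_ASYNC is 0x0001 in the kernel's include/linux/sunrpc/sched.h, and procedure 7 is WRITE in NFSv3.

```python
import re

RPC_TASK_ASYNC = 0x0001  # value from include/linux/sunrpc/sched.h
NFS3PROC_WRITE = 7       # NFSv3 procedure number for WRITE

# Assumed format, inferred from the sample output in this thread.
LINE_RE = re.compile(
    r"rpc_execute p_proc=(\d+) (\S+) qlen=(\d+) prio=(-?\d+) "
    r"flags=(0x[0-9a-fA-F]+)"
)


def parse_rpc_execute(line):
    """Return a dict for one rpc_execute trace line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    proc, name, qlen, prio, flags = m.groups()
    flags = int(flags, 16)
    return {
        "proc": int(proc),
        "name": name,
        "qlen": int(qlen),
        "prio": int(prio),
        "flags": flags,
        # True means the RPC task is asynchronous; it says nothing about
        # whether the application is blocked until the data is stable.
        "async_rpc": bool(flags & RPC_TASK_ASYNC),
    }


if __name__ == "__main__":
    sample = "rpc_execute p_proc=7 WRITE qlen=0 prio=1 flags=0x1"
    print(parse_rpc_execute(sample))
```

Note that this only decodes the RPC task flag; it cannot tell an async write from a sync write in the sense described earlier in this message.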

See above.

I did not check 'noac' experimentally on 2.6.27, but I still think that 'noac'
does not make writes sync. nfs_commit_rpcsetup() still sets RPC_TASK_ASYNC by
default, and I don't see NFS_MOUNT_NOAC setting FLUSH_SYNC anywhere.

Again, RPC_TASK_ASYNC has nothing to do with whether the application is blocked until the server says the write is permanent.

So I still don't quite understand why 'noac' eliminates the delay. Chuck Lever
says that "noac" never caches writes on the client. Printing
xprt->backlog->qlen in my experiments I can still see a significant backlog
even with 'noac', e.g.

--ts=32
rpc_execute p_proc=7 WRITE qlen=3086 prio=1 flags=0x1

but 'stat' delay is just 1-2s.

Regards,
Alex

--
------------------------------------------------------------------
Alexandre Sidorenko             email: asid@xxxxxx
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



