On 12/02/18 17:35, Terry Barnaby wrote:
On 12/02/18 17:15, J. Bruce Fields wrote:
On Mon, Feb 12, 2018 at 05:09:32PM +0000, Terry Barnaby wrote:
One thing on this that I forgot to ask: doesn't fsync() work properly
with an NFS server-side async mount, then?
No.
If a server sets "async" on an export, there is absolutely no way for a
client to guarantee that data reaches disk, or to know when it happens.
Possibly "ignore_sync", or "unsafe_sync", or something else, would be a
better name.
--b.
Well, that seems like a major drawback; I always thought that fsync()
would work in this case. I don't understand why fsync() should not
operate as intended? Sounds like this NFS async thing needs some work!
I still do not understand why NFS doesn't operate in the same way as a
standard local mount here. The only use for async is improved
performance, given disk write latency and speed (or are there other
reasons?).
So with a local system mount:

async (normal mode): all system calls manipulate the on-disk structures
(inodes etc.) in buffer memory. Data/metadata is flushed to disk on
fsync(), sync() and occasionally by the kernel, so a process's data is
not actually stored until fsync(), sync() etc.

sync (with the sync mount option): data/metadata is written to disk
before the system calls return (all FS system calls?).
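A minimal sketch of what I mean by the local async case, if it helps
(the path and buffer size are just made up for illustration): write()
only dirties the page cache, fsync() is what pushes it to stable
storage, and opening with O_SYNC would make each write() synchronous,
much like the sync mount option.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int localWriteExample(void){
    char buf[4096];
    int  fd;

    fd = open("/tmp/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if(fd < 0)
        return -1;

    memset(buf, 0, sizeof(buf));

    /* async (default): write() returns once the data is in the page cache */
    if(write(fd, buf, sizeof(buf)) != sizeof(buf))
        perror("write");

    /* only now is the data/metadata forced out to stable storage */
    fsync(fd);

    close(fd);
    return 0;
}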
With an NFS mount I would have thought it should be the same:

async (normal mode): all system calls manipulate the on-disk structures
(inodes etc.) in buffer memory. This would normally be on the server
(so multiple clients can work with the same data), but with some
options (for particular usage) client-side write buffering/caching
could be used, i.e. data would not actually pass to the server during
every FS system call. Data/metadata is flushed to the server's disk on
fsync(), sync() and occasionally by the kernel (if client-side write
caching is used, this flushes across the network and then flushes the
server's buffers). A process's data is not actually stored until
fsync(), sync() etc.

sync (with a client-side sync option): data/metadata is written across
NFS and to the server's disk before the system calls return (all FS
system calls?).
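For the client-side sync case I mean something along the lines of the
standard sync mount option (the server name and paths here are just for
illustration):

    mount -t nfs -o sync server:/export /mnt/export

i.e. that client would then do synchronous writes for everything on
that mount, without affecting other clients.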
I really don't understand why the async option is implemented on the
server export, although a sync option there could be used to force sync
behaviour for all clients of that export. What am I missing? Is there
some good reason (rather than history) that it is done this way?
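For reference, the knob I am questioning is the per-export one on the
server, i.e. /etc/exports lines along these lines (addresses and paths
made up for illustration):

    /export/home     192.168.1.0/24(rw,sync,no_subtree_check)
    /export/scratch  192.168.1.0/24(rw,async,no_subtree_check)

where the second, async, line is the mode in which the server replies
to requests before the changes have reached its disk.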
Just tried using fsync() with an NFS async mount, and it appears to
work. With a simple C test program I see the following data rates/times
when the program writes 100 MBytes to a single file over NFS (open,
write, write ..., fsync) followed by close() (after the timing):
NFS Write multiple small files 0.001584 ms/per file 0.615829 MBytes/sec CpuUsage: 3.2%
Disktest: Writing/Reading 100.00 MBytes in 1048576 Byte Chunks
Disk Write sequential data rate fsync: 1 107.250685 MBytes/sec CpuUsage: 13.4%
Disk Write sequential data rate fsync: 0 4758.953878 MBytes/sec CpuUsage: 66.7%
Without the fsync() call the data rate is obviously just to buffers,
and with the fsync() call it definitely looks like the data is going to
disk.

Interestingly, it appears that the close() call effectively does an
fsync() as well, as the close() takes an age when fsync() is not used.
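A rough sketch of how one could time that implicit flush on close()
separately (this reuses the getTime() helper from the test program
below; the rest is just for illustration):

void timeClose(int fd){
    double st = getTime();

    /* with no prior fsync(), the NFS client flushes its dirty data here */
    if(close(fd) < 0)
        perror("close");

    printf("close() took %f seconds\n", getTime() - st);
}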
(By the way, I just got bitten by a Fedora 27 KDE/Plasma/NetworkManager
change that sets the Ethernet interfaces of all my systems to 100
MBits/s half duplex. It looks like the ability to configure Ethernet
auto-negotiation has been added, and the default is fixed 100 MBits/s
half duplex!)
Basic test code (just the write function):
/* Write-side test fragment: fileName, bufSize, diskNum, CpuStat,
 * cpuStatGet() and getTime() are defined elsewhere in the test program. */
void nfsPerfWrite(int doFsync){
    int f;
    char buf[bufSize];
    int n;
    double st, et, r;
    int nb;
    CpuStat cpuStatStart;
    CpuStat cpuStatEnd;
    double cpuUsed;
    double cpuUsage;

    sync();
    f = open64(fileName, O_RDWR | O_CREAT, 0666);
    if(f < 0){
        fprintf(stderr, "Error creating %s: %s\n", fileName, strerror(errno));
        return;
    }
    sync();

    cpuStatGet(&cpuStatStart);
    st = getTime();

    /* Write diskNum buffers of bufSize bytes sequentially */
    for(n = 0; n < diskNum; n++){
        if((nb = write(f, buf, bufSize)) != bufSize)
            fprintf(stderr, "WriteError: %d\n", nb);
    }
    if(doFsync)
        fsync(f);

    et = getTime();
    cpuStatGet(&cpuStatEnd);

    /* CPU usage over the timed interval */
    cpuStatEnd.user = cpuStatEnd.user - cpuStatStart.user;
    cpuStatEnd.nice = cpuStatEnd.nice - cpuStatStart.nice;
    cpuStatEnd.sys  = cpuStatEnd.sys  - cpuStatStart.sys;
    cpuStatEnd.idle = cpuStatEnd.idle - cpuStatStart.idle;
    cpuStatEnd.wait = cpuStatEnd.wait - cpuStatStart.wait;
    cpuStatEnd.hi   = cpuStatEnd.hi   - cpuStatStart.hi;
    cpuStatEnd.si   = cpuStatEnd.si   - cpuStatStart.si;
    cpuUsed = (cpuStatEnd.user + cpuStatEnd.nice + cpuStatEnd.sys +
               cpuStatEnd.hi + cpuStatEnd.si);
    cpuUsage = cpuUsed / (cpuUsed + cpuStatEnd.idle);

    r = ((double)diskNum * bufSize) / (et - st);
    printf("Disk Write sequential data rate fsync: %d %f MBytes/sec CpuUsage: %.1f%%\n",
           doFsync, r / (1024*1024), cpuUsage * 100);
    close(f);
}