inconsistent file content after killing nfs daemon

eric.chacron@vz.cit.alcatel.fr (eric chacron) · Wed, 16 Jan 2002 10:36:25 +0100

Hi,

I am now looking at the same problem in synchronous mode, working from the
Linux NFS source.

Even in synchronous mode (O_SYNC ...), the NFS client seems to send one
WRITE request to the server for each cache page that the user data is split
across on the client side (ref. nfs_writepage_sync(), nfs_writepage() and
nfs_updatepage() in fs/nfs/write.c, nfs_commit_write() and nfs_file_write()
in fs/nfs/file.c, generic_file_write() in mm/filemap.c).

For instance, if I request a synchronous write of 16 bytes at offset
PAGE_SIZE - 8 of my file, I think the NFS client will send two WRITE
messages to the server. Even if it uses "stable = NFS_FILE_SYNC" for both
messages, the server can fail after ext3 has written the first one to
stable storage but not the second.
Then, if the client restarts upon server failure (as is the case in some
projects), the file is found with only 8 bytes updated instead of 16.
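
As a rough illustration of the split described above, here is a minimal
Python sketch that cuts a byte range at page boundaries. It only models the
arithmetic; split_by_page() is a hypothetical helper, not a kernel function,
and the 4096-byte PAGE_SIZE is an assumption.

```python
def split_by_page(offset, length, page_size=4096):
    """Split the byte range [offset, offset + length) at page boundaries.

    Returns a list of (offset, length) pieces, one per cache page touched.
    This models the per-page WRITE behaviour described in the text; it is
    not taken from the kernel source.
    """
    chunks = []
    end = offset + length
    while offset < end:
        # Bytes remaining in the page that contains `offset`.
        room = page_size - (offset % page_size)
        n = min(room, end - offset)
        chunks.append((offset, n))
        offset += n
    return chunks


PAGE_SIZE = 4096  # assumed page size for the example

# A 16-byte write starting 8 bytes before a page boundary touches two pages.
print(split_by_page(PAGE_SIZE - 8, 16, PAGE_SIZE))  # [(4088, 8), (4096, 8)]
```

If the server fails between the two pieces, only the first 8 bytes reach
stable storage, which is the partial update described above.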

I propose the following conditions to provide atomicity of writes through
NFS + ext3 with the current implementation:
- ext3 journaled mode
- wsize mount option >= PAGE_SIZE
- O_SYNC on open()
- data size <= PAGE_SIZE
- (file offset mod PAGE_SIZE) + data size <= PAGE_SIZE
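
The application-side conditions in this list (O_SYNC, and a write that stays
inside one page) could be checked along these lines. This is a sketch under
assumptions: fits_in_one_page() and sync_write_within_page() are
hypothetical helper names, the page size is read from the local machine,
and the ext3 journaling mode and the wsize mount option are not verified
here at all.

```python
import os


def fits_in_one_page(offset, size, page_size):
    """True if [offset, offset + size) lies entirely within one page."""
    return 0 < size <= page_size and (offset % page_size) + size <= page_size


def sync_write_within_page(path, offset, data, page_size=None):
    """Write `data` at `offset` with O_SYNC, refusing page-crossing writes.

    The check only covers the client-side conditions from the list; it does
    not by itself guarantee atomicity on the server.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")  # client page size
    if not fits_in_one_page(offset, len(data), page_size):
        raise ValueError("write would cross a page boundary")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        written = os.pwrite(fd, data, offset)
        if written != len(data):
            raise OSError("short write")
    finally:
        os.close(fd)
    return written
```

With a 4096-byte page, an 8-byte write at offset 4088 passes the check,
while the 16-byte write from the earlier example is rejected.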

Another possibility would be to modify the NFS implementation to send only
one WRITE message when the total size is less than wsize (whatever the
number of cache pages used)?
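
To make this alternative concrete, here is a small Python model of the
proposed behaviour, in which requests are cut at wsize instead of at page
boundaries. plan_writes() is a hypothetical name and this is only a model of
the idea, not a patch against the NFS client.

```python
def plan_writes(offset, length, wsize):
    """Model of the proposal: chunk a write by wsize, ignoring page
    boundaries, so a write no larger than wsize becomes one WRITE request.

    Returns a list of (offset, length) requests.
    """
    requests = []
    end = offset + length
    while offset < end:
        n = min(wsize, end - offset)
        requests.append((offset, n))
        offset += n
    return requests


# With wsize = 8192, the 16-byte write that straddles a 4096-byte page
# boundary is planned as a single request.
print(plan_writes(4096 - 8, 16, 8192))  # [(4088, 16)]
```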

Regards,
Eric

"Stephen C. Tweedie" wrote:

> Hi,
>
> On Fri, Jan 11, 2002 at 10:37:37AM +0100, eric chacron wrote:
>
> > To answer your question, the problem seems to be reproducible only in
> > asynchronous mode (without O_SYNC).
> > I have reproduced the case (without O_SYNC) using different record sizes:
> > from 1 K to 64 K, but not with 512 bytes.
> > It makes sense that the zeroed holes in the file are caused by the NFS
> > client's lack of serialisation/ordering as the file is being extended.
> > With O_SYNC I haven't reproduced the same problem for the moment.
>
> Right --- that's standard unix semantics for writeback.  Writes to
> backing store are completely unordered unless you request ordering
> with O_SYNC or f[data]sync.
>
> Cheers,
>  Stephen
>
> _______________________________________________
> 
> Ext3-users@redhat.com
> https://listman.redhat.com/mailman/listinfo/ext3-users
