Re: A NFS client partial file corruption problem in recent/current kernels

> >  If a client kernel has cached pages this way, is there any simple
> > sequence of system calls on the client that will cause it to discard
> > these cached pages? Or do you need the file's GETATTR to change again,
> > implicitly from another machine? (I assume that changing the file's
> > attributes from the client with the cached pages doesn't cause it to
> > invalidate them, and certainly eg a 'touch' doesn't do it from the
> > client where it does do it from another machine.)
> 
> There are 2 ways to manipulate the page cache directly on the client:
>    1. You can clear out the entire page cache as the 'root' user, with the
>       /proc/sys/vm/drop_caches interface (see 'man 5 proc').
>    2. Alternatively, you can use posix_fadvise() with the
>       POSIX_FADV_DONTNEED flag to clear out only the pages that you think
>       are bad. Make sure to first fsync() so that the pages don't get
>       pinned in memory by virtue of being dirty (see 'man 2 fadvise64').
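
For reference, option 2 above can be sketched as follows. This is a
minimal illustration in Python (chosen for brevity), not code from this
thread; the function name drop_cached_pages is hypothetical. It assumes
a Linux client, where fsync() is permitted on a read-only descriptor,
and note that POSIX_FADV_DONTNEED is only advice to the kernel.

```python
import os


def drop_cached_pages(path: str) -> None:
    """Ask the kernel to discard cached pages for a single file.

    Hypothetical helper: fsync() first so no page is held back for
    being dirty, then advise the kernel that the pages are not needed.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Flush any dirty pages so they can actually be dropped.
        os.fsync(fd)
        # offset=0, len=0 covers the file from the start to its end.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Calling drop_cached_pages() on the affected file before re-reading it
is the intended use; whether the pages are really gone is up to the
kernel.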

 I just did some experiments, and on the Ubuntu 18.04 LTS version of
4.15.0, it appears that flock()'ing the file before re-reading it keeps
the problem from manifesting. I don't seem to have to flock() the file
when I initially read it before the change, and LOCK_SH is sufficient;
LOCK_EX is not needed. (I do have to flock() after the change, though.
If I only flock() before the change, I still see the problem.)
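
 The re-read sequence described above looks roughly like the following
sketch. It is in Python for brevity and is not the actual test program;
the function name read_with_shared_lock is hypothetical. It only shows
the order of calls (open, flock() with LOCK_SH, read, unlock) and makes
no claim about what the NFS client does in response.

```python
import fcntl


def read_with_shared_lock(path: str) -> bytes:
    """Re-read a file while holding a shared flock() on it.

    Hypothetical helper: take LOCK_SH after the file has changed,
    read the contents, then release the lock.
    """
    with open(path, "rb") as f:
        # A shared lock is enough in the experiment; LOCK_EX is not used.
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```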

 Is this supported/guaranteed behavior, or is it just a lucky
coincidence that things currently work this way, much like it was
happenstance instead of design that things worked back in the 4.4.x era?

 It would be very convenient for us if flock() works around this,
because it turns out that the only reason Alpine is not flock()'ing
files is that it has an ancient 'do not use flock on Linux NFS' piece of
code deep inside it that was apparently there to work around a bug that
seems to have been fixed a decade or so ago:

   http://repo.or.cz/alpine.git/blob/HEAD:/imap/src/osdep/unix/flocklnx.c

	- cks


