Re: CIFS data coherency problem

Pavel Shilovsky <piastryyy@xxxxxxxxx> · Fri, 10 Sep 2010 11:50:05 +0400

2010/9/10 Steve French <smfrench@xxxxxxxxx>:
> Surely it is a serious bug if a server doesn't update the mtime by the
> time the handle they used is closed.   If a client 1 does open/write/close,
> then client 2 does open/write/close, client 1 reopening the file should
> see the updated mtime.   If client 2 had not closed the file yet - it
> is not clear whether its write and mtime update will be processed
> first - but we shouldn't be using cached data in that case - client 1 should
> do an invalidate_mapping when it can't get an oplock (we do that already
> in seek and mmap via revalidate e.g.).  In any case writes won't be cached
> in that case - and the simplest change may be to invalidate the inode
> cached pages in this reopen path - when the mtime/size matches but we

I mean the situation then we have this file opened by another file handle. E.g.:

1) client1 opens file as f1.
2) client1 writes 'a' into f1 from the beginning.
3) client2 opens file as f2.
4) client2 writes 'x' into f2 to the beginning,
5) client1 opens file as f3.
6) client1 reads from f3 and get 'a'!!! But it must be 'x'!

I attached this test 'mtime_problem.py' and the capture
'mtime_problem.pcap'. On the capture you can see that client1 gets
from the server the same mtime for f3 as it was for f1 - so, the
server didn't apply mtime after client1 and client2 wrote the data to
the server.

> failed to get a read oplock.   Allowing us to turn off the 1 second timeout
> on metadata/data caching is already possible.

About LookupCacheEnabled, I turned it off when I was running my tests
cache_problem.py and mtime_problem.py but the results were the same.

>
>>> have two opens of the same file from different clients we won't be
>>> caching anyway.
>
> With no oplock we won't be caching writes - we do cache reads
> in some cases (the intent originally was to do this for about 1 second)
> but as you note we can cache longer
>

About writes:
           written = generic_file_aio_write(iocb, iov, nr_segs, pos);
           if (!CIFS_I(inode)->clientCanCacheAll)
                   filemap_fdatawrite(inode->i_mapping);
           return written;

If we don't have write oplock, we always return written value (got
after generic_file_aio_write), but there is no checks if we fails on
filemap_fdatawrite (e.g. if another client has this file opened and
obtains mandatory byte-range lock on it). So, a user always think that
his write is complete successfully, but it can be wrong!

There is another problem with Oplocks: The server sends Oplock break
notification into only one tcon provided by the client. But now in
CIFS code architecture we have no connections between several
connections to the same share - that's why we set Oplock to None only
on one tcon, but another leaves the same (with Oplock Level II). In
this case when we try to read by the second tcon from the file we
think that we have Oplock for reading and read from the cache - it
wrong!

The above situation you can see on the same capture
'mtime_problem.pcap': the server sends Oplock break to None Oplock
only to the FID 0x0002, but no requests for FID 0x0001!

-- 
Best regards,
Pavel Shilovsky.
Attachment:
mtime_problem.py

Description: Binary data
Attachment:
mtime_problem.pcap

Description: Binary data