On Sat, 2020-09-26 at 08:57 +0800, Xiaoxi Chen wrote:
> Hi Jeff,
>
> Yes, Step 5 is where the issue is. Client A (fuse) should, but does
> not, do a synchronous read from the OSD, since the old data from Step
> 3 is still in its pagecache. This might be an issue with fuse
> (https://libfuse.github.io/doxygen/notify__inval__inode_8c.html); the
> kernel driver doesn't have this issue. But it would be great if you
> could share how the kernel driver interacts with the pagecache,
> especially without Fc.
>
> -Xiaoxi

Yeah, it seems like when you lose Fc caps, you need to invalidate the
pagecache. FUSE has an upcall for that, but it looks like it's done
asynchronously. I suppose a read could race in before that happens. The
right thing to do is probably to not let the FUSE client code return Fc
caps to the MDS until the pagecache has been invalidated.

In the kernel, without Fc, read() syscalls (and similar) don't go
through the pagecache at all. ceph_read_iter/write_iter will dispatch
I/O to the OSDs directly, and the results are not cached.

None of this behaves very well with mmap, btw. We more or less _have_
to go through the pagecache for mmap. For that, you probably ought to
make sure you're using some sort of locking if you want to do this
sort of I/O pattern across clients.

> Jeff Layton <jlayton@xxxxxxxxxx> wrote on Fri, Sep 25, 2020, at 8:02 PM:
> > I'm less familiar with the fuse client than the kernel one, but
> > this sounds wrong.
> >
> > In step 5, Client A should just do a synchronous read from the OSD
> > since it no longer has Fc caps. Why is it seeing old data? Has
> > Client B just not yet issued the write to the OSD? If so, was
> > Client B issued Fb caps?
> >
> > -- Jeff
> >
> > On Thu, 2020-09-24 at 15:34 +0800, Xiaoxi Chen wrote:
> > > Could you explain why the client can add page cache later? Please
> > > correct the following where it is wrong:
> > >
> > > 1. Client A has page cache of file X.
> > > 2. Client B opens X for write; it will take the write lock, and
> > > the MDS will revoke Fc from Client A, which will result in Client
> > > A dropping its cache.
> > > 3. Client A tries to read X. Client A can go ahead and read from
> > > the OSD, which gets the old data. (Will Client A issue a getattr
> > > to the MDS? Will the getattr be blocked? I see some discussion
> > > pointing to
> > > https://github.com/ukernel/ceph/commit/7db1563416b5559310dbbc834795b83a4ccdaab4)
> > > 4. Client B writes data.
> > > 5. Client A still gets the old data.
> > >
> > > -Xiaoxi
> > >
> > > Yan, Zheng <ukernel@xxxxxxxxx> wrote on Thu, Sep 24, 2020, at 11:50 AM:
> > > > On Thu, Sep 24, 2020 at 11:07 AM Xiaoxi Chen <superdebuger@xxxxxxxxx> wrote:
> > > > > Hi Zheng,
> > > > > We are seeing inconsistency among clients once one client
> > > > > updates a file (by scp): some of the nodes see the new
> > > > > contents but some of the nodes don't. The inconsistency can
> > > > > last 30 minutes to a few hours and then fixes itself. I think
> > > > > it is because some of the nodes are not dropping their page
> > > > > cache properly.
> > > > > Looking into the code, I see that when the Fc cap is revoked,
> > > > > the fuse client drops the object cache, queues a task to the
> > > > > finisher thread to call fuse_lowlevel_notify_inval_inode, and
> > > > > then acks the cap revoke. So there seems to be a window
> > > > > between the cap-revoke ack and
> > > > > fuse_lowlevel_notify_inval_inode finishing, during which the
> > > > > page cache is still valid and users can read stale data.
> > > > > Though it is strange that the window can be that large (no PG
> > > > > issues during the window).
> > > > > Could you please confirm whether this is the real problem,
> > > > > and why it is implemented this way?
> > > > >
> > > >
> > > > Yes, it's a real problem. fuse_lowlevel_notify_inval_inode()
> > > > does not prevent the client from adding page cache later. If
> > > > multiple fuse clients read/modify the same file, you'd better
> > > > set the fuse_disable_pagecache option to true.
> > > > > -xiaoxi

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx