Re: 9p: fscache duplicate cookie

Dominique Martinet <asmadeus@xxxxxxxxxxxxx> · Wed, 12 May 2021 21:26:02 +0900

Luis Henriques wrote on Wed, May 12, 2021 at 12:58:58PM +0100:
> <...>-20591   [000] ...2    67.538644: fscache_cookie: GET prn c=000000003080d900 u=50 p=0000000042542ee5 Nc=48 Na=1 f=22
> <...>-20591   [000] ...1    67.538645: fscache_acquire: c=0000000011fa06b1 p=000000003080d900 pu=50 pc=49 pf=22 n=9p.inod
> <...>-20599   [003] .N.2    67.542180: 9p_fscache_cookie: v9fs_drop_inode cookie: 0000000097476aaa
> [...]
>
> So, this is... annoying, I guess.

Oh, this actually looks different from what I had in mind.

So if I'm reading this right, the duplicate acquire happens before the
drop on another thread, meaning iget5_locked somehow returned an inode
with I_NEW set for the same i_ino as the inode that is dropped later?...

How much trust can we actually put in trace ordering across different
CPUs? My theory really wanted that drop to come just before the
acquire :D
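
To make the two orderings concrete, here is a minimal userspace sketch.
It is only a toy model: toy_index, toy_acquire(), toy_relinquish() and
toy_replay() are made-up names, not the fscache API, and the only
assumption is that a cookie index refuses a second live cookie with the
same key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Toy cookie index: a slot holds the key of a live cookie, 0 if free.
 * Hypothetical stand-in, not the real fscache data structure. */
static unsigned long toy_index[TOY_SLOTS];

/* Register a cookie for key; fails if one is already live (duplicate)
 * or if the index is full. */
static bool toy_acquire(unsigned long key)
{
	size_t free_slot = TOY_SLOTS;

	assert(key != 0);	/* 0 is reserved for "free slot" */
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (toy_index[i] == key)
			return false;	/* duplicate cookie */
		if (toy_index[i] == 0 && free_slot == TOY_SLOTS)
			free_slot = i;
	}
	if (free_slot == TOY_SLOTS)
		return false;		/* index full */
	toy_index[free_slot] = key;
	return true;
}

/* Drop the cookie for key, if any. */
static void toy_relinquish(unsigned long key)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (toy_index[i] == key)
			toy_index[i] = 0;
}

/* Replay one ordering for a single key and report whether the second
 * acquire (the "new inode") collides with the first (the "old inode").
 * drop_first == true:  drop, then acquire (what my theory expected)
 * drop_first == false: acquire, then drop (what the trace shows) */
static bool toy_replay(unsigned long key, bool drop_first)
{
	bool ok;

	toy_relinquish(key);		/* start from a clean index */
	ok = toy_acquire(key);		/* old inode gets its cookie */
	assert(ok);

	if (drop_first)
		toy_relinquish(key);	/* old inode dropped in time */
	ok = toy_acquire(key);		/* new inode for the same file */
	toy_relinquish(key);		/* late drop and/or cleanup */

	return !ok;
}
```

With this model toy_replay(key, false) reports a collision and
toy_replay(key, true) does not, which is all the trace order would need
to mean if it can be trusted.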

Anyway, I think there's no room for doubt that it's possible to get a
new inode for the same underlying file before the evict finished; which
leaves room for a few questions:
 - as David brought up on IRC (#linuxfs@OFTC), what about the flushing
of dirty data that happens in evict()? Wouldn't it be possible for
operations on the new inode to read stale data while the old inode is
still being flushed? I think that warrants asking someone who
understands this better than me, as it's probably not 9p-specific even
if 9p makes it easier to get a new inode in such a racy way...

 - for 9p in particular, Christian Schoenebeck (helping with 9p in qemu)
brought up that we evict inodes too fast too often, so I think it'd help
to have some sort of inode lifetime management and keep inodes alive for
a bit.
As a network filesystem with no coherency built into the protocol I
don't think we can afford to keep inodes cached too long, and I know
some servers have trouble if we keep too many fids open, but it would
be nice to have a few knobs to keep inodes around a bit longer... This
won't solve the fundamental problem, but if the inode isn't evicted at
a point where it's likely to be used again then this particular problem
should be much harder to hit (like other filesystems, actually :P)

I'm not sure how that works though, and won't have much time to work on
it short term anyway, but it's an idea :/
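
For what it's worth, the "keep inodes alive for a bit" idea could look
roughly like the userspace toy below. Everything in it is hypothetical
(toy_inode, toy_iget(), toy_iput(), toy_reap() and the grace value are
not VFS or 9p interfaces), and time is a plain counter passed in by the
caller. It only shows the shape of the knob: the last put marks the
inode idle instead of evicting it, and a separate reaper evicts what
has been idle for at least the grace period.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 4

/* Hypothetical cached inode; ino == 0 marks an empty slot. */
struct toy_inode {
	unsigned long ino;
	unsigned int refs;
	unsigned long idle_since;	/* logical time refs last hit 0 */
};

static struct toy_inode toy_cache[TOY_CACHE_SLOTS];

/* Look up ino, reusing a cached (possibly idle) inode if present,
 * otherwise taking a free slot. Returns NULL if the cache is full. */
static struct toy_inode *toy_iget(unsigned long ino)
{
	struct toy_inode *free_slot = NULL;

	assert(ino != 0);
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		if (toy_cache[i].ino == ino) {
			toy_cache[i].refs++;
			return &toy_cache[i];
		}
		if (toy_cache[i].ino == 0 && !free_slot)
			free_slot = &toy_cache[i];
	}
	if (!free_slot)
		return NULL;
	free_slot->ino = ino;
	free_slot->refs = 1;
	free_slot->idle_since = 0;
	return free_slot;
}

/* Drop a reference; the last put only records when the inode went
 * idle, it does not evict. */
static void toy_iput(struct toy_inode *inode, unsigned long now)
{
	assert(inode->refs > 0);
	if (--inode->refs == 0)
		inode->idle_since = now;
}

/* Evict inodes that have been idle for at least grace ticks.
 * Returns the number of inodes evicted. */
static unsigned int toy_reap(unsigned long now, unsigned long grace)
{
	unsigned int evicted = 0;

	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		struct toy_inode *inode = &toy_cache[i];

		if (inode->ino != 0 && inode->refs == 0 &&
		    now - inode->idle_since >= grace) {
			inode->ino = 0;
			evicted++;
		}
	}
	return evicted;
}
```

A lookup that arrives within the grace period gets the same inode back,
so nothing is evicted and re-created. A real version would still have to
deal with fids held on the server and with revalidation, which this
ignores entirely.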

-- 
Dominique