Re: [RFC] metadata updates vs. fetches (was Re: [PATCH v4] fs: Fix data race in inode_set_ctime_to_ts)

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 24 Nov 2024 17:02:21 -0800

On Sun, 24 Nov 2024 at 16:53, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> and look how the reader is happy, because it got the same nanoseconds
> twice. But the reader thinks it had a time of 6.000950, and AT NO
> POINT was that actually a valid time.

And let me clarify again that I don't actually personally think we
should care deeply about any of this.

I think the race is entirely theoretical and doesn't happen in
practice (regardless of the "re-read nsec", which I don't think
works), and I don't think anybody cares that deeply about nanoseconds
anyway.

The "worst" that would happen is likely that some cache that depended
on timestamps would get invalidated, there really aren't a ton of
things that depend on exact time outside of that.

Also, fixing it with the sequence count should be fairly low-cost, but
is not entirely free.

I wouldn't worry about the writer side, but the reader side ends up
with a smp_read_acquire() on the first sequence count read, and a
smp_rmb() before the final sequence count read.

That's free on x86 where all reads are ordered (well, "free" - you
still need to actually do the sequence count read, but it is hopefully
in the same cacheline as the hot data), but smp_rmb() can be fairly
expensive on other architectures.

              Linus