On Thu, Feb 20, 2025 at 5:00 AM Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
> @Sam could you please describe your reproducer?

Absolutely. We have an internal networked filesystem that implements
the FUSE interface (not CVMFS), so stat, readlink, etc. end up as RPCs
to another backend. We need to avoid stale readlink results, so we
clear the kernel symlink cache whenever we receive a new snapshot from
the network, and this is where the race condition comes in.

I reproduced the bug by interacting with the same filesystem location
on two different machines. On the first machine, we have a C for loop
that calls readlink and prints the destination whenever it changes [1].
On the second machine, I manually switched the symlink back and forth
between two destinations of different lengths using `ln -sf`.

When the kernel cache was enabled, changing the link destination from
"dest" to "longerdest" would result in the first machine printing
"long", i.e. the new destination truncated to the length of the old
one. It happened very consistently, usually immediately or within 1 or
2 tries.

Here are the things that fixed the bug:
- Disabling the kernel cache
- Applying Miklos' patch to a custom kernel
- Uncommenting the 1 second sleep in [1] to make the race condition
  very unlikely

I hope that helps!

[1] basically the script seen here:
https://github.com/cvmfs/cvmfs/issues/3626#issue-2390818866

Sam