Patrick Steinhardt <ps@xxxxxx> writes:

> This should address the race in a POSIX-compliant way. The only real
> downside is that this mechanism cannot be used on non-POSIX-compliant
> systems like Windows. But we at least have the second-level caching
> mechanism in place that compares contents of "files.list" with the
> currently loaded list of tables.

OK.

> +	/*
> +	 * Cache stat information in case it provides a useful signal to us.
> +	 * According to POSIX, "The st_ino and st_dev fields taken together
> +	 * uniquely identify the file within the system." That being said,
> +	 * Windows is not POSIX compliant and we do not have these fields
> +	 * available. So the information we have there is insufficient to
> +	 * determine whether two file descriptors point to the same file.
> +	 *
> +	 * While we could fall back to using other signals like the file's
> +	 * mtime, those are not sufficient to avoid races. We thus refrain from
> +	 * using the stat cache on such systems and fall back to the secondary
> +	 * caching mechanism, which is to check whether contents of the file
> +	 * have changed.

OK.

> +	 *
> +	 * On other systems which are POSIX compliant we must keep the file
> +	 * descriptor open. This is to avoid a race condition where two
> +	 * processes access the reftable stack at the same point in time:
> +	 *
> +	 *   1. A reads the reftable stack and caches its stat info.
> +	 *
> +	 *   2. B updates the stack, appending a new table to "tables.list".
> +	 *      This will both use a new inode and result in a different file
> +	 *      size, thus invalidating A's cache in theory.
> +	 *
> +	 *   3. B decides to auto-compact the stack and merges two tables. The
> +	 *      file size now matches what A has cached again. Furthermore, the
> +	 *      filesystem may decide to recycle the inode number of the file
> +	 *      we have replaced in (2) because it is not in use anymore.
> +	 *
> +	 *   4. A reloads the reftable stack. Neither the inode number nor the
> +	 *      file size changed. If the timestamps did not change either then
> +	 *      we think the cached copy of our stack is up-to-date.
> +	 *
> +	 * By keeping the file descriptor open the inode number cannot be
> +	 * recycled, mitigating the race.
> +	 */

This is nasty.  Well diagnosed and fixed.

Will queue.  Thanks.
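
For anybody following along, the idea in the quoted comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: `struct list_cache` and the `list_cache_*()` helpers are hypothetical names, and it assumes a POSIX system where st_dev and st_ino are meaningful. The cache holds the descriptor open for as long as the stat data is trusted, so the filesystem cannot hand the same inode number to a replacement file.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical cache, not the actual reftable structures. */
struct list_cache {
	int fd;         /* kept open so the inode number cannot be recycled */
	struct stat st; /* stat data taken from fd when the list was loaded */
};

/* Open the list file and remember its identity. Returns 0 on success. */
static int list_cache_load(struct list_cache *c, const char *path)
{
	c->fd = open(path, O_RDONLY);
	if (c->fd < 0)
		return -1;
	if (fstat(c->fd, &c->st) < 0) {
		close(c->fd);
		c->fd = -1;
		return -1;
	}
	return 0;
}

/*
 * Returns 1 if path still names the file we cached, 0 if it has been
 * replaced, and -1 on error. Only (st_dev, st_ino) are compared here;
 * a real check may want to look at more fields.
 */
static int list_cache_is_current(const struct list_cache *c, const char *path)
{
	struct stat now;

	if (stat(path, &now) < 0)
		return -1;
	return now.st_dev == c->st.st_dev && now.st_ino == c->st.st_ino;
}

/* Drop the descriptor once the cached data is no longer needed. */
static void list_cache_release(struct list_cache *c)
{
	if (c->fd >= 0)
		close(c->fd);
	c->fd = -1;
}
```

With the descriptor held, a writer that replaces the file via rename() necessarily ends up with a different inode number, so the comparison above reports the change even if size and timestamps happen to match.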