On Mon, 14 Feb 2022 at 22:07, Xavier Roche <xavier.roche@xxxxxxxxxxx> wrote: > > There has been a longstanding race condition between vfs_rename and do_linkat, > when those operations are done in parallel: > > 1. Moving a file to an existing target file (eg. mv file target) > 2. Creating a link from the target file to a third file (eg. ln target link) > > A typical example would be (1) a regular process putting a new version > of a database in place and (2) a regular process backuping the live > database by hardlinking it. > > My understanding is that as the target file is never erased on client > side, but just replaced, the link should never fail. > > The issue seem to lie inside vfs_link (fs/namei.c): > inode_lock(inode); > /* Make sure we don't allow creating hardlink to an unlinked file */ > if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE)) > error = -ENOENT; > > The possible answer is that the inode refcount is zero because the > file has just been replaced concurrently, old file being erased, and > as such, the link operation is failing. > > The race appears to have been introduced by aae8a97d3ec30, to fix > _another_ race between unlink and link (but I'm not sure to understand > what were the implications). > > Reverting the inode->i_nlink == 0 section "fixes" the issue, but would > probably reintroduce this another issue. > > At this point I don't know what would be the best way to fix this issue. Doing "lock_rename() + lookup last components" would fix this race. If this was only done on retry, then that would prevent possible performance regressions, at the cost of extra complexity. Thanks, Miklos