Re: race between vfs_rename and do_linkat (mv and link)

Miklos Szeredi <miklos@xxxxxxxxxx> · Tue, 15 Feb 2022 10:56:29 +0100

On Mon, 14 Feb 2022 at 22:07, Xavier Roche <xavier.roche@xxxxxxxxxxx> wrote:
>
> There has been a longstanding race condition between vfs_rename and do_linkat,
> when those operations are done in parallel:
>
> 1. Moving a file to an existing target file (eg. mv file target)
> 2. Creating a link from the target file  to a third file (eg. ln target link)
>
> A typical example would be (1) a regular process putting a new version
> of a database in place and (2) a regular process backuping the live
> database by hardlinking it.
>
> My understanding is that as the target file is never erased on client
> side, but just replaced, the link should never fail.
>
> The issue seem to lie inside vfs_link (fs/namei.c):
>        inode_lock(inode);
>        /* Make sure we don't allow creating hardlink to an unlinked file */
>        if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
>                error =  -ENOENT;
>
> The possible answer is that the inode refcount is zero because the
> file has just been replaced concurrently, old file being erased, and
> as such, the link operation is failing.
>
> The race appears to have been introduced by aae8a97d3ec30, to fix
> _another_ race between unlink and link (but I'm not sure to understand
> what were the implications).
>
> Reverting the inode->i_nlink == 0 section "fixes" the issue, but would
> probably reintroduce this another issue.
>
> At this point I don't know what would be the best way to fix this issue.

Doing "lock_rename() + lookup last components" would fix this race.
If this was only done on retry, then that would prevent possible
performance regressions, at the cost of extra complexity.

Thanks,
Miklos