Re: [PATCH] vfs: allow using kernel buffer during fiemap operation

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Thu, 20 Apr 2023 22:11:03 +0100

On Thu, Apr 20, 2023 at 10:00:45PM +0100, Al Viro wrote:
> On Mon, Apr 17, 2023 at 11:03:26PM +0900, Tetsuo Handa wrote:
> > syzbot is reporting circular locking dependency between ntfs_file_mmap()
> > (which has mm->mmap_lock => ni->ni_lock => ni->file.run_lock dependency)
> > and ntfs_fiemap() (which has ni->ni_lock => ni->file.run_lock =>
> > mm->mmap_lock dependency), for commit c4b929b85bdb ("vfs: vfs-level fiemap
> > interface") implemented fiemap_fill_next_extent() using copy_to_user()
> > where direct mm->mmap_lock dependency is inevitable.
> > 
> > Since ntfs3 does not want to release ni->ni_lock and/or ni->file.run_lock
> > in order to make sure that "struct ATTRIB" does not change during
> > ioctl(FS_IOC_FIEMAP) request, let's make it possible to call
> > fiemap_fill_next_extent() with filesystem locks held.
> > 
> > This patch adds fiemap_fill_next_kernel_extent() which spools
> > "struct fiemap_extent" to dynamically allocated kernel buffer, and
> > fiemap_copy_kernel_extent() which copies spooled "struct fiemap_extent"
> > to userspace buffer after filesystem locks are released.
> 
> Ugh...  I'm pretty certain that this is a wrong approach.  What is going
> on in ntfs_file_mmap() that requires that kind of locking?
> 
> AFAICS, that's the part that looks odd...  Details, please.

                if (ni->i_valid < to) {
                        if (!inode_trylock(inode)) {
                                err = -EAGAIN;
                                goto out;
                        }
                        err = ntfs_extend_initialized_size(file, ni,
                                                           ni->i_valid, to);
                        inode_unlock(inode);
                        if (err)
                                goto out;
                }

See that inode_trylock() there?  That's another sign of the same
problem; it's just that their internal locks (taken by
ntfs_extend_initialized_size()) are taken without the same
kind of papering over the problem.

'to' here is guaranteed to be under the i_size_read(inode); is
that some kind of delayed allocation?  Or lazy extending
truncate(), perhaps?  I'm not familiar with ntfs codebase (or
ntfs layout, for that matter), so could somebody describe what
exactly is going on in that code?

Frankly, my impression is that this stuff is done in the wrong
place...