Re: Switching to iterate_shared

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Normally readdir iteration is very similar to lookup().
>
> The *one* exception tends to be directory entry caches, although
> sometimes those too are shared with lookup.
>
[...]
>
> > Hmm, I wonder. Can gentents() proceed without f_pos_lock on
> > single f_count and then a cloned fd created and another gentents()
> > races with the first one?
>
> That would break much more important system calls than getdents (ie
> basic read/write).
>
> So if fdget_pos() were to be broken, we have much bigger issues.
>
> In case  you care, the thing that protects us against another thread
> using the same file descriptor is
>
>   fdget_pos ->
>     __fdget_pos ->
>       __fdget ->
>         __fdget_light ->
>           check current->files->count
>           do full __fdget() if it's > 1
>
> and then __fdget_pos() does that
>
>                 if (file_count(file) > 1) {
>                         v |= FDPUT_POS_UNLOCK;
>                         mutex_lock(&file->f_pos_lock);
>                 }
>
> check afterwards.
>
> IOW, this code is carefully written to avoid both the file pointer
> reference increment *and* the f_pos_lock in the common case of a
> single-threaded process that only has the file open once.
>
> But if 'current->files->count' is elevated - iow some other thread has
> access to the file array and might be open/close/dup'ing files, we
> take the file reference.
>
> And then if the file is open through multiple file descriptors (which
> might be in another file table entirely), _or_ if we just took the
> file refernce due to that threading issue - then in either of those
> cases we will also now take the file pos lock.
>
> End result: either this system call is the only one that can access
> that file, or we have taken the file pos lock.
>
> Either way, getdents() ends up being serialized wrt that particular open file.
>

Yes, I see it now.

> Now, you can actually screw this up if you really try: the file pos
> lock obviously also is conditional on FMODE_ATOMIC_POS.
>
> So if you have a filesystem that does "stream_open()" to clear
> FMODE_ATOMIC_POS, that can avoid the file pos lock entirely.
>
> So don't do that. I'm looking at fuse and FOPEN_STREAM, and that looks
> a bit iffy.
>
> (Of course, for a virtual filesystem, if you want to have a getdents()
> that doesn't allow lseek, and that doesn't use f_pos for getdents but
> just some internal other logic, clearing FMODE_ATOMIC_POS might
> actually be entirely intentional. It's not like the file pos lock is
> really _required_).
>
> > I tried to look at ovl_iterate(). The subtlety is regarding the internal
> > readdir cache.
>
> Ok, from a quick look, you do have
>
>         cache = ovl_dir_cache(d_inode(dentry));
>
> so yeah, the cache is per-inode and thus different 'struct file'
> entries for the same directory will need locking.
>

The cache in ovl_cache_get() is a refcounted object, so I think
we can fix grabbing the instance and initiating a new instance safely.
The cache in ovl_cache_get_impure() is more problematic.
I'll see what we can do about it.

> > Compared to fuse_readdir(), which is already "iterate_shared"
> > and also implements internal readdir cache, fuse adds an internal
> > mutex (quote) "against (crazy) parallel readdir on same open file".
>
> Maybe the FUSE lock is *because* fuse allows that FOPEN_STREAM?
>
> > Considering this protection is already provided by f_pos_lock
> > 99% of the time,
>
> It's not 99% of the time.
>
> f_pos_lock is reliable. You have to explicitly turn it off. FUSE is
> either confused and does pointless locking, or has done that
> stream_open() thing to explicitly turn off the vfs locking.
>
> Or, probably more realistically, the fuse code goes back to before we
> did f_pos_lock.

I'll leave that riddle for Miklos to solve.

Thanks,
Amir.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux