> Normally readdir iteration is very similar to lookup(). > > The *one* exception tends to be directory entry caches, although > sometimes those too are shared with lookup. > [...] > > > Hmm, I wonder. Can gentents() proceed without f_pos_lock on > > single f_count and then a cloned fd created and another gentents() > > races with the first one? > > That would break much more important system calls than getdents (ie > basic read/write). > > So if fdget_pos() were to be broken, we have much bigger issues. > > In case you care, the thing that protects us against another thread > using the same file descriptor is > > fdget_pos -> > __fdget_pos -> > __fdget -> > __fdget_light -> > check current->files->count > do full __fdget() if it's > 1 > > and then __fdget_pos() does that > > if (file_count(file) > 1) { > v |= FDPUT_POS_UNLOCK; > mutex_lock(&file->f_pos_lock); > } > > check afterwards. > > IOW, this code is carefully written to avoid both the file pointer > reference increment *and* the f_pos_lock in the common case of a > single-threaded process that only has the file open once. > > But if 'current->files->count' is elevated - iow some other thread has > access to the file array and might be open/close/dup'ing files, we > take the file reference. > > And then if the file is open through multiple file descriptors (which > might be in another file table entirely), _or_ if we just took the > file refernce due to that threading issue - then in either of those > cases we will also now take the file pos lock. > > End result: either this system call is the only one that can access > that file, or we have taken the file pos lock. > > Either way, getdents() ends up being serialized wrt that particular open file. > Yes, I see it now. > Now, you can actually screw this up if you really try: the file pos > lock obviously also is conditional on FMODE_ATOMIC_POS. > > So if you have a filesystem that does "stream_open()" to clear > FMODE_ATOMIC_POS, that can avoid the file pos lock entirely. > > So don't do that. I'm looking at fuse and FOPEN_STREAM, and that looks > a bit iffy. > > (Of course, for a virtual filesystem, if you want to have a getdents() > that doesn't allow lseek, and that doesn't use f_pos for getdents but > just some internal other logic, clearing FMODE_ATOMIC_POS might > actually be entirely intentional. It's not like the file pos lock is > really _required_). > > > I tried to look at ovl_iterate(). The subtlety is regarding the internal > > readdir cache. > > Ok, from a quick look, you do have > > cache = ovl_dir_cache(d_inode(dentry)); > > so yeah, the cache is per-inode and thus different 'struct file' > entries for the same directory will need locking. > The cache in ovl_cache_get() is a refcounted object, so I think we can fix grabbing the instance and initiating a new instance safely. The cache in ovl_cache_get_impure() is more problematic. I'll see what we can do about it. > > Compared to fuse_readdir(), which is already "iterate_shared" > > and also implements internal readdir cache, fuse adds an internal > > mutex (quote) "against (crazy) parallel readdir on same open file". > > Maybe the FUSE lock is *because* fuse allows that FOPEN_STREAM? > > > Considering this protection is already provided by f_pos_lock > > 99% of the time, > > It's not 99% of the time. > > f_pos_lock is reliable. You have to explicitly turn it off. FUSE is > either confused and does pointless locking, or has done that > stream_open() thing to explicitly turn off the vfs locking. > > Or, probably more realistically, the fuse code goes back to before we > did f_pos_lock. I'll leave that riddle for Miklos to solve. Thanks, Amir.