On Tue, Jul 14, 2020 at 4:41 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > On Tue, Jul 14, 2020 at 06:28:41AM +0300, Amir Goldstein wrote: > > On Mon, Jul 13, 2020 at 10:25 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > > > > > On Mon, Jul 13, 2020 at 01:57:31PM +0300, Amir Goldstein wrote: > > > > Changes to underlying layers while overlay in mounted result in > > > > undefined behavior. Therefore, we can change the behavior to > > > > invalidate the overlay dentry on dcache lookup if one of the > > > > underlying dentries was deleted since the dentry was composed. > > > > > > > > Negative underlying dentries are not expected in overlay upper and > > > > lower dentries. If they are found it is probably dcache lookup racing > > > > with an overlay unlink, before d_drop() was called on the overlay dentry. > > > > IS_DEADDIR directories may be caused by underlying rmdir, so invalidate > > > > overlay dentry on dcache lookup if we find those. > > > > > > Can you elaborate a bit more on this race. Doesn't inode_lock_nested(dir) > > > protect against that. I see that both vfs_rmdir() and vfs_unlink() > > > happen with parent directory inode mutex held exclusively. And IIUC, > > > that should mean no further lookup()/->revalidate() must be in progress > > > on that dentry? I might very well be wrong, hence asking for more > > > details. > > > > > > > lookup_fast() looks in dcache without dir inode lock. > > d_revalidate() is called to check if the found cached dentry is valid. > > Got it. > > > > > For example, ovl_remove_upper() can make an upper dentry negative > > or upper dir inode S_DEAD (i.e. vfs_rmdir) just before calling d_drop() > > to prevent overlay dentry from being found in fast cache lookup. > > > > Unless I am missing something, that leaves a small window where > > lookup_fast() can return an overlay dentry with negative/S_DEAD > > upper dentry, which was not caused by illegitimate underlying fs > > changes, so we must gracefully invalidate the dcache lookup > > (return 0 from revalidate) in order to fallback to fs lookup. > > So what's the side affect of this? I mean one even you make this change, > it is possible that on a cpu parallel unlink is going on and right > after d_revalidate() finishes, upper is marked negative (or directory > S_DEAD). No parallelism required. The side effect is to reduce the attack/fuzzing vector for creating bad things by deleting/renaming lower dentries. > > So this change will not plug the hole. It will just narrow it a bit? Yes, but it has nothing to do with races it has only to do with use cases (see blow). > > /me is failing to see complete picture that what's the problem at > macro level and how this patch fixes it. > Today, if a user deletes/renames underlying lower that leaves the overlayfs dentry in a vulnerable state. I do not have a reproducer to OOPS the kernel with that, but syzbot has created some crashes of similar nature in the past. The fact is that all we can say about this scenario is: "If the underlying filesystem is changed, the behavior of the overlay is undefined, though it will not result in a crash or deadlock." and that second part cannot be proven to be true. With the set of these two patches, a whole class of attacks is pruned, because every attack that needs to get the vulnerable denry via lookup will not get it. Attacks that use a fd to access a vulnerable dentry may still work. The confusing part about racing with ovl_unlink()/ovl_rmdir() is not really important. It explains that an overlay dentry in the middle of ovl_rmdir() (after vfs_rmdir() and before d_drop()) can be found by parallel dcache lookup and "appear" like an underlying change (upper was removed under the overlay). I only mentioned that because we must not return -ESTALE in that case, but if we remove the -ESTALE conversion, nothing bad will happen. dcache lookup will get 0 from ovl_revalidate on that dentry and re-do the lookup, which is exactly what would have happened if lookup took place a few ms later after ovl_rmdir() called d_drop(). Thanks, Amir.