On Fri, Nov 15, 2019 at 05:54:23PM +0000, Al Viro wrote: > Anyway, AFAICS removal could be done this way: > > // parent is held exclusive, after is NULL or a child of parent > find_next_child(parent, prev) > child = NULL > node = prev ? &prev->d_child : &parent->d_subdirs; > grab parent->d_lock > for each entry in the list starting at node->next > d = container_of(entry, struct dentry, d_child) > grab d->d_lock > if simple_positive(d) > bump d->d_count > child = d > drop d->d_lock > if child > break > drop parent->d_lock > dput(prev); > return child > > kill_it(victim, parent) > if simple_positive(victim) > d_invalidate(victim); // needed to avoid lost mounts > if victim is a directory > fsnotify_rmdir > else > fsnotify_unlink > if victim is regular > __debugfs_file_removed(victim) > dput(victim) // unpin it > > recursive_removal(dentry) > this = dentry; > while (true) { > victim = NULL; > inode = this->d_inode; > inode_lock(inode); > if (d_is_dir(this)) > mark this doomed > while ((child = find_next_child(this, victim)) == NULL) { > // no children (left); kill and ascend > // update metadata while it's still locked > inode->i_ctime = current_time(inode); > clear_nlink(inode); > inode_unlock(inode); > victim = this; > this = this->d_parent; > inode = this->d_inode; > inode_lock(inode); > kill_it(victim, this); > if (victim == dentry) { > inode->i_ctime = inode->i_mtime = current_time(inode); > if (d_is_dir(dentry)) > drop_nlink(inode); > inode_unlock(inode); > dput(dentry); > return; > } > } > inode_unlock(inode); > this = child; > } Come to think of that, if we use IS_DEADDIR as "no more additions" marking, that looks like a good candidate for all in-kernel rm -rf on ramfs-style filesystems without cross-directory renames. This bit in kill_it() above if victim is regular __debugfs_file_removed(victim) would be an fs-specific callback passed by the caller, turning the whole thing into this: void simple_recursive_removal(struct dentry *dentry, void (*callback)(struct dentry *)) { struct dentry *this = dentry; while (true) { struct dentry *victim = NULL, *child; struct inode *inode = this->d_inode; inode_lock(inode); if (d_is_dir(this)) inode->i_flags |= S_DEAD; while ((child = find_next_child(this, victim)) == NULL) { // kill and ascend // update metadata while it's still locked inode->i_ctime = current_time(inode); clear_nlink(inode); inode_unlock(inode); victim = this; this = this->d_parent; inode = this->d_inode; inode_lock(inode); if (simple_positive(victim)) { d_invalidate(victim); // avoid lost mounts if (is_dir(victim)) fsnotify_rmdir(inode, victim); else fsnotify_unlink(inode, victim); if (callback) callback(victim); dput(victim) // unpin it } if (victim == dentry) { inode->i_ctime = inode->i_mtime = current_time(inode); if (d_is_dir(dentry)) drop_nlink(inode); inode_unlock(inode); dput(dentry); return; } } inode_unlock(inode); this = child; } } with find_next_child() easily implemented via scan_positives() already in libfs.c... Objections? The above is obviously completely untested, and I've got nowhere near enough sleep, so there may be any number of brown paperbag bugs in it. Review would be very welcome...