On Wed, Aug 21, 2019 at 08:13:37AM +1000, Dave Chinner wrote: > On Tue, Aug 20, 2019 at 08:23:00AM -0400, Brian Foster wrote: > > On Tue, Aug 20, 2019 at 09:23:04PM +1000, Dave Chinner wrote: > > > On Tue, Aug 20, 2019 at 06:51:01AM -0400, Brian Foster wrote: > > > > On Tue, Aug 20, 2019 at 04:53:22PM +0800, kaixuxia wrote: > > > > FWIW if we do take that approach, then IMO it's worth reconsidering the > > > > 1-2 liner I originally proposed to fix the locking. It's slightly hacky, > > > > but really all three options are hacky in slightly different ways. The > > > > flipside is it's trivial to implement, review and backport and now would > > > > be removed shortly thereafter when we replace the on-disk whiteout with > > > > the in-core fake whiteout thing. Just my .02 though.. > > > > > > We've got to keep the existing whiteout method around for, > > > essentially, forever, because we have to support kernels that don't > > > do in-memory translations of DT_WHT to a magic chardev inode and > > > vice versa (i.e. via mknod). IOWs, we'll need a feature bit to > > > indicate that we actually have DT_WHT based whiteouts on disk. > > > > > > > I'm not quite following (probably just because I'm not terribly familiar > > with the use case). If current kernels know how to fake up whiteout > > inodes in memory based on a dentry, why do we need to continue to create > > new on-disk whiteout inodes just because a filesystem might already have > > such inodes on disk? > > We don't, unless there's a chance the filesystem will be booted > again on an older kernel. So it's a one-way conversion: once we > start using DT_WHT based whiteouts, there is no going back. > Ok. I think I had the (wrong) impression that all we had to do was create the whiteout dentry and the inode magic would happen elsewhere in overlayfs or something, but in reality overlayfs only cares about the magic inode and DT_WHT basically just saves us from allocating a physical inode. > > Wouldn't the old format whiteouts just continue to > > work as expected without any extra handling? > > Yes, but that's not the problem. It's forwards compatibility that > matters here. i.e. upgrade a kernel, something doesn't work, roll > back to older kernel. Everything should work if the feature set on > the filesystem is unchanged, even though the filesystem was used > with a newer kernel. > > If the newer kernel has done something on disk that the older kernel > does not understand (e.g. using DT_WHT based whiteouts), then the > newer kernel must mark the filesystem with a feature bit to prevent > the older kernel from doing the wrong thing with the format it > doesn't exect to see. Mis-parsing a whiteout and not applying it > correctly is a stale data exposure security bug, so we have to be > careful here. > > Existing kernels don't know how to take a DT_WHT whiteout and fake > up a magical chardev inode. Hence DT_WHT whiteouts will not work > correctly on current kernels, even though we have DT_WHT in the dir > ftype feature (and hence available on both v4 and v5 filesystems). > i.e. the issue here is the VFS has defined the on-disk whiteout > format, and we want the on-disk storage from chardev inodes to > DT_WHT. Hence we need a feature bit because we are changing the > method of storing whiteouts on disk, even though we > -technically- aren't changing the _XFS_ on-disk format. > > Think of it as though DT_WHT support doesn't exist in XFS, and now > we are going to add it... > Yep, makes sense. > > I can see needing a feature bit to restrict a filesystem from being used > > on an unsupported, older kernel, but is there a reason we wouldn't just > > enable that by default anyways? > > Rule #1: Never enable new on-disk feature bits by default > Heh, Ok. That all seems proper enough reason to keep the existing rename whiteout code around for a while. Thanks. Brian > :) > > Yes, in future we can decide to automatically enable the feature bit > in the kernel code, but that's not something we can do until kernel > support for the DT_WHT based whiteouts is already widespread as it's > not a backwards compatible change. > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx