On Mon, Nov 16, 2020 at 6:36 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Mon, Nov 16, 2020 at 05:20:04PM +0200, Amir Goldstein wrote:
> > On Mon, Nov 16, 2020 at 4:42 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Nov 15, 2020 at 08:57:58PM -0800, Sargun Dhillon wrote:
> > > > Overlayfs added the ability to set up mounts where all syncs can be
> > > > short-circuited (2a99ddacee43: ovl: provide a mount option "volatile").
> > > >
> > > > A user might want to remount this fs, but we do not let the user do so
> > > > because of the "incompat" detection feature. In the case of volatile,
> > > > it is safe to do something like[1]:
> > > >
> > > > $ sync -f /root/upperdir
> > > > $ rm -rf /root/workdir/incompat/volatile
> > > >
> > > > There are two ways to go about this. You can call sync on the underlying
> > > > filesystem, check the error code, and delete the dirty file if everything
> > > > is clean. If you're running lots of containers on the same filesystem, or
> > > > you want to avoid all unnecessary I/O, this may be suboptimal.
> > > >
> > >
> > > Hi Sargun,
> > >
> > > I had asked a bunch of questions in the previous mail thread to get a
> > > clearer picture of your requirements but never got any response. It
> > > would have helped in understanding your requirements better.
> > >
> > > How about the following patch set, which seems to sync only the dirty
> > > inodes of upper belonging to a particular overlayfs instance:
> > >
> > > https://lore.kernel.org/linux-unionfs/20201113065555.147276-1-cgxu519@xxxxxxxxxxxx/
> > >
> > > So if we could implement a mount option which ignores fsync but, upon
> > > syncfs, syncs only the dirty inodes of that overlayfs instance, it will
> > > make sure we are not syncing the whole of the upper fs. And we could
> > > do this syncing on unmount of overlayfs and remove the dirty file upon
> > > successful sync.
> > >
> > > This looks like a much simpler method and should be able to meet your
> > > requirements (as long as you are fine with syncing the dirty upper
> > > inodes of this overlay instance on unmount).
> > >
> >
> > Do note that the latest patch set by Chengguang not only syncs the dirty
> > inodes of this overlay instance, but also waits for in-flight writeback
> > on all the upper fs inodes, and I think that with !ovl_should_sync(ofs)
> > we will not re-dirty the ovl inodes and will lose track of the list of
> > dirty inodes - maybe that can be fixed.
> >
> > Also, I am not sure anymore that we can safely remove the dirty file
> > after syncing dirty inodes in sync_fs and umount. If someone did sync_fs
> > before us and consumed the error, we may have a copied up file in upper
> > whose data is not on disk, but when we sync_fs on unmount we won't get
> > an error? Not sure.
>
> Maybe we can save an errseq_t when mounting overlay and compare it with
> the errseq_t stored in the upper sb after unmount. That will tell us
> whether an error has happened since we mounted overlay. (Similar to what
> Sargun is doing.)
>

I suppose so.
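Something like the below is what I would imagine - a completely untested
sketch, where upper_sb (the upper layer's super block) and the
ofs->upper_errseq field are made up just to illustrate:

/* At mount time (e.g. in ovl_fill_super()), sample the writeback
 * error sequence of the upper fs:
 */
ofs->upper_errseq = errseq_sample(&upper_sb->s_wb_err);

/* On unmount, or before clearing the incompat dir on remount, check
 * whether a new writeback error was recorded on upper since then:
 */
if (errseq_check(&upper_sb->s_wb_err, ofs->upper_errseq))
	return -EIO; /* error since mount - keep the dirty file */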
> In fact, if this is a concern, we have this issue with user space
> "sync <upper>" too? Another sync might fail while this one succeeds,
> and we will think upper is just fine. Maybe container tools can
> keep a file/dir open at the time of mount and call syncfs using
> that fd instead. (And that should catch errors since that fd
> was opened, I am assuming.)
>

I did not understand the problem with userspace sync.

> >
> > I am less concerned about ways to allow re-mount of volatile
> > overlayfs than I am about turning volatile overlayfs into non-volatile.
>
> If we are not interested in converting volatile containers into
> non-volatile ones, then the whole point of this patch series is to
> detect whether any writeback error has happened. If a writeback error
> has happened, then we detect that at remount and possibly throw away
> the container.
>
> What happens today if a writeback error has happened? Is that page
> thrown away from the page cache and read back from disk? IOW, will the
> user lose the data it had written in the page cache because writeback
> failed? I am assuming we can't keep the dirty page around for very
> long, otherwise it has the potential to fill up all the available RAM
> with dirty pages which can't be written back.
>

Right. The resulting data is undefined after an error.

> Why is it important to detect a writeback error only during remount?
> What happens if a container overlay instance is already mounted and a
> writeback error happens? We will not detect that, right?
>
> IOW, if capturing writeback errors is important for volatile
> containers, then capturing them only at remount time is not enough.
> Normally fsync/syncfs would catch them, but now we have skipped those,
> so in the process we have lost the mechanism to detect writeback
> errors for volatile containers?
>

Yes, you are right. It's an issue with volatile that we should probably
document.

I think upper file data can "evaporate" even while the overlay is still
mounted.

Thanks,
Amir.
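P.S. FWIW, the fd-based syncfs scheme could look roughly like the below
in a container tool - an untested sketch, assuming a kernel where
syncfs(fd) reports writeback errors seen since the fd was opened, and
with the upperdir path made up:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Open the upper dir when setting up the container and keep
	 * the fd open for the container's lifetime. */
	int fd = open("/root/upperdir", O_RDONLY | O_DIRECTORY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* ... container runs with the volatile overlay mounted ... */

	/* Before removing workdir/incompat/volatile, sync the upper fs
	 * and check for writeback errors since the fd was opened. */
	if (syncfs(fd) < 0) {
		perror("syncfs");
		return 1;	/* not safe to reuse this upper dir */
	}

	close(fd);
	return 0;
}

The fd pins the error sequence from mount time, so an error consumed by
someone else's syncfs in between should still be reported here.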