On Thu, Apr 9, 2020 at 3:04 PM Konstantin Khlebnikov
<khlebnikov@xxxxxxxxxxxxxx> wrote:
>
>
>
> On 09/04/2020 14.48, Amir Goldstein wrote:
> > On Thu, Apr 9, 2020 at 2:28 PM Konstantin Khlebnikov
> > <khlebnikov@xxxxxxxxxxxxxx> wrote:
> >>
> >> On 09/04/2020 13.23, Amir Goldstein wrote:
> >>> On Thu, Apr 9, 2020 at 11:30 AM Konstantin Khlebnikov
> >>> <khlebnikov@xxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Stacked filesystems like overlayfs have no writeback of their own,
> >>>> but they have to forward syncfs() requests to the backend to keep
> >>>> data integrity.
> >>>>
> >>>> During a global sync() each overlayfs instance calls ->sync_fs()
> >>>> on the backend, although the backend is in the global list of
> >>>> superblocks too. As a result, one sync() syscall could write the
> >>>> same superblock several times and send multiple disk barriers.
> >>>>
> >>>> This patch adds the flag SB_I_SKIP_SYNC to sb->s_iflags to avoid that.
> >>>>
> >>>> Reported-by: Dmitry Monakhov <dmtrmonakhov@xxxxxxxxxxxxxx>
> >>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> >>>> ---
> >>>
> >>> Seems reasonable.
> >>> You may add:
> >>> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
> >>>
> >>> +CC: containers list
> >>
> >> Thanks
> >>
> >>>
> >>> This brings up old memories.
> >>> I posted this way back to fix the handling of emergency_remount()
> >>> in the presence of a loop-mounted fs:
> >>> https://lore.kernel.org/linux-ext4/CAA2m6vfatWKS1CQFpaRbii2AXiZFvQUjVvYhGxWTSpz+2rxDyg@xxxxxxxxxxxxxx/
> >>>
> >>> But it seems to me that emergency_sync() and sync(2) are equally
> >>> broken for this use case.
> >>>
> >>> I wonder if anyone cares enough about the resilience of loop-mounted
> >>> fs to try and change the iterate_* functions to iterate supers/bdevs
> >>> in reverse order...
> >>
> >> Now I see the reason behind "sync; sync; sync; reboot" =)
> >>
> >> The old -> new order makes sure new items are not missed if the list
> >> is modified during iteration. That might be important for some users.
> >>
> >
> > That's not the reason I suggested reverse order.
> > The reason is that with a loop-mounted fs, the correct order of flushing is:
> > 1. sync the loop-mounted fs inodes => writes to the loop image file
> > 2. sync the loop-mounted fs sb => fsyncs the loop image file
> > 3. sync the loop image host fs sb
> >
> > With the forward sb iteration order, #3 happens before #1, so the
> > loop-mounted fs changes are not really being made durable by
> > a single sync(2) call.
>
> If the loop-mounted fs is mounted with barriers, then sync_fs will issue
> REQ_OP_FLUSH to the loop device and trigger fsync() on the image file.
> Sync() might write something twice, but the data should be safe.
> Without barriers this scenario is broken for sure.
>
> Emergency remount R/O is another thing. It really needs reverse order.
>

Correct. There is no problem with durability.
Although for some filesystems it would be more efficient to first write
and fsync the loop images and then call sync_fs(). That could potentially
result in fewer disk barriers overall.

Thanks,
Amir.
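
For reference, a minimal sketch of the approach under discussion, since the
actual diff is elided above after the "---" marker. The idea is that the
stacked filesystem tags its superblock with SB_I_SKIP_SYNC, and the global
sync path skips tagged superblocks, so the backend filesystem is flushed only
once per sync(2) via its own entry in the superblock list. Names follow
fs/sync.c and fs/overlayfs/super.c conventions of that era, but the exact
code below is an illustration and an assumption, not the submitted patch:

	/*
	 * Sketch only -- modeled on fs/sync.c; not the submitted diff.
	 * sync(2) walks every registered super_block via iterate_supers()
	 * and calls helpers like this one for each of them.
	 */
	static void sync_fs_one_sb(struct super_block *sb, void *arg)
	{
		/*
		 * Skip read-only superblocks and stacked superblocks that
		 * forward syncfs to their backend themselves
		 * (SB_I_SKIP_SYNC), so the backend is written only once.
		 */
		if (!sb_rdonly(sb) && !(sb->s_iflags & SB_I_SKIP_SYNC) &&
		    sb->s_op->sync_fs)
			sb->s_op->sync_fs(sb, *(int *)arg);
	}

	/*
	 * The stacked fs would set the flag when its superblock is created,
	 * e.g. somewhere in ovl_fill_super():
	 *
	 *	sb->s_iflags |= SB_I_SKIP_SYNC;
	 */

Note that this only removes the redundant ->sync_fs() calls during a global
sync(); syncfs(2) on the overlayfs mount itself still reaches the backend
through overlayfs's own ->sync_fs() implementation.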