On Thu, Apr 9, 2020 at 3:04 PM Konstantin Khlebnikov
<khlebnikov@xxxxxxxxxxxxxx> wrote:
>
>
>
> On 09/04/2020 14.48, Amir Goldstein wrote:
> > On Thu, Apr 9, 2020 at 2:28 PM Konstantin Khlebnikov
> > <khlebnikov@xxxxxxxxxxxxxx> wrote:
> >>
> >> On 09/04/2020 13.23, Amir Goldstein wrote:
> >>> On Thu, Apr 9, 2020 at 11:30 AM Konstantin Khlebnikov
> >>> <khlebnikov@xxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Stacked filesystems like overlayfs have no writeback of their own,
> >>>> but they have to forward syncfs() requests to the backend to keep
> >>>> data integrity.
> >>>>
> >>>> During a global sync() each overlayfs instance calls ->sync_fs()
> >>>> on the backend, although the backend is in the global list of
> >>>> superblocks too. As a result, one sync() syscall could write the
> >>>> same superblock several times and send multiple disk barriers.
> >>>>
> >>>> This patch adds the flag SB_I_SKIP_SYNC to sb->s_iflags to avoid that.
> >>>>
> >>>> Reported-by: Dmitry Monakhov <dmtrmonakhov@xxxxxxxxxxxxxx>
> >>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> >>>> ---
> >>>
> >>> Seems reasonable.
> >>> You may add:
> >>> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
> >>>
> >>> +CC: containers list
> >>
> >> Thanks
> >>
> >>>
> >>> This brings up old memories.
> >>> I posted this way back to fix the handling of emergency_remount()
> >>> in the presence of a loop-mounted fs:
> >>> https://lore.kernel.org/linux-ext4/CAA2m6vfatWKS1CQFpaRbii2AXiZFvQUjVvYhGxWTSpz+2rxDyg@xxxxxxxxxxxxxx/
> >>>
> >>> But it seems to me that emergency_sync() and sync(2) are equally
> >>> broken for this use case.
> >>>
> >>> I wonder if anyone cares enough about the resilience of loop-mounted
> >>> fs to try and change the iterate_* functions to iterate supers/bdevs
> >>> in reverse order...
> >>
> >> Now I see the reason behind "sync; sync; sync; reboot" =)
> >>
> >> The old -> new order makes sure new items are not missed if the list
> >> is modified during iteration. That might be important for some users.
> >>
> >
> > That's not the reason I suggested reverse order.
> > The reason is that with a loop-mounted fs, the correct order of flushing is:
> > 1. sync the loop-mounted fs inodes => writes to the loop image file
> > 2. sync the loop-mounted fs sb => fsyncs the loop image file
> > 3. sync the loop image host fs sb
> >
> > With the forward sb iteration order, #3 happens before #1, so the
> > loop-mounted fs changes are not really being made durable by
> > a single sync(2) call.
>
> If the loop-mounted fs is mounted with barriers, then sync_fs will issue
> REQ_OP_FLUSH to the loop device and trigger fsync() on the image file.
> Sync() might write something twice, but the data should be safe.
> Without barriers this scenario is broken for sure.
>
> Emergency remount R/O is another thing. It really needs reverse order.
>

Correct. There is no problem with durability.
Although for some filesystems it would be more efficient to first write
and fsync the loop images and then call sync_fs(). That could potentially
result in fewer disk barriers overall.

Thanks,
Amir.
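
For reference, a minimal sketch of the approach under discussion, since the
actual diff is elided above after the "---" marker. The idea is that the
stacked filesystem tags its superblock with SB_I_SKIP_SYNC, and the global
sync path skips tagged superblocks, so the backend filesystem is flushed only
once per sync(2) via its own entry in the superblock list. Names follow
fs/sync.c and fs/overlayfs/super.c conventions of that era, but the exact
code below is an illustration and an assumption, not the submitted patch:

	/*
	 * Sketch only -- modeled on fs/sync.c; not the submitted diff.
	 * sync(2) walks every registered super_block via iterate_supers()
	 * and calls helpers like this one for each of them.
	 */
	static void sync_fs_one_sb(struct super_block *sb, void *arg)
	{
		/*
		 * Skip read-only superblocks and stacked superblocks that
		 * forward syncfs to their backend themselves
		 * (SB_I_SKIP_SYNC), so the backend is written only once.
		 */
		if (!sb_rdonly(sb) && !(sb->s_iflags & SB_I_SKIP_SYNC) &&
		    sb->s_op->sync_fs)
			sb->s_op->sync_fs(sb, *(int *)arg);
	}

	/*
	 * The stacked fs would set the flag when its superblock is created,
	 * e.g. somewhere in ovl_fill_super():
	 *
	 *	sb->s_iflags |= SB_I_SKIP_SYNC;
	 */

Note that this only removes the redundant ->sync_fs() calls during a global
sync(); syncfs(2) on the overlayfs mount itself still reaches the backend
through overlayfs's own ->sync_fs() implementation.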