Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")

On Fri, 19 Oct 2018 at 16:45, Richard Weinberger <richard@xxxxxx> wrote:
> ----- Original Message -----
> > From: "Rafał Miłecki" <zajec5@xxxxxxxxx>
> > To: "Amir Goldstein" <amir73il@xxxxxxxxx>, "Miklos Szeredi" <miklos@xxxxxxxxxx>, linux-unionfs@xxxxxxxxxxxxxxx,
> > linux-fsdevel@xxxxxxxxxxxxxxx, "richard" <richard@xxxxxx>, "Artem Bityutskiy" <dedekind1@xxxxxxxxx>, "Adrian Hunter"
> > <adrian.hunter@xxxxxxxxx>, linux-mtd@xxxxxxxxxxxxxxxxxxx, "Russell Senior" <russell@xxxxxxxxxxxxxxxxx>, "OpenWrt
> > Development List" <openwrt-devel@xxxxxxxxxxxxxxxxx>
> > Sent: Friday, 19 October 2018 14:31:29
> > Subject: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
>
> > Hi,
> >
> > Since OpenWrt switched from kernel 4.9 to 4.14, users have started
> > randomly reporting file system corruptions. OpenWrt uses overlay(fs)
> > with squashfs as lowerdir and ubifs as upperdir. Russell managed to
> > isolate & describe a test case for reproducing the corruption when
> > doing a power cut after the first boot.
> >
> > Interestingly, it cannot be reproduced on all devices (NAND dependent?
> > arch dependent?!). I couldn't reproduce the problem on any of my
> > Broadcom devices (ARM=y ARCH_BCM_5301X=y), so I had to buy a Ubiquiti
> > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it there and
> > bisected it down to commit 3a1e819b4e80 ("ovl: store file handle of
> > lower inode on copy up").
> >
> > FWIW I was told it also affects:
> > Asus RT-AC58U (ARCH_IPQ40XX=y)
> > powerpc
> > RB493G, DIR-860L (ATH79=y)
> >
> > Steps to reproduce the problem:
> > 1) Flash firmware
> > 2) Boot (for the first time)
> > 3) Let the init script copy config files from lowerdir to the upperdir
> > 4) Wait for boot to finish
> > 5) Verify content of some unmodified config on overlay, using either:
> > hexdump -C /etc/config/dropbear
> > hexdump -C /overlay/upper/etc/config/dropbear
> > 6) Power cut & boot again
> > 7) Check the content of the same file
>
> Do you also have something I can test?
> A C reproducer? An xfstest case?

I don't. I may try writing one with the info provided by Amir, but I'm
not experienced with such things, so it won't be trivial for me.
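
Short of a real reproducer, the check in step 7 could at least be
scripted. The following is only a minimal sketch, not a reproducer: it
assumes Python 3 is available (on the device, or on a host that can
read the files), the helper names is_all_zero and report are made up
for illustration, and it only detects the symptom described in this
thread (size preserved, content all 0x00). It does not trigger the
power cut.

```python
#!/usr/bin/env python3
"""Detect files that kept their size but contain only 0x00 bytes."""
import sys

CHUNK = 64 * 1024  # read in chunks so large files are not loaded at once


def is_all_zero(path):
    """Return (size, zero_filled) for the file at path.

    An empty file is reported as not zero-filled, since there is no
    content that could have been lost.
    """
    size = 0
    all_zero = True
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            size += len(chunk)
            # bytes.count(0) counts the 0x00 bytes in this chunk
            if chunk.count(0) != len(chunk):
                all_zero = False
    return size, (size > 0 and all_zero)


def report(paths):
    """Print one line per file; return 1 if any file is zero-filled."""
    status = 0
    for path in paths:
        size, zeroed = is_all_zero(path)
        state = "ALL ZERO" if zeroed else "ok"
        print(f"{path}: {size} bytes, {state}")
        if zeroed:
            status = 1
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(report(sys.argv[1:]))
```

Saved under a hypothetical name such as zerocheck.py, it could be run
after the second boot against the same paths as in step 5, e.g.
/etc/config/dropbear and /overlay/upper/etc/config/dropbear. A
non-zero exit status means at least one file is zero-filled.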


> > After the above regressing commit, the later check confirms that the
> > file size looks correct, but the file is filled with 00 bytes only.
> >
> > Can I ask you to check if there is something possibly wrong with the
> > above ovl commit? Or does it expose some problem with the ubifs? Or
> > maybe the whole UBI?
>
> Well, I fear it uncovers a problem in UBIFS. We have already had problems with overlayfs.
> Did you bisect the problem, and are you sure that the said commit is the first bad commit?

Yes, I did git bisect and then double verified that.


> > FWIW, testing the above commit (and the one before it) always results
> > in a single error in the kernel log:
> > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
>
> Please show the full log.
> The orphan thing rings a bell, we had such a bug already.

I will get a full log later. Please note that I wrote this error appears
*with* the ovl commit and also with the commit before it. So it's very
unlikely to be caused by the ovl change. Most likely it was some error
present in 4.11.0-rc1 and fixed later (not related to ovl).

-- 
Rafał



