Thanks a lot for all your help, Ted. I would appreciate it if you could
prioritize the fix.

On Tue, Mar 29, 2022 at 6:38 PM Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> (Removing linux-fsdevel from the cc list since this is an ext4
> specific issue.)
>
> On Mon, Mar 28, 2022 at 09:38:18PM +0530, Fariya F wrote:
> > Hi Ted,
> >
> > Thanks for the response. Really appreciate it. Some questions:
> >
> > a) This issue is observed on one of the customer boards and hence a fix
> > is a must for us or at least I will need to do a work-around so other
> > customer boards do not face this issue. As I mentioned my script
> > relies on df -h output of used percentage. In the case of the board
> > reporting 16Z of used space and size, the available space is somehow
> > reported correctly. Should my script rely on available space and not
> > on the used space% output of df? Will that be a reliable work-around?
> > Do you see any issue in using the partition from then on, or somewhere
> > down the line would the overhead blocks number create a problem and my
> > partition end up misbehaving, or could any sort of data loss
> > occur? Data loss would be a concern for us. Please guide.
>
> I'm guessing that the problem was caused by a bit-flip in the
> superblock, so it was just a matter of hardware error. What version
> of e2fsprogs are you using, and did you have the metadata checksum
> (metadata_csum) feature enabled? Depending on where the bit-flip
> happened --- e.g., whether it was in memory and then the superblock was
> written out, or on the eMMC or other storage device --- if the metadata
> checksum feature caught the superblock error, it would have detected
> the issue, and while it would have required a manual fsck to fix it, at
> that point it would have fallen back to use the backup superblock
> version.
>
> > b) Any other suggestions of a work-around so even if the overhead
> > blocks reports more blocks than actual blocks on the partition, I am
> > able to use the partition reliably or do you think it would be a
> > better suggestion to wait for the fix in e2fsprogs?
> >
> > I think apart from the fix in the e2fsprogs tool, a kernel fix is also
> > required, wherein it performs a check that the overhead blocks should
> > not be greater than the actual blocks on the partition.
>
> Yes, we can certainly have the kernel check to see if the overhead
> value is completely insane, and if so, recalculate it (even though it
> would slow down the mount).
>
> Another thing we could do is to always recalculate the overhead amount
> if the file system is smaller than some arbitrary size, on the theory
> that (a) for small file systems, the increased time to mount the file
> system will not be noticeable, and (b) embedded and mobile devices are
> often where "cost optimized" components (my polite way of saying crappy
> quality to save a penny or two in Bill of Materials costs) are most
> likely, and so those are where bit flips are more likely.
>
> Cheers,
>
> - Ted
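
For reference, a minimal sketch of the work-around asked about in (a): decide
on available space instead of the used percentage, and flag a file system
whose reported size looks insane. It is only an illustration, not something
confirmed in this thread as reliable on a corrupted file system. The names
classify, fs_usage and needs_cleanup, the 1 PiB plausibility limit and the
free-space threshold are all hypothetical; only os.statvfs and its fields
are real.

```python
import os
from typing import NamedTuple


class FsUsage(NamedTuple):
    total_bytes: int   # size as reported by the kernel
    avail_bytes: int   # space available to unprivileged users
    plausible: bool    # False if the reported numbers look corrupted


def classify(f_blocks, f_bfree, f_bavail, f_frsize,
             max_plausible_bytes=1 << 50):
    """Turn raw statvfs counters into a usage summary.

    max_plausible_bytes (1 PiB here) is an assumed upper bound for the
    boards in question; a size like 16Z is far beyond it.
    """
    total = f_blocks * f_frsize
    avail = f_bavail * f_frsize
    # A sane file system has a non-zero size below the limit and never
    # reports more free blocks than total blocks.
    plausible = 0 < total <= max_plausible_bytes and f_bfree <= f_blocks
    return FsUsage(total, avail, plausible)


def fs_usage(path, max_plausible_bytes=1 << 50):
    """Query the mounted file system containing path."""
    st = os.statvfs(path)
    return classify(st.f_blocks, st.f_bfree, st.f_bavail, st.f_frsize,
                    max_plausible_bytes)


def needs_cleanup(usage, min_free_bytes):
    """Base the decision on available bytes, not on used percentage."""
    return usage.avail_bytes < min_free_bytes
```

A script could call fs_usage("/data") (the mount point is a placeholder),
log a warning and schedule an fsck when plausible is False, and use
needs_cleanup() in place of parsing the used% column of df -h.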