Re: Reducing ext4 fs issues resulting from frequent hard poweroffs

Eric Sandeen <sandeen@xxxxxxxxxxx> · Tue, 12 May 2020 22:16:00 -0500

On 5/12/20 4:08 PM, Julio Lajara wrote:
> Hi all, I currently manage an IOT fleet based on Intel NUCs running
> Ubuntu 18.04 Server on SSDs with ext4, no swap. The device usage is
> more CPU bound than I/O bound and we are having some issues keeping a
> subset of devices running due to them being hard powered off in the
> field in some regions (sometimes as frequently as every 12hrs). Due to
> current difficulties in getting devices back from the field I'm
> looking into tweaking them as best as possible to survive these hard
> power-offs, barring any physical SSD issues.

I don't think you've actually said what the failure mode after power
loss is, have you?

> Currently I have tried tweaking some ext4 and I/O settings with the following:
> 
> * kernel options:
>   elevator=noop fsck.mode=force fsck.repair=yes
> 
> * fstab ext4 specific mount options:
>   commit=1,max_batch_time=0
> 
> Are there any other configuration settings or changes to the above
> that would make sense to try here for this use case? I am hoping to at
> least make the fsck repair the last line of defence so it doesn't get
> stuck waiting for a prompt to repair it at boot, but want to try to
> change the I/O / ext4 behavior if possible so it's writing as
> frequently as sanely possible to try to reduce the frequency where
> fsck is actually needed.

I can't tell from this why fsck is needed in the first place; what
actually goes wrong when power is lost?  Ted's right that properly
behaving hardware should not require any special attention after
power loss to restore filesystem consistency, but I can't tell for
sure what your actual root cause for boot failure is from this
email...
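
One way to narrow down the failure mode is to look at the kernel log
from the boot that preceded the failure. The following is only a
minimal Python sketch, not a definitive tool: the function name
summarize_kernel_log and the three substrings it searches for are
illustrative assumptions, not something taken from this thread, and it
assumes the journal is persistent so that the previous boot's kernel
messages (e.g. from "journalctl -k -b -1") are still available.

```python
# Illustrative substrings only (assumption): not an exhaustive list of
# the messages the kernel can emit for ext4 or block-layer problems.
PATTERNS = {
    "ext4_error": "EXT4-fs error",
    "journal_recovery": "recovery complete",
    "io_error": "I/O error",
}


def summarize_kernel_log(lines):
    """Count how many log lines contain each illustrative substring."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Feeding it the saved output of the previous boot's kernel log gives a
rough first split: journal recovery alone would suggest ext4 replayed
its journal normally, while ext4 errors or I/O errors would point more
toward the device or on-disk corruption.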

-Eric