On 5/12/20 4:08 PM, Julio Lajara wrote:
> Hi all, I currently manage an IoT fleet based on Intel NUCs running
> Ubuntu 18.04 Server on SSDs with ext4, no swap. The device usage is
> more CPU bound than I/O bound and we are having some issues keeping a
> subset of devices running due to them being hard powered off in the
> field in some regions (sometimes as frequently as every 12hrs). Due to
> current difficulties in getting devices back from the field I'm
> looking into tweaking them as best as possible to survive these hard
> power offs barring any physical SSD issues.

I don't think you've actually said what the failure mode after power
loss is, have you?

> Currently I have tried tweaking some ext4 and I/O settings with the following:
>
> * kernel options:
>   elevator=noop fsck.mode=force fsck.repair=yes
>
> * fstab ext4 specific mount options:
>   commit=1,max_batch_time=0
>
> Are there any other configuration settings or changes to the above
> that would make sense to try here for this use case? I am hoping to at
> least make the fsck repair the last line of defence so it doesn't get
> stuck waiting for a prompt to repair it at boot, but want to try to
> change the I/O / ext4 behavior if possible so it's writing as
> frequently as sanely possible to try to reduce the frequency where
> fsck is actually needed.

I can't tell from this why fsck is needed in the first place; what
actually goes wrong when power is lost?

Ted's right that properly behaving hardware should not require any
special attention after power loss to restore filesystem consistency,
but I can't tell for sure what your actual root cause for boot failure
is from this email...

-Eric