Re: Nandsim, UBIFS and memory concerns

Steve deRosier <derosier@xxxxxxxxx> · Thu, 4 Oct 2018 15:43:07 -0700

Hi Romain,

On Thu, Oct 4, 2018 at 9:53 AM Romain Izard <romain.izard.pro@xxxxxxxxx> wrote:
>
> On a regular but slow basis, I get report of devices based on UBIFS running
> Linux 4.14 where the file system gets corrupted during an update. The update
> process creates new files with temporary names to replace existing files,
> and uses renames to replace these files atomically. What is observed is that
> in some cases, the update log describes all steps for a complete update, and
> yet some files contain the new version while others contain an older
> version. Moreover, it seems that some files with temporary names that should
> have been renamed are visible.
>
> As the update process is also able to use tmpfs to create files, and will
> use a large part of the available memory, I fear that this issue is related
> with the behaviour of UBIFS in low memory conditions. I'm wondering about
> UBIFS losing some parts of the log when a ENOMEM condition occurs during its
> operations or when the OOM killer targets a process that is doing some UBIFS
> processing.
>

I've seen the sort of symptoms you describe in the wild. But what I
have seen has never had anything to do with UBIFS, only with how
updates (or other large filesystem operations) are implemented.
Specifically, the lack of a filesystem sync before a reboot will have
exactly these effects. You end up in a situation where the filesystem
operations are done, yet the changes haven't actually been flushed to
"disk". It doesn't matter if it's an HDD or UBIFS on flash, the effect
is the same, though the window of vulnerability might be different.

Especially since you mention the OOM killer and using tmpfs - I'd look
into whether you're running out of RAM and either causing an oops and
reboot, or at least getting the process killed before all file
operations are complete. Just because your log shows the operation was
triggered at the userspace level doesn't mean the kernel has completed
all filesystem operations and written them to the physical device.

What you describe is not a UBIFS corruption, but a garden-variety
user-space file operations corruption issue.

As I said, I've encountered this before. The only thing you can do is
to examine your process and tailor it to be sure it completes its
physical writes.  In our case, we had a few things to solve:

* put 'sync' calls in our update scripts,
* avoid the use of a problematic utility, and
* we tried using the `-osync` flag.

(-osync fixed the problem at the cost of a performance hit. Later we
decided not to go that way and instead instructed our customers how to
properly write programs that write to the filesystem.)
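
To make "complete the physical writes" concrete, here is a minimal
sketch of a write-then-rename update that flushes at each step. It is
not the code we shipped; the function name `atomic_replace` and the
".update-" temp prefix are made up for illustration, and it assumes a
Linux system where fsync() on a directory file descriptor is allowed.

```python
import contextlib
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes), flushing before and after the rename.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temp file in the same directory so the rename stays
    # within one filesystem. Note: mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".update-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # userspace buffer -> kernel
            os.fsync(tmp.fileno())  # kernel page cache -> storage
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise

    # Flush the directory entry so the rename itself is on storage.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Something like atomic_replace("/etc/app.conf", new_contents) would be
called once per file. The update should only be logged as complete
after the last call returns, and a final sync before any reboot is
still a sensible belt-and-braces step.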

- Steve

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/