Re: Corrupted NAND booting for all devices

Hi Steve,

On 2/5/20, Steve deRosier <derosier@xxxxxxxxx> wrote:
> I've been following your questions on both this list and the
> linux-wireless one. May I recommend some reading:
> http://www.linux-mtd.infradead.org/doc/nand.html
>
> It isn't clear what filesystem you're using, though I recall from an
> earlier email you weren't running UBIFS. But in the log I do see UBIFS
> messages. In any case, based on your descriptions, I strongly suspect
> NAND bitflips are causing your filesystem corruptions, and you likely
> don't have the correct settings for the ECC strength as necessary for
> your NAND. Or maybe you're not flashing images correctly and the ECC
> info is getting lost.  Or maybe you're writing logs and such to flash
> and you're filling up the filesystem. Maybe your extents aren't correct
> and one filesystem overwrites another. Unfortunately, you've got your
> system so cobbled up with user-space prettiness in your log output
> that you're obscuring the kernel log output that would help you
> diagnose the problems.

Yes, the filesystem is UBIFS. Test units of several hardware revisions
have been running for many months and were relatively stable until the
new hardware revision. As you found, we have many low-level problems
when running the new revision. Since both the firmware and the hardware
have evolved, the first rational step is to narrow down the source of
the problem.
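
One cheap way to start narrowing it down along the lines you suggested
is to watch the ECC counters the kernel already keeps for each MTD
device. This is a minimal sketch, assuming a kernel that exposes
corrected_bits, ecc_failures, bad_blocks, ecc_strength and
ecc_step_size under /sys/class/mtd/mtdN/ (older kernels may lack some
of them). The helper names read_mtd_attrs and counter_deltas are
hypothetical. A steady rise in corrected_bits, or any ecc_failures,
would suggest bitflips and ECC settings rather than the build system.

```python
from pathlib import Path

# Counters that only ever grow while the system runs (assumed sysfs names).
COUNTERS = ("corrected_bits", "ecc_failures", "bad_blocks")
# Static ECC configuration reported by the NAND driver (assumed sysfs names).
GEOMETRY = ("ecc_strength", "ecc_step_size")


def read_mtd_attrs(mtd_dir):
    """Return {attribute: int} for each known attribute present in mtd_dir."""
    attrs = {}
    for name in COUNTERS + GEOMETRY:
        path = Path(mtd_dir) / name
        if path.exists():  # skip attributes this kernel does not expose
            attrs[name] = int(path.read_text().strip())
    return attrs


def counter_deltas(before, after):
    """Return how much each ECC/bad-block counter grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in COUNTERS
        if name in before and name in after
    }
```

For example, take a snapshot with read_mtd_attrs("/sys/class/mtd/mtd0")
before and after a long stress run or a series of power cuts, then
compare the two with counter_deltas.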

I appreciate all your advice, which is very helpful and valid. The
hardware was designed by other contractors, and a software person has
limited tools and equipment to debug it. The hardware contractors
firmly ruled out any hardware issues and pointed the finger at the
software image built with Yocto as the cause of the NAND corruption.
The Yocto image is built entirely from open-source components (Linux
kernel, connman, MTD, ofono, etc.), so I am trying to find out whether
there are limitations or constraints on powering the device off while
it may be in the middle of an erase operation. Would that corrupt the
NAND flash? Or might we not have set things up properly?

I posted here to gather information from your experience and to take
your advice on the circumstances in which NAND corruption can occur,
so that we can mitigate the issues as much as possible.

As you said, many things in software and hardware could cause NAND
corruption. What I am particularly interested in is whether a so-called
bad Yocto image could cause it. To be clear, I am not talking about
software problems in that image; I am talking about a problem in the
Yocto build system that generated a bad image. I would expect a badly
built image to fail the first time it runs. If an image boots from NAND
and runs well for several days, what could the Yocto build system have
done to make it corrupt the NAND later, like a virus? It does not make
sense to me, but I could be wrong.

Thank you.

Kind regards,

- jh

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
