Continuation of the topic "error Foo(now -5, once -22) while searching super root"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This letter is a continuation of the thread that I started at 2023_10_15

https://marc.info/?l=linux-nilfs&m=169738371518323&w=2
archival copy: https://archive.is/Fbw5e

The purpose of my current letter is to document, write down,
information that might contain hints to the flaw.

This time the computer was the same,

    (Two 2-threaded cores, 12GiB RAM minus the video memory. A line from /var/cpuinfo)
    Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz

Linux distribution on the computer was different,
a live-DVD with KNOPPIX version 9.1
freshly written on a MDisc DVD(As of 2023_12 still totally buyable
from original manufacturer, ritek-europe dot com, redirects to conrexx dot com,
in small quantities like a pack of 100 discs for about 200€, including all shipping and handling.
They order them from Taiwan to Netherlands and then use FedEx to send from
Netherlands to the rest of the EU. No, I do NOT sell them myself, NOR do I earn from that business in any way,
I just had hard time getting MDisc DVDs and that's where I got those,
but the delivery time was about 2 months.) and the USB-HDD was a
~465GiB < 500GiB sized magnetic disc, which was mounted
with mount options "noatime,nodiratime". The error occurred
during a long "git commit -a". The repository resided
on the USB-HDD. There was plenty of CPU-time free, because
only the window manager with a "few" "standard" KNOPPIX
programs were running and there was no shortage of RAM, because
multiple GiB was free. The "git commit -a" was given over
an SSH session, id est the USB cable stayed put, no movement
of the USB cable due to the use of a laptop keyboard.
The laptop booted from the MDisc DVD about 2 days before
the error occurred and the rest of the programs at the laptop
seem to work fine after the error without rebooting the laptop, id est
the kernel did not totally crash.

The laptop with the ~465GiB USB-HDD was not on the same table with the
keyboard that was in use, id est keyboard vibrations did not
reach the USB-HDD or the laptop in any significant amount.


    ----start--of--citation--of--dmesg--output--last--lines---
    [  150.848200] usb 3-4: new high-speed USB device number 5 using xhci_hcd
    [  150.989381] usb 3-4: New USB device found, idVendor=152d, idProduct=2329, bcdDevice= 1.00
    [  150.989390] usb 3-4: New USB device strings: Mfr=1, Product=2, SerialNumber=5
    [  150.989394] usb 3-4: Product: USB to ATA/ATAPI bridge
    [  150.989397] usb 3-4: Manufacturer: JMicron
    [  150.989401] usb 3-4: SerialNumber: 801130168383
    [  150.990928] usb-storage 3-4:1.0: USB Mass Storage device detected
    [  150.991138] usb-storage 3-4:1.0: Quirks match for vid 152d pid 2329: 8020
    [  150.991193] scsi host7: usb-storage 3-4:1.0
    [  154.102691] scsi 7:0:0:0: Direct-Access     WDC WD50 00LPLX-60ZNTT1   02.0 PQ: 0 ANSI: 2 CCS
    [  154.103140] sd 7:0:0:0: Attached scsi generic sg2 type 0
    [  154.103556] sd 7:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/466 GiB)
    [  154.103931] sd 7:0:0:0: [sdc] Write Protect is off
    [  154.103940] sd 7:0:0:0: [sdc] Mode Sense: 28 00 00 00
    [  154.104321] sd 7:0:0:0: [sdc] No Caching mode page found
    [  154.104328] sd 7:0:0:0: [sdc] Assuming drive cache: write through
    [  154.187547]  sdc: sdc1 sdc2 sdc3 sdc4 sdc5 sdc6 sdc7 sdc8
    [  154.188768] sd 7:0:0:0: [sdc] Attached SCSI disk
    [  222.853951] NILFS version 2 loaded
    [  222.855302] NILFS (sdc7): mounting unchecked fs
    [  226.181683] sd 7:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
    [  226.181687] sd 7:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
    [  226.181689] sd 7:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
    [  226.181692] sd 7:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 28 13 20 28 00 00 08 00
[ 226.181694] blk_update_request: critical medium error, dev sdc, sector 672342056 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  226.181726] NILFS (sdc7): I/O error reading segment
    [  226.181729] NILFS (sdc7): error -5 while searching super root
    root@Microknoppix:/home/knoppix/haakimiskataloogid# mount -t nilfs2 /dev/sdc7 ./h1
    mount.nilfs2: Error while mounting /dev/sdc7 on /home/knoppix/haakimiskataloogid/h1: Input/output error
    root@Microknoppix:/home/knoppix/haakimiskataloogid#
    ----end----of--citation--of--dmesg--output--last--lines---

I find it scary that a file system can get so unusable
during ordinary use while the hardware seems to be just fine
and there is no standard tool to recover even a fraction
of the files at the NilFS2 partition. As of 2023_12_29
I have most of my files on NilFS2 partitions with the hope
that it helps to preserve them, but it turns out that
when ext4 fails, looses files, at power failures, then NilFS2 fails
at plain usage scenarios, where there is no power
failure or any other relevant event. As things stand now (2023_12_29),
my strategy is to mirror my files at different file systems:
NilFS2 to not loose files during power failures or resets and
ext4 to not loose files during plain, calm, low-intensity, HDD usage.

Another line of thought is that RAIDs are useless, if the
kernel of the computer that the RAID is connected to,
corrupts in RAM. Therefore the HDDs with different file system types
should be connected to different computers, preferably running
different operating systems. As my main operating system is some
Linux distribution (varies over time and between machines), then
FreeBSD, OpenBSD and Solaris derivatives (illumos and alike
come to mind) as hopefully sufficiently varying options.

I mentioned MDisc DVD-s, because if an operating system
boots from a DVD, then there CAN NOT BE ANY KERNEL
FILE SYSTEM CORRUPTION RELATED BOOT BINARY CORRUPTION
and unlike plain DVD-s, MDisc DVDs last longer than 10 years.
MDisc DVDs can be reliably written only with a special DVD writer that has
slightly more powerful laser than other, "ordinary", DVD-writers,
but the prices of such USB-DVD-writers are roughly the same as
with other USB-DVD-writers, only the specs differ and one
must make a slightly greater effort to find the USB-DVD-writers
that have "MDisc support". Supposedly other DVD-writers
also write MDisc DVDs, but with a great error rate. MDisc DVDs
are designed to be readable with plain DVD-writers/readers
and even old DVD-readers that lack DVD writing capability.
I mention that aspect, because with Raspberry_Pi-like computers
the Flash memory card wear related errors also appear
at the Linux kernel binary and other installed binaries and
that makes the Raspberry_Pi-like computers unstable over time.
As of 2023 the newest Raspberry_Pi official Linux, the "RaspberryOS"
has the option to use a "readonly-write-only-to-RAM filesystem", a lot like
live Linux DVDs use to reduce the wear of the memory card,
but the various scientific papers (You can search them Yourself,
there are plenty on the net, easy to find, semanticscholar dot org )
basically, depending on how one interprets the graphs and
temperature conditions of memory cards and memory sticks,
state the "sufficiently reliable" data retention rate to
about one year, 2 years tops. Again, depending on interpretation.
My interpretation is that if I touch a memory card or
a USB memory stick than I feel that it's hot and therefore
the Flash memory die in the device must be even hotter.
My personal intuitive observation matches with the
roughly 1 year retention time, after which the data should
be rewritten, including file system formatting information.

That is to say, for reliability, Flash memory card should be taken out of
a Raspberry_Pi-like computer roughly once per year and
the Linux program dd should/might be used to rewrite the
original image to the memory card, even if the memory card
is used in "readonly-write-only-to-RAM mode".

The F-RAM, used for storing program code in
car-industry microcontrollers (MCUs) does not
seem to be any better than Flash, despite initial hype,
except that may be MCUs are "stored"/used in cooler conditions.
I do not know. Car engines do get hot. That is to say,
some old-fashioned ROM can be pretty nice thing to have
and for laptops and desktops a MDisc DVD or BluRay
can be that "ROM", except that according to
some sources on the wild-wild-web the BluRay's,
including the proper MDisc BluRays that have
the inorganic die, not the
Verbatim (yes, that famous brand) fakes that only
use the MDisc as a trade-mark, supposedly have
a higher error rate than MDisc DVDs. But, even
plain DVDs can be a lot of help, for at least 5 year period,
possibly for a 10 year period. Again, if DVD
is like ROM, then any driver binary, including NilFS2 driver,
binary does not corrupt due to storage bitrot,
filesystem information corruption.

And the bad news is that the so hyped up "cloud storage",
where the storage providers advertise that they store
their clients' data at some really fancy and fast
solid state storage devices (essentially Flash memory)
has exactly the same bitrotting issue, which is why
at least one person that I know of (not me, yet)
keeps an off-cloud list of file hashes (MD5, SHA256, ...)
of files that a his clients' web application
at a server consists of. I mean, if banking information
or other critical information is stored at modern Flash-memory
based fast storage devices at the greatest and fanciest
servers and that information were to corrode due to
some file system driver issue that no RAID can compensate...


Summary of my compromise-semi-workaround:

    x) Boot from a live-DVD and use a Bash script to
       customize the running instance, id est copy
       the /etc/passwd and /etc/shadow files and
       /etc/ssh folder. Plain DVDs will do, but
       MDisc DVDs are better and with some long-term planning
       it is still (as of 2023_12) possible to get them, not as "new-old-stock", but
       as brand new products that are still being produced in relatively small volumes.
       Once their patents expire, there might be even multiple MDisc DVD producers,
       if there is enough demand for MDisc DVDs. If people still
       buy plain DVDs, then there might be also some market for MDisc DVDs.

       (I like to think of DVDs like I think of paper: relatively low data capacity,
       we don't produce them at home, id est it takes a factory to
       produce them, yet we use the old-fashioned paper still for
       data storage in many situations, like labels on apple-jam jars,
       packaging of many products contain text and image information, etc.
       In that sense DVD format as such might last for a long time,
       specially if it overcomes storage reliability issues like
       the MDisc DVDs have overcome, and if there are
       multiple producers like there are multiple paper producing
       factories.)

    x) Mirror files on different HDDs/SSDs that have
       different file system types,
       one HDD/SSD per computer to counter a situation, where
       a kernel/file_system_driver running instance corrupts
       due to some typical C/C++ related memory corruption.

    x) With Raspberry_Pi-like computers, overwrite the
       memory card once per year (with "dd", id est
       including filesystem formatting information)
       and try to avoid wearing the memory card by switching off
       the swap ("swapoff --all").

    x) With various Linux file systems use the
       "noatime,nodiratime" mount options
       ("mount -o noatime,nodiratime /dev/foodevice /bar/folder")

Thank You for reading my letter.
I hope that it helps to somehow get by
till the core of the NilFS2
corruption issue gets solved.

Yours sincerely,
Martin.Vahi@xxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux BTRFS]     [Linux CIFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux