Good morning Krzysztof,
On 9/3/19 7:38 AM, Krzysztof Jakóbczyk wrote:
Hello Neil,
Many thanks for your input! The support you guys are providing is one of a kind!
I've been able to kill some of the blocked processes, releasing the
locked files in the `/data` mount point, but some of them remained
locked.
I've booted with SystemRescueCD and it automatically detected and
assembled the array as md127. The array was in a read-only state, but
after I mounted it at `/mnt` in SystemRescueCD, the reshape process
started from the beginning. Right now `/proc/mdstat` looks as follows:
"auto-read-only" is a standard state for an anonymous array (the
thumbdrive wouldn't have an mdadm.conf file that explicitly identified
your array.)
You could have used --run to get the reshape going instead of mounting.
Mounting is not always safe in these situations.
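To confirm that an array assembled from a rescue stick is parked in auto-read-only without mounting anything, you can read md's sysfs `array_state` attribute, which reports `read-auto` in that state. This is a minimal sketch that assumes the array is md127, as on the rescue system. `md_state` and `is_auto_read_only` are hypothetical helpers, not part of mdadm:

```shell
#!/bin/sh
# Read the md state of an array from its sysfs directory.
# Assumption: the array is named md127 (hypothetical default);
# pass a different sysfs directory to check another array.
md_state() {
    dir=${1:-/sys/block/md127/md}
    # array_state holds values such as read-auto, readonly, clean, active
    cat "$dir/array_state"
}

# Exit 0 only when the array is parked in auto-read-only.
is_auto_read_only() {
    [ "$(md_state "$1")" = "read-auto" ]
}
```

For example, `is_auto_read_only && echo "md127 is waiting to be started"` tells you the pending reshape is simply waiting for the array to become writable, which is what the `--run` step above addresses.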
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sda1[5] sdg1[6] sdd1[4] sdf1[3]
      7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      [=================>...]  reshape = 88.1% (3444089344/3906885632) finish=189.5min speed=40688K/sec
      bitmap: 8/30 pages [32KB], 65536KB chunk

unused devices: <none>
Initially the speed reached 63M/sec; now it's a bit slower, but still
at a very good level.
Not surprising. You are in a very good place right now...
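The finish estimate in mdstat is essentially the remaining 1K blocks divided by the current speed: (3906885632 - 3444089344) / 40688 is about 11374 seconds, or roughly 189.5 minutes. The awk sketch below recomputes the percentage and ETA from a progress line, truncating the way mdstat's display does. It is an approximation: the kernel measures speed over a recent sampling window, so its last digit may differ. `reshape_eta` is a hypothetical helper:

```shell
#!/bin/sh
# Recompute reshape progress and ETA from /proc/mdstat text on stdin.
# Uses the (done/total) counter (1K blocks) and the speed=...K/sec field.
reshape_eta() {
    awk '/reshape =/ {
        # Extract "done/total" from inside the parentheses.
        match($0, /\([0-9]+\/[0-9]+\)/)
        split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
        # Extract the current speed in K/sec (0 if absent).
        match($0, /speed=[0-9]+K/)
        speed = substr($0, RSTART + 6, RLENGTH - 7) + 0
        if (speed > 0) {
            # Truncate rather than round, as mdstat does.
            permille = int(n[1] * 1000 / n[2])
            secs = int((n[2] - n[1]) / speed)
            printf "%d.%d%% finish=%d.%dmin\n", permille / 10, permille % 10, secs / 60, (secs % 60) / 6
        }
    }'
}
```

Usage: `reshape_eta < /proc/mdstat`.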
[trim /]
The question that still remains is: when the reshape in SystemRescueCD
finishes, can I safely mount the array on the outdated host and
perform the mdadm and system updates?
Yes.
Also, what I've found in `dmesg` is quite distressing:
No, not distressing. The key is "corrected". In other words, MD is
handling normal read errors as only a redundant array can -- by
reconstructing the missing data and writing it back where it belongs.
Unrecoverable read errors are normal. (Search the archives on this
topic for many discussions on the why and the mitigations.)
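To see which member devices the corrected errors landed on, you can tally the kernel messages per device. This minimal sketch assumes the raid5/6 personality logs lines of the form `md/raid:md127: read error corrected (8 sectors at 123456 on sda1)`; check the wording in your own `dmesg` before relying on it. `corrected_per_device` is a hypothetical helper:

```shell
#!/bin/sh
# Count "read error corrected" messages per member device.
# Assumed message format (check your own dmesg):
#   md/raid:md127: read error corrected (8 sectors at 123456 on sda1)
corrected_per_device() {
    awk '/read error corrected/ {
        # The member device is the last word, minus the closing ")".
        dev = $NF
        sub(/\)$/, "", dev)
        count[dev]++
    }
    END {
        for (d in count) print d, count[d]
    }' | sort
}
```

Usage: `dmesg | corrected_per_device`.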
The fact you are getting several during a reshape suggests that your
system has not been doing regular "check" scrubs, which keep these
cleaned up. Most modern distros have a weekly cron job that kicks off
the necessary tasks. Self-tests within the drives are *not* a
substitute for regular scrubs. Self-tests can only *find* problem
sectors, not *fix* them.
[ 7234.974190] perf: interrupt took too long (2517 > 2500), lowering kernel.perf_event_max_sample_rate to 79400
Uhm? Why are you running perf during this reshape?
What do you think of that? The `smartctl -a` output for the reported
drives isn't showing anything unusual, and the drives are new, so it
shouldn't be a hardware problem (at least, nothing SMART has noticed yet):
Sectors go bad. Randomly, but not often. The bad sectors cannot be
detected until they are read. They cannot be fixed without writing to them.
Sectors that are detected but not yet fixed show up in smartctl as
"Pending Sectors". The fix may or may not involve relocation.
Relocation on new drives is rare.
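To keep an eye on pending and relocated sectors, you can pull the relevant attributes out of `smartctl -A`. This minimal sketch assumes ATA drives that print the usual attribute table (5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable) with RAW_VALUE in the tenth column; SAS and NVMe drives report errors differently. `sector_health` is a hypothetical helper:

```shell
#!/bin/sh
# Print the raw counts of the sector-health attributes from `smartctl -A`.
# Assumes ATA-style output where RAW_VALUE is the 10th column.
sector_health() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

Usage: `smartctl -A /dev/sda | sector_health`. A non-zero Current_Pending_Sector count that drops back to zero after a scrub is the "found, then fixed by rewriting" sequence described above.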
[trim /]
Can I check whether the files at those reported sectors are correct?
They were corrected. You can work backwards with your filesystem's
tools to find the file or inode there, if any, but MD would have no
information about the "correctness" of the files. It would be your
judgement.
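If the filesystem on `/data` happens to be ext4 (the thread doesn't say), working backwards means converting an array offset to a filesystem block and asking `debugfs` which inode owns it (`icheck`) and which path that inode has (`ncheck`). This is a minimal sketch under several assumptions: the filesystem sits directly on /dev/md127 with no partition table or LVM in between; the input is a sector offset into the array in 512-byte units, not the member-device sector printed in dmesg (converting that one depends on the data offset and RAID6 layout and is not shown); and the block size defaults to 4096 bytes. `ext4_lookup_cmds` is a hypothetical helper that only prints the commands:

```shell
#!/bin/sh
# Print debugfs commands that map an array sector to a file on ext4.
# Args: array sector (512-byte units), fs block size, device.
# Assumptions: fs starts at array offset 0; defaults are hypothetical.
ext4_lookup_cmds() {
    sector=$1
    bsize=${2:-4096}
    dev=${3:-/dev/md127}
    # Byte offset divided by block size gives the filesystem block number.
    block=$(( sector * 512 / bsize ))
    echo "debugfs -R 'icheck $block' $dev"
    echo "# then, for the inode number icheck prints:"
    echo "debugfs -R 'ncheck <inode>' $dev"
}
```

For example, `ext4_lookup_cmds 1000` prints `debugfs -R 'icheck 125' /dev/md127` first. Whether the file's contents are still "correct" remains your judgement, as noted above.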
Best regards,
Krzysztof Jakobczyk
Regards,
Phil
ps. Please avoid top-posting, and *do* trim your replies. {Standard
list etiquette for kernel.org.} There's nothing more annoying in a
properly threaded email client than gobs of unnecessary quoted material.