Re: [now urgent] problem with recovered array

Bugger, I changed my mind. I now see that recordings do show write activity.

The reason playing a recording from 12h ago does not show any reads is probably
that it is still in the page cache, but not dirty. I guess.
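
One way to check that guess (assuming fincore from util-linux is installed; the file is just one of the recent recordings, see (*1) below):

$ fincore /data1/mythtv/lib/2050_20231031093600.ts     # how much of the file is resident in the page cache
$ grep -E '^(Dirty|Writeback):' /proc/meminfo          # how much data is still waiting to be written back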

My original problem remains, though: rsyncing a large tree of small files to the array
chokes after about a minute, and after that writing progresses at a very slow pace
of a few tens of KB/s. A kthread then starts and runs at 100% CPU for hours, even though
only 5 GB were copied.
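
For what it is worth, this is how I plan to identify the busy kthread and watch the array while the rsync is stalled (only standard tools, nothing specific to this box):

$ top -b -n 1 | head -20                           # the 100% CPU kthread shows near the top, in [brackets]
$ iostat -x md127 sdb sdc sdd sde sdf sdg 5        # per-device utilisation and write rates, every 5s
$ ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'   # tasks stuck in uninterruptible sleep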

See my recent thread 'problem with recovered array' for details. The problem is still urgent.
I cannot safely shut down the machine.

On 01/11/2023 15.31, eyal@xxxxxxxxxxxxxx wrote:
Why I see it as urgent now:
I suspect that my system will soon run out of cache (because dirty blocks are not being written to the array)
and fail in an ugly way.
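
To see how close I am to that point I am watching the dirty page counters and the write-back thresholds (standard /proc and sysctl interfaces, so nothing here is specific to md):

$ watch -n 5 'grep -E "^(Dirty|Writeback|MemFree):" /proc/meminfo'
$ sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs

If Dirty keeps growing and never shrinks, the write-back path to md127 really is stuck.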

If you can help then please read this thread.

Another unusual observation: the system looks OK but evidently is not.

While pondering the above situation, I noticed that when I watch a mythtv recording,
which is on the array (md127), I see no activity on md127 or its components /dev/sd[b-g]1. (*1)

Now, if I watch a program from earlier, recorded before the last boot (*2), I do see the reads in iostat.
Is it possible that all the recent recordings are still cached and were never written to disk?

What is preventing the system from draining the cache? Is the array somehow in read-only mode? How do I fix this?
I now think that on shutdown, a sync is attempted that never completes, which prevents the shutdown from finishing.
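
One thing that can be checked directly (a standard md sysfs attribute and mdadm option, though I am only guessing this is relevant) is whether the array dropped to read-only, and if so, force it back:

$ cat /sys/block/md127/md/array_state          # 'readonly' or 'read-auto' would explain the missing writes
$ sudo mdadm --readwrite /dev/md127            # only if it really reports read-only

The mdstat output below shows plain 'active', not 'active (read-only)', so I suspect this is not it, but I mention it for completeness.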

BTW, earlier, in the same situation, running 'sync' never completed.
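
Next time it hangs I intend to capture where sync is actually blocked (standard kernel interfaces; the sysrq 'w' trigger has to be enabled, and all of this needs root):

$ sync &
$ sudo cat /proc/$!/stack                      # kernel stack of the hung sync
$ echo w | sudo tee /proc/sysrq-trigger        # log all D-state (blocked) tasks
$ sudo dmesg | tail -50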

HTH

(*1)
$ df -h /data1
Filesystem      Size  Used Avail Use% Mounted on
/dev/md127       55T   45T  9.8T  83% /data1

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sde1[4] sdc1[9] sdf1[5] sdb1[8] sdd1[7] sdg1[6]
       58593761280 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [_UUUUUU]
       bitmap: 88/88 pages [352KB], 65536KB chunk

$ locate 2050_20231031093600.ts
/data1/mythtv/lib/2050_20231031093600.ts

$ ls -l /var/lib/mythtv
lrwxrwxrwx 1 root root 17 Feb 26  2023 /var/lib/mythtv -> /data1/mythtv/lib

$ ls -l /data1/mythtv/lib/2050_20231031093600.ts
-rw-r--r-- 1 mythtv mythtv 2362511964 Oct 31 21:24 /data1/mythtv/lib/2050_20231031093600.ts

(*2)
$ uptime
  15:02:37 up 15:05, 27 users,  load average: 0.49, 0.59, 0.67


On 01/11/2023 08.44, eyal@xxxxxxxxxxxxxx wrote:
On 31/10/2023 00.35, eyal@xxxxxxxxxxxxxx wrote:
F38

I know this is a bit long, but I wanted to provide as much detail as I thought was needed.

I have a 7-member raid6. The other day I needed to send a disk for replacement.
I have done this before and all looked well. The array is now degraded until I get the new disk.

[trimmed] See original posting.

--
Eyal at Home (eyal@xxxxxxxxxxxxxx)



