It has been two months since I last reported on the state of the issue:
On 17. 03. 21 16:55, Vojtech Myslivec wrote:
> Thanks a lot Manuel for your findings and information.
>
> I have moved journal from logical volume on RAID1 to a plain partition
> on a SSD and I will monitor the state.
So, we now run the MD RAID 6 array (/dev/md1) with the journal device on
a plain partition of one of the SSD disks (/dev/sdh5). See the attached
files for more details.
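For reference, the move itself was done with mdadm, roughly along these
lines (this is only a sketch, not the literal commands I ran, and the old
LV name below is just a placeholder):

    # remove the old journal device (formerly an LV on the RAID1 SSD mirror)
    mdadm /dev/md1 --fail /dev/vg0/journal
    mdadm /dev/md1 --remove /dev/vg0/journal
    # attach the new journal on the plain SSD partition
    mdadm /dev/md1 --add-journal /dev/sdh5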
Since then (March 17th), the issue we discussed has happened "only" three
times. The first occurrence was on April 21st, 5 weeks after moving the
journal.
*I can confirm that the issue still persists, but it is definitely less
frequent.*
On 22. 03. 21 18:13, Song Liu wrote:
> Thanks for the information. Quick question, does the kernel have the
> following change?
>
> commit c9020e64cf33f2dd5b2a7295f2bfea787279218a
> Author: Song Liu <songliubraving@xxxxxx>
> Date:   9 months ago
>
> ...
We run the latest kernel available from the Debian backports repository,
which is currently Linux 5.10. I checked that we were already running
kernel 5.10 in March, when I moved the journal.
If I checked correctly, this particular patch has been part of the kernel
since 5.9.
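One way to double-check in which release the patch landed, assuming a
local clone of the mainline kernel tree, is:

    # nearest tag that contains the commit
    git describe --contains c9020e64cf33f2dd5b2a7295f2bfea787279218a
    # or list the first release tags that contain it
    git tag --contains c9020e64cf33f2dd5b2a7295f2bfea787279218a | sort -V | head -n 3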
Maybe unrelated, but I noticed this log message just after our "unstuck"
script performed some random I/O (just as I described earlier in this
e-mail thread):
May 2 ... kernel: [2035647.004554] md: md1: data-check done.
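For context: the "unstuck" script essentially just generates a bit of
I/O on the affected filesystem. A minimal illustrative sketch (made-up
file name and size, not the exact script posted earlier in the thread):

    # write and fsync a small amount of data directly to the array's filesystem
    dd if=/dev/urandom of=/mnt/data/.unstuck-probe bs=1M count=16 oflag=direct conv=fsync
    rm -f /mnt/data/.unstuck-probe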
I can provide more information if needed. Thanks for any new insights.
Vojtech Myslivec
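The attachments below show the block device layout and the detail of both
arrays, i.e. roughly the output of:

    lsblk
    mdadm --detail /dev/md0
    mdadm --detail /dev/md1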
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 7,3T 0 disk
└─md1 9:1 0 29,1T 0 raid6 /mnt/data
sdb 8:16 0 7,3T 0 disk
└─md1 9:1 0 29,1T 0 raid6 /mnt/data
sdc 8:32 0 7,3T 0 disk
└─md1 9:1 0 29,1T 0 raid6 /mnt/data
sdd 8:48 0 7,3T 0 disk
└─md1 9:1 0 29,1T 0 raid6 /mnt/data
sde 8:64 0 7,3T 0 disk
└─md1 9:1 0 29,1T 0 raid6 /mnt/data
sdf 8:80 0 7,3T 0 disk
└─md1 9:1 0 29,1T 0 raid6 /mnt/data
sdg 8:96 1 223,6G 0 disk
├─sdg1 8:97 1 37,3G 0 part
│ └─md0 9:0 0 37,2G 0 raid1
│ ├─vg0-swap 253:0 0 3,7G 0 lvm [SWAP]
│ └─vg0-root 253:1 0 14,9G 0 lvm /
├─sdg2 8:98 1 1K 0 part
├─sdg5 8:101 1 8G 0 part
└─sdg6 8:102 1 178,3G 0 part
sdh 8:112 1 223,6G 0 disk
├─sdh1 8:113 1 37,3G 0 part
│ └─md0 9:0 0 37,2G 0 raid1
│ ├─vg0-swap 253:0 0 3,7G 0 lvm [SWAP]
│ └─vg0-root 253:1 0 14,9G 0 lvm /
├─sdh2 8:114 1 1K 0 part
├─sdh5 8:117 1 8G 0 part
│ └─md1 9:1 0 29,1T 0 raid6 /mnt/data
└─sdh6 8:118 1 178,3G 0 part
/dev/md0:
Version : 1.2
Creation Time : Tue Jan 8 13:16:26 2019
Raid Level : raid1
Array Size : 39028736 (37.22 GiB 39.97 GB)
Used Dev Size : 39028736 (37.22 GiB 39.97 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Thu May 13 00:17:06 2021
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : resync
Name : backup1:0 (local to host backup1)
UUID : fe06ac67:967c62f7:5ef1b67b:7b951104
Events : 697
Number Major Minor RaidDevice State
0 8 97 0 active sync /dev/sdg1
1 8 113 1 active sync /dev/sdh1
/dev/md1:
Version : 1.2
Creation Time : Wed Apr 3 17:16:20 2019
Raid Level : raid6
Array Size : 31256100864 (29808.14 GiB 32006.25 GB)
Used Dev Size : 7814025216 (7452.04 GiB 8001.56 GB)
Raid Devices : 6
Total Devices : 7
Persistence : Superblock is persistent
Update Time : Thu May 13 00:15:22 2021
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : journal
Name : backup1:1 (local to host backup1)
UUID : fd61cb22:30bfc616:6506829d:9319af95
Events : 2588836
Number Major Minor RaidDevice State
1 8 16 0 active sync /dev/sdb
2 8 0 1 active sync /dev/sda
3 8 32 2 active sync /dev/sdc
4 8 48 3 active sync /dev/sdd
5 8 64 4 active sync /dev/sde
6 8 80 5 active sync /dev/sdf
7 8 117 - journal /dev/sdh5