On Sat, Jun 12, 2021 at 12:41:57PM +0000, Andy Smith wrote:
> Hi,
>
> I've been experiencing this problem intermittently since December of
> last year after upgrading some existing servers to Debian stable
> (buster). I can't reproduce it at will and it can sometimes take
> several months to happen again, although it has just happened twice
> in 3 days on one host.

I was in a bit of a rush when I dashed that email off. Here's some
more information about the typical configuration of these servers.

$ uname -a
Linux clockwork 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
$ mdadm --version
mdadm - v4.1 - 2018-10-01

Most of these servers have spent about 5 years running on earlier
versions of Debian, notably the full Debian jessie release cycle,
without issue. I've only started having issues after upgrading to
Debian buster.

I will omit details of all member devices as I'm not getting issues
with IO errors, dropouts etc. Most of the servers just have two SATA
SSDs although I am also seeing this on more complex setups.

$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md5 : active raid1 sdb5[1] sda5[0]
      3742779392 blocks super 1.2 [2/2] [UU]
      bitmap: 14/28 pages [56KB], 65536KB chunk

md1 : active raid1 sdb1[1] sda1[0]
      975296 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda2[0] sdb2[1]
      4878336 blocks super 1.2 [2/2] [UU]

md3 : active (auto-read-only) raid1 sdb3[1] sda3[0]
      1951744 blocks super 1.2 [2/2] [UU]

unused devices: <none>

$ sudo smartctl -i /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-16-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7KH3T8HALS-00005
Serial Number:    S47RNA0MC01657
LU WWN Device Id: 5 002538 e09c88bb3
Firmware Version: HXM7404Q
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 12 13:36:47 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

$ sudo smartctl -i /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-16-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7KH3T8HALS-00005
Serial Number:    S47RNA0MC01656
LU WWN Device Id: 5 002538 e09c88b8a
Firmware Version: HXM7404Q
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 12 13:36:54 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

At this point it would be really helpful if I could even narrow it
down to "Xen problem" or "dom0 kernel problem". :(

Cheers,
Andy