Re: Intermittent stalling of all MD IO, Debian buster (4.19.0-16)

On Sat, Jun 12, 2021 at 12:41:57PM +0000, Andy Smith wrote:
> Hi,
> 
> I've been experiencing this problem intermittently since December of
> last year after upgrading some existing servers to Debian stable
> (buster). I can't reproduce it at will and it can sometimes take
> several months to happen again, although it has just happened twice
> in 3 days on one host.

I was in a bit of a rush when I dashed that email off. Here's some
more information about the typical configuration of these servers.

$ uname -a
Linux clockwork 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
$ mdadm --version
mdadm - v4.1 - 2018-10-01

Most of these servers have spent about 5 years running on earlier
versions of Debian, notably the full Debian jessie release cycle,
without issue. I've only started having issues after upgrading to
Debian buster.

I will omit details of all member devices as I'm not getting issues
with IO errors, dropouts etc. Most of the servers just have two SATA
SSDs although I am also seeing this on more complex setups.

$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]

md5 : active raid1 sdb5[1] sda5[0]
      3742779392 blocks super 1.2 [2/2] [UU]
      bitmap: 14/28 pages [56KB], 65536KB chunk

md1 : active raid1 sdb1[1] sda1[0]
      975296 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda2[0] sdb2[1]
      4878336 blocks super 1.2 [2/2] [UU]

md3 : active (auto-read-only) raid1 sdb3[1] sda3[0]
      1951744 blocks super 1.2 [2/2] [UU]

unused devices: <none>
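
To confirm across many hosts that no array is running with a missing
member, a minimal sketch along these lines could be used. The function
name degraded_arrays is hypothetical, and it assumes the usual
/proc/mdstat layout shown above, where the member-status string such as
[UU] appears on a line after the array's header line and an underscore
marks a missing or failed member.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose
# member-status string (e.g. [UU]) contains an underscore.
# Reads /proc/mdstat-style text from standard input.
degraded_arrays() {
    awk '
        # Header line such as "md5 : active raid1 ...": remember the name.
        /^md[0-9]+ :/ { name = $1 }

        # First following line that carries a status string like [UU] or [U_].
        name != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_")) print name
            name = ""   # ignore later lines (bitmap etc.) for this array
        }
    '
}
```

It would be run as "degraded_arrays < /proc/mdstat"; no output means
every array reports all of its members as up, as is the case in the
listing above.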

$ sudo smartctl -i /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-16-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7KH3T8HALS-00005
Serial Number:    S47RNA0MC01657
LU WWN Device Id: 5 002538 e09c88bb3
Firmware Version: HXM7404Q
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 12 13:36:47 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

$ sudo smartctl -i /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-16-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7KH3T8HALS-00005
Serial Number:    S47RNA0MC01656
LU WWN Device Id: 5 002538 e09c88b8a
Firmware Version: HXM7404Q
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 12 13:36:54 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

At this point it would be really helpful if I could even narrow it
down to "Xen problem" or "dom0 kernel problem". :(

Cheers,
Andy


