Re: How to debug intermittent increasing md/inflight but no disk activity?

Dear Roger,


Thank you for your reply.

On 10.07.24 at 13:54, Roger Heflin wrote:
How long does it freeze this way?

It froze for up to five minutes, I’d say.

Disks that are developing bad blocks do show up as stopping activity for
3-60 seconds (depending on the disk's internal settings).

smartctl --xall <device> | grep -iE 'sector|reall' should show the
reallocation counters.

These are SAS disks, and none of the array members has any errors. Example:

```
@grele:~$ sudo smartctl --xall /dev/sdy
[…]
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0          0          0     655487.372           0
write:         0        0         0          0          0      38289.771           0
```

What kind of disks does the machine have?

Seagate ST16000NM004J (16 TB, SAS)

On my home machine a bad sector freezes it for 7 seconds (scterc
defaults to 7).  On a large RAID of big disks at work, the hang lasts
minutes.  The raw disk is set to 10 (that is what the vendor told us),
and that 10, plus potentially a bunch of IOs against the bad sector,
shows up as minutes.
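
As a rough illustration of how a 10-second recovery timeout can turn into
minutes, here is a minimal Python sketch. It assumes that every IO queued
against the bad sector is serviced one after the other and that each one
waits out the full timeout; real drives and controllers may behave
differently. The function name and the queue depths are hypothetical.

```python
def worst_case_stall(recovery_timeout_s, queued_ios):
    """Upper bound on the stall, in seconds.

    Assumption (not from the thread): the queued IOs hit the bad sector
    serially and each one waits for the full recovery timeout.
    """
    return recovery_timeout_s * queued_ios


# Hypothetical queue depths against a single bad sector.
for queued in (1, 6, 30):
    print(f"{queued:2d} IOs -> up to {worst_case_stall(10, queued)} s")
```

Under these assumptions, 30 outstanding IOs against one bad sector would
already account for a five-minute hang.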

I wrote a script, used at work, that both times how long smartctl takes
for each disk (the bad disk takes more than 5 seconds, and up to
minutes) and shows the reallocated count.  It saves a copy every hour,
so one can see which disk incremented its counter in the last hour and
replace that disk.
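
The script itself was not posted. The following Python sketch only
illustrates the idea described above and is not Roger's implementation.
All function names, the `runner` parameter, and the snapshot format (a
dict mapping device to counter value) are assumptions; the 5-second
threshold and the 'sector|reall' pattern are taken from this thread.

```python
import re
import subprocess
import time

# Same pattern as the grep suggested earlier in the thread.
COUNTER_PATTERN = re.compile(r"sector|reall", re.IGNORECASE)


def time_smartctl(device, runner=subprocess.run):
    """Run `smartctl --xall` once; return (elapsed_seconds, stdout)."""
    start = time.monotonic()
    result = runner(
        ["smartctl", "--xall", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl's exit status is a bitmask; nonzero is common
    )
    return time.monotonic() - start, result.stdout


def counter_lines(smartctl_output):
    """Return the output lines that mention sectors or reallocation."""
    return [
        line for line in smartctl_output.splitlines()
        if COUNTER_PATTERN.search(line)
    ]


def slow_disks(timings, threshold=5.0):
    """Return devices whose smartctl call took longer than `threshold` seconds."""
    return sorted(dev for dev, secs in timings.items() if secs > threshold)


def changed_counters(previous, current):
    """Return {device: (old, new)} for counters that grew between two snapshots."""
    return {
        dev: (previous[dev], new)
        for dev, new in current.items()
        if dev in previous and new > previous[dev]
    }
```

Run hourly (for example from cron), one would call `time_smartctl` for
each array member, store the timings and counters, and compare them with
the previous hour's copy via `changed_counters`. Extracting a numeric
counter from the matched lines is left out here because the output
format differs between ATA and SAS disks.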

A colleague also wrote a Perl program, diskcheck, that is run regularly to check all the disks. Nothing suspicious here.


Kind regards,

Paul



