Re: Monitoring for failed drives

On Wed, Apr 25, 2012 at 01:39:51PM +0100, Brian Candler wrote:
> OK, so that's fairly obviously a failed drive.
> 
> The problem is, how to detect and report this?

More specifically, is there a kernel counter I can look at, perhaps
something under /sys, which counts the number of I/O errors when accessing a
block device?  (Recovered and non-recovered?)

Or is the only way to find this sort of error through parsing syslog
messages?
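
For reference, the syslog-parsing approach might look roughly like the sketch
below. It assumes the kernel logs block-layer failures as lines containing
"I/O error, dev <name>, sector <n>"; the exact wording differs between kernel
versions, so the pattern is an assumption, and count_io_errors is a
hypothetical helper name.

```python
import re
from collections import Counter

# Assumed message shape, e.g.
#   "... kernel: end_request: I/O error, dev sdb, sector 1234"
# The prefix and wording vary between kernel versions, so only the
# "I/O error, dev X, sector N" part is matched here.
IO_ERROR_RE = re.compile(r"I/O error, dev ([\w-]+), sector (\d+)")


def count_io_errors(lines):
    """Count matching I/O error messages per block device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be fed any iterable of log lines, for example an open log file. The
location of the kernel log depends on the distribution and syslog
configuration, and this only sees errors that are still in the log, which is
part of why a real counter would be preferable.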

I did find this:

    $ cat /sys/block/sda/stat
      134257    51671 11141686  1337160 60502372 124014063 1476148888 1325166568 0 137287732 1326384944

However, there don't seem to be any error counters in there. According to
http://www.kernel.org/doc/Documentation/block/stat.txt the fields are:

    Name            units         description
    ----            -----         -----------
    read I/Os       requests      number of read I/Os processed
    read merges     requests      number of read I/Os merged with in-queue I/O
    read sectors    sectors       number of sectors read
    read ticks      milliseconds  total wait time for read requests
    write I/Os      requests      number of write I/Os processed
    write merges    requests      number of write I/Os merged with in-queue I/O
    write sectors   sectors       number of sectors written
    write ticks     milliseconds  total wait time for write requests
    in_flight       requests      number of I/Os currently in flight
    io_ticks        milliseconds  total time this block device has been active
    time_in_queue   milliseconds  total wait time for all requests
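
To make the mapping concrete, here is a minimal sketch that labels the values
from the stat file with the eleven names in the table above. The function
names and the underscore spellings of the field names are my own. Newer
kernels may append further fields, which this sketch simply ignores. Note that
nothing it returns is an error count.

```python
# Field names follow the order given in Documentation/block/stat.txt;
# the underscore spellings are chosen here for convenience.
STAT_FIELDS = [
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
]


def parse_block_stat(text):
    """Map the whitespace-separated values of a stat file to field names."""
    values = [int(v) for v in text.split()]
    # zip() stops at the shorter sequence, so any extra trailing fields
    # beyond the eleven documented above are dropped.
    return dict(zip(STAT_FIELDS, values))


def read_block_stat(device):
    """Read and parse /sys/block/<device>/stat, e.g. device="sda"."""
    with open(f"/sys/block/{device}/stat") as f:
        return parse_block_stat(f.read())
```

With the sda output shown earlier, parse_block_stat gives read_ios = 134257,
in_flight = 0 and time_in_queue = 1326384944.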

I also found UCD-DISKIO-MIB in net-snmp, but it doesn't have error counters
either:

    diskIOEntry OBJECT-TYPE
        SYNTAX      DiskIOEntry
        MAX-ACCESS  not-accessible
        STATUS      current
        DESCRIPTION
            "An entry containing a device and its statistics."
        INDEX       { diskIOIndex }
        ::= { diskIOTable 1 }

    DiskIOEntry ::= SEQUENCE {
        diskIOIndex         Integer32,
        diskIODevice        DisplayString,
        diskIONRead         Counter32,
        diskIONWritten      Counter32,
        diskIOReads         Counter32,
        diskIOWrites        Counter32,
        diskIONReadX        Counter64,
        diskIONWrittenX     Counter64
    }

Is there anywhere else I should look for this?

Thanks,

Brian.