MegaCli fails to communicate with Raid-Controller

Hi all,

After our latest kernel update from 4.4.74 to 4.14.14, we are having trouble getting data out of our LSI RAID controllers using the megacli binary.

Every execution of the megacli binary adds the following line to kern.log:

###
[547216.425556] megaraid_sas 0000:03:00.0: Failed to alloc kernel SGL buffer for IOCTL
###
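
To pin down when the failure first appears after a reboot, the kernel log can be scanned for this message. The following is a minimal sketch, not a tested tool: the function names are made up for illustration, and the regular expression assumes the message format is exactly as shown above.

```python
import re

# Matches the megaraid_sas message quoted above; the uptime is the
# bracketed kernel timestamp in seconds, the PCI address is the controller.
SGL_FAILURE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+megaraid_sas\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"Failed to alloc kernel SGL buffer for IOCTL"
)


def find_sgl_failures(log_text):
    """Return a list of (uptime_seconds, pci_address) for each matching line."""
    hits = []
    for line in log_text.splitlines():
        match = SGL_FAILURE.search(line)
        if match:
            hits.append((float(match.group("uptime")), match.group("pci")))
    return hits


def uptime_days(uptime_seconds):
    """Convert a kernel timestamp (seconds since boot) to days."""
    return uptime_seconds / 86400.0
```

Feeding it the contents of kern.log and looking at the first hit gives the uptime at which the allocation started failing, which can then be compared across machines.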

Stracing a megacli command shows that the ioctl returns ENOMEM, which is expected given the error message above.
###
ioctl(3</dev/megaraid_sas_ioctl_node>, MCE_GET_RECORD_LEN or MTRRIOC_SET_ENTRY, 0x7c98d0) = -1 ENOMEM (Cannot allocate memory)
###
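
One possible explanation, which is only an assumption and not confirmed by anything above, is that the driver needs a physically contiguous buffer for the ioctl and memory has become too fragmented after a few days of uptime. A rough way to check this is to look at /proc/buddyinfo, which lists the number of free blocks per allocation order for each zone. The sketch below parses that file; the helper names are hypothetical.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo into {(node, zone): [free blocks per order]}.

    Each line looks like:
        Node 0, zone   Normal   100   50   10   0 ...
    where the n-th number is the count of free blocks of 2**n pages.
    """
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue  # skip anything that is not a zone line
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        zones[(node, zone)] = [int(count) for count in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, min_order):
    """Number of free blocks whose order is >= min_order."""
    return sum(counts[min_order:])


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_buddyinfo(handle.read())
```

Comparing `read_buddyinfo()` on a freshly booted machine with one that already shows the error would indicate whether higher-order blocks have run out. If they have not, fragmentation is probably not the cause.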

This does not happen on a freshly booted machine. After a reboot it usually takes roughly 2-3 days for the error to first occur, and from then on it persists.
After the first occurrence, a megacli command occasionally works at seemingly random times, but only once; after that the commands fail again.

Current hardware is
Dell R710, MegaRAID SAS 1078, Debian Jessie, Xen 4.10, Kernel 4.14.14
- virtual disk 1
  - 2x 600 GB SEAGATE ST3600057SS, RAID-1
- virtual disk 2
  - 4x 2 TB SEAGATE ST32000444SS, RAID-10

Dell R730xd, MegaRAID SAS-3 3108, Debian Jessie, Xen 4.10, Kernel 4.14.14
- same as above

Megaraid-Driver-Version on new 4.14.14 kernel
###
filename:       /lib/modules/4.14.14-2-xen0-he+/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    Avago MegaRAID SAS Driver
author:         megaraidlinux.pdl@xxxxxxxxxxxxx
version:        07.702.06.00-rc1
license:        GPL
srcversion:     15F82F234414CB9CE82AE3D
###

Megaraid-Driver-Version on the previous 4.4.74 kernel
###
filename:       /lib/modules/4.4.74-1-xen0-he+/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    Avago MegaRAID SAS Driver
author:         megaraidlinux.pdl@xxxxxxxxxxxxx
version:        06.808.16.00-rc1
license:        GPL
srcversion:     AAF4E2A6BAB0B1254F758CA
###

MegaCli Version
###
$ megacli-perc5 -v
      MegaCLI SAS RAID Management Tool  Ver 8.07.14 Dec 16, 2013
###

It may also be of interest that querying all controllers with “-aall” exits with code 0 but prints nothing, as if no controller were found, while querying a specific controller fails with an error even though the controller is definitely there:
###
$ megacli-perc5 -ldpdinfo -aall
Exit Code: 0x00

$ megacli-perc5 -ldpdinfo -a0
User specified controller is not present.
Failed to get CpController object.
Exit Code: 0x01
###
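
Because the "-aall" case reports exit code 0 while returning no data, a monitoring check that only looks at the exit code would miss it. The sketch below is illustrative only and its function names are hypothetical: it reads the "Exit Code: 0x.." line from captured output and flags output that contains nothing else.

```python
import re

# The trailing status line MegaCli prints, e.g. "Exit Code: 0x01".
EXIT_CODE = re.compile(r"^Exit Code:\s*0x([0-9A-Fa-f]+)\s*$", re.MULTILINE)


def parse_exit_code(output):
    """Return the exit code as an int, or None if no such line is present."""
    match = EXIT_CODE.search(output)
    return int(match.group(1), 16) if match else None


def looks_like_silent_failure(output):
    """True if the exit code is 0 but there is no other non-blank output."""
    if parse_exit_code(output) != 0:
        return False
    payload = [
        line for line in output.splitlines()
        if line.strip() and not EXIT_CODE.match(line.strip())
    ]
    return not payload
```

Applied to the two outputs above, the first would be flagged as a silent failure and the second would be reported as exit code 1.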

Our monitoring-script runs the following command sequence every 20 minutes:
###
megacli-perc5 -LDGetNum -a0 -NoLog
megacli-perc5 -adpallinfo -a0 -nolog
megacli-perc5 -adpgettime -a0 -nolog
megacli-perc5 -fwtermlog -bbuon -a0 -silent -nolog
megacli-perc5 -adpbbucmd -getbbucapacityinfo -a0 -nolog
megacli-perc5 -ldpdinfo -a0 -nolog
megacli-perc5 -ldinfo -l0 -a0 -nolog
megacli-perc5 -ldinfo -l1 -a0 -nolog
###
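
To see which command in this sequence is the first to fail once the problem starts, the sequence could be wrapped so that each return code is recorded. This is a minimal sketch rather than our actual monitoring script. It assumes the process return code mirrors the printed "Exit Code", and `run_sequence` and `first_failure` are hypothetical names.

```python
import subprocess

MEGACLI = "megacli-perc5"

# The same eight invocations as listed above, in the same order.
COMMANDS = [
    ["-LDGetNum", "-a0", "-NoLog"],
    ["-adpallinfo", "-a0", "-nolog"],
    ["-adpgettime", "-a0", "-nolog"],
    ["-fwtermlog", "-bbuon", "-a0", "-silent", "-nolog"],
    ["-adpbbucmd", "-getbbucapacityinfo", "-a0", "-nolog"],
    ["-ldpdinfo", "-a0", "-nolog"],
    ["-ldinfo", "-l0", "-a0", "-nolog"],
    ["-ldinfo", "-l1", "-a0", "-nolog"],
]


def _run(argv):
    """Run one command and return its process return code."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode


def run_sequence(runner=_run):
    """Run all commands in order; return a list of (command line, return code)."""
    results = []
    for args in COMMANDS:
        argv = [MEGACLI] + args
        results.append((" ".join(argv), runner(argv)))
    return results


def first_failure(results):
    """Return the first command line with a non-zero return code, or None."""
    for command_line, return_code in results:
        if return_code != 0:
            return command_line
    return None
```

Logging `first_failure(run_sequence())` together with a timestamp on every run would show whether the failure always starts at the same command, for example one that requests a particularly large buffer.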

I failed to reproduce this on a secondary machine, so I'm looking for clues on how to debug this further.
I have looked at the kernel's git log but couldn't match any change to my problem.
I have looked at the fwtermlog of the controller, but there's nothing of interest in there.

Any ideas?

best regards
volker





