Re: MegaCli fails to communicate with Raid-Controller

On 19. Apr 2018, at 08:33, Kashyap Desai <Kashyap.Desai@xxxxxxxxxxxx> wrote:
> 
> I think you may see the issue with the 4.6 kernel as well. This is a run-time
> memory allocation failure. Older controllers use a 32-bit consistent DMA mask,
> so the possibility of a memory allocation failure is high compared to a 64-bit
> consistent DMA mask. Newer controllers have a fix in this area, but you are
> using a gen-1 controller ("Dell R710, MegaRAID SAS 1078").

Interesting. What is considered old and new? I have a third machine, "Dell R515, MegaRAID SAS 2108"; is that considered new? It's running the same Xen, kernel and MegaCli versions as the other two, but the error does not occur there.
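
For my own understanding of the 32-bit versus 64-bit mask point, here is a minimal userspace sketch of what a DMA mask restricts. DMA_BIT_MASK is written the same way as the kernel macro of that name; dma_range_fits is a hypothetical helper of mine, not a driver function. It only illustrates that a 32-bit mask requires the whole buffer to sit below 4 GiB of bus address space, which leaves the allocator far fewer places to satisfy a request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(n) macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper: true if the buffer [addr, addr + len) is fully
 * addressable by a device whose DMA mask is `mask`.
 */
static bool dma_range_fits(uint64_t addr, uint64_t len, uint64_t mask)
{
    assert(len > 0);

    uint64_t last = addr + len - 1;   /* address of the last byte */

    if (last < addr)                  /* range wrapped around 2^64 */
        return false;
    return last <= mask;
}
```

With this, a 2048-byte buffer at 0x100000000 (just above 4 GiB) fits DMA_BIT_MASK(64) but not DMA_BIT_MASK(32), while the same buffer anywhere below 4 GiB fits both.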

> There can be two possibilities.
> 
> 1. This is an actual memory allocation failure due to a system resource issue.

I have not seen any OOMs on the two machines where the SGL error occurs. According to "xl info" and our munin graphs, everything looks OK, with a couple of hundred MiB "free".


> 2. The IOCTL provided a large memory length in the iov, and the DMA buffer
> allocation from the API below failed due to the large memory chunk requested.
> 
>     kbuff_arr[i] = dma_alloc_coherent(&instance->pdev->dev,
>                                        ioc->sgl[i].iov_len,
>                                        &buf_handle, GFP_KERNEL);
> 
> Can you change the driver code to *printk* iov_len? Just to confirm.

I just did that on the "Dell R730xd, MegaRAID SAS-3 3108" and get the following output while MegaCli works fine.

###
Apr 23 09:31:37 xh643 kernel: [  368.319092] GD IOV-len: 2048
Apr 23 09:31:37 xh643 kernel: [  368.319426] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [  368.319563] GD IOV-len: 320
Apr 23 09:31:37 xh643 kernel: [  368.319698] GD IOV-len: 616
Apr 23 09:31:37 xh643 kernel: [  368.319887] GD IOV-len: 1664
Apr 23 09:31:37 xh643 kernel: [  368.320040] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [  368.320174] GD IOV-len: 8
…
###
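
To relate these lengths to the allocator, here is a small userspace sketch that computes the page order a request of a given length would need. It assumes 4 KiB pages and that the coherent allocation is rounded up to a power-of-two number of pages, the way the kernel's get_order() does; whether the Xen dom0 path behaves exactly like that is an assumption on my side. order_for_len and SKETCH_PAGE_SIZE are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on typical x86-64 systems. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Smallest order k such that 2^k pages hold `len` bytes.
 * Mirrors the idea behind the kernel's get_order(), in userspace.
 */
static unsigned int order_for_len(size_t len)
{
    assert(len > 0);

    /* Round the byte length up to whole pages. */
    size_t pages = (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned int order = 0;

    /* Round the page count up to the next power of two. */
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under these assumptions, every length in the excerpt above (8 to 2048 bytes) maps to order 0, that is, a single page per buffer.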

The full output is attached in iov_len_megacli_works.txt; it also contains the output of /proc/buddyinfo, which might be important based on my research so far.

Apr 23 09:31:37 xh643 kernel: [  368.320306] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.321761] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.321893] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.322075] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.322254] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.322416] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.322549] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.322698] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.322826] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.322970] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.323267] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.323390] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.323534] GD IOV-len: 2048
Apr 23 09:31:37 xh643 kernel: [  368.323701] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.323841] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [  368.323968] GD IOV-len: 320
Apr 23 09:31:37 xh643 kernel: [  368.324142] GD IOV-len: 616
Apr 23 09:31:37 xh643 kernel: [  368.324299] GD IOV-len: 1664
Apr 23 09:31:37 xh643 kernel: [  368.324471] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [  368.324603] GD IOV-len: 8
Apr 23 09:31:37 xh643 kernel: [  368.324734] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.324854] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.324970] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325085] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325201] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325315] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325430] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325545] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325674] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325793] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.325912] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.326027] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.326150] GD IOV-len: 2048
Apr 23 09:31:37 xh643 kernel: [  368.326469] GD IOV-len: 384
Apr 23 09:31:37 xh643 kernel: [  368.326624] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.326745] GD IOV-len: 12
Apr 23 09:31:37 xh643 kernel: [  368.327243] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.327369] GD IOV-len: 168
Apr 23 09:31:37 xh643 kernel: [  368.327551] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.328192] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.328485] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.332747] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.332978] GD IOV-len: 384
Apr 23 09:31:37 xh643 kernel: [  368.333118] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.333247] GD IOV-len: 12
Apr 23 09:31:37 xh643 kernel: [  368.333621] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.341463] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.341783] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.343730] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.344011] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.346232] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.346542] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.347214] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.347512] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.348174] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.348449] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.349041] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.349304] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.349967] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.350253] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [  368.356621] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.356858] GD IOV-len: 384
Apr 23 09:31:37 xh643 kernel: [  368.357015] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.357143] GD IOV-len: 12
Apr 23 09:31:37 xh643 kernel: [  368.357764] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.359926] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [  368.360241] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [  368.362402] GD IOV-len: 24

###

$ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32     10      6      6      7      5      6      5      5      4      2    475
Node 0, zone   Normal   1674    923    272    145     63     38     21      7      3      5  14790

I will follow up with the iov_len output once it stops working again.
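
To make the /proc/buddyinfo output easier to compare between the working and the failing state, here is a minimal sketch that parses one line of it. Each column after the zone name is the number of free blocks of 2^k pages, for k = 0, 1, 2 and so on. The struct and function names are hypothetical, and the sketch only sums what the file reports; it says nothing about which zone a given dma_alloc_coherent() call is served from.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_ORDERS 16

struct buddy_zone {
    int node;
    char zone[16];
    unsigned long free_blocks[MAX_ORDERS]; /* free_blocks[k]: blocks of 2^k pages */
    int n_orders;
};

/* Parse one /proc/buddyinfo line; returns 0 on success, -1 if malformed. */
static int parse_buddyinfo_line(const char *line, struct buddy_zone *out)
{
    int consumed = 0;

    assert(line != NULL && out != NULL);

    if (sscanf(line, "Node %d, zone %15s%n",
               &out->node, out->zone, &consumed) != 2)
        return -1;

    const char *p = line + consumed;

    out->n_orders = 0;
    while (out->n_orders < MAX_ORDERS) {
        char *end;
        unsigned long v = strtoul(p, &end, 10);

        if (end == p)   /* no further number on this line */
            break;
        out->free_blocks[out->n_orders++] = v;
        p = end;
    }
    return out->n_orders > 0 ? 0 : -1;
}

/* Total free pages in the zone: sum over k of free_blocks[k] * 2^k. */
static unsigned long total_free_pages(const struct buddy_zone *z)
{
    unsigned long pages = 0;

    for (int k = 0; k < z->n_orders; k++)
        pages += z->free_blocks[k] << k;
    return pages;
}

/* Highest order that still has at least one free block, or -1 if none. */
static int largest_free_order(const struct buddy_zone *z)
{
    for (int k = z->n_orders - 1; k >= 0; k--)
        if (z->free_blocks[k] > 0)
            return k;
    return -1;
}
```

Fed with the DMA32 line above, this gives 489782 free pages (roughly 1.9 GiB at 4 KiB per page) and a largest free order of 10, so in the working state that zone is neither short on memory nor badly fragmented.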


> 
> One wild guess: you are using a Xen flavor, which reserves less memory
> for Dom0, and there may be some way to increase dom0 memory. Can you tune
> that as well and see? I am not sure how to do that in your case, but in
> Citrix we used to see such issues frequently compared to *default* Linux.
> Providing some tuning in grub increases the dom0 memory, and that makes
> things better compared to the default settings.

We are using vanilla Xen with PVH (basically just "make xenconfig" in the kernel source directory), and our current setup assigns 4 GB of memory to each dom0. I will set up a dom0 with more memory just to rule this out.



