On 19. Apr 2018, at 08:33, Kashyap Desai <Kashyap.Desai@xxxxxxxxxxxx> wrote:
>
> I think you may see issue with 4.6 kernel as well. This is run time memory
> allocation failure. Older controller used 32 bit consistence DMA mask, so
> possibility of memory allocation failure is high compare to 64 bit
> consistence DMA mask. Newer controller has fix in this area, but you are
> using gen-1 controller. ("Dell R710, MegaRAID SAS 1078").

Interesting. What is considered old and what is new? I have a third machine, a "Dell R515, MegaRAID SAS 2108". Is that considered new? It's running the same Xen, kernel and MegaCli versions as the other two, but the error does not occur there.

> There can be a two possibilities.
>
> 1. This is actual memory allocation failure due to system resource issue.

I have not seen any OOMs on the two machines when or where the SGL error occurs. According to "xl info" and our Munin graphs, everything looks OK, with a couple of hundred MiB "free".

> 2. IOCLT provided large memory length in iov and dma buffer allocation
> from below API failed due to large memory chunk requested.
>
>         kbuff_arr[i] = dma_alloc_coherent(&instance->pdev->dev,
>                                           ioc->sgl[i].iov_len,
>                                           &buf_handle,
>                                           GFP_KERNEL);
>
> Can you change driver code *printk* to dump iov_len ? Just to confirm.

I just did that on the "Dell R730xd, MegaRAID SAS-3 3108" and got the following output while MegaCli was working fine.

###
Apr 23 09:31:37 xh643 kernel: [ 368.319092] GD IOV-len: 2048
Apr 23 09:31:37 xh643 kernel: [ 368.319426] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [ 368.319563] GD IOV-len: 320
Apr 23 09:31:37 xh643 kernel: [ 368.319698] GD IOV-len: 616
Apr 23 09:31:37 xh643 kernel: [ 368.319887] GD IOV-len: 1664
Apr 23 09:31:37 xh643 kernel: [ 368.320040] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [ 368.320174] GD IOV-len: 8
…
###

The full output is attached in iov_len_megacli_works.txt; it also contains the output of /proc/buddyinfo, which might be important based on my research so far.
Apr 23 09:31:37 xh643 kernel: [ 368.320306] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.321761] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.321893] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.322075] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.322254] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.322416] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.322549] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.322698] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.322826] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.322970] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.323267] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.323390] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.323534] GD IOV-len: 2048
Apr 23 09:31:37 xh643 kernel: [ 368.323701] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.323841] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [ 368.323968] GD IOV-len: 320
Apr 23 09:31:37 xh643 kernel: [ 368.324142] GD IOV-len: 616
Apr 23 09:31:37 xh643 kernel: [ 368.324299] GD IOV-len: 1664
Apr 23 09:31:37 xh643 kernel: [ 368.324471] GD IOV-len: 32
Apr 23 09:31:37 xh643 kernel: [ 368.324603] GD IOV-len: 8
Apr 23 09:31:37 xh643 kernel: [ 368.324734] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.324854] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.324970] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325085] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325201] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325315] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325430] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325545] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325674] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325793] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.325912] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.326027] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.326150] GD IOV-len: 2048
Apr 23 09:31:37 xh643 kernel: [ 368.326469] GD IOV-len: 384
Apr 23 09:31:37 xh643 kernel: [ 368.326624] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.326745] GD IOV-len: 12
Apr 23 09:31:37 xh643 kernel: [ 368.327243] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.327369] GD IOV-len: 168
Apr 23 09:31:37 xh643 kernel: [ 368.327551] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.328192] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.328485] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.332747] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.332978] GD IOV-len: 384
Apr 23 09:31:37 xh643 kernel: [ 368.333118] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.333247] GD IOV-len: 12
Apr 23 09:31:37 xh643 kernel: [ 368.333621] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.341463] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.341783] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.343730] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.344011] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.346232] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.346542] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.347214] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.347512] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.348174] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.348449] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.349041] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.349304] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.349967] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.350253] GD IOV-len: 256
Apr 23 09:31:37 xh643 kernel: [ 368.356621] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.356858] GD IOV-len: 384
Apr 23 09:31:37 xh643 kernel: [ 368.357015] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.357143] GD IOV-len: 12
Apr 23 09:31:37 xh643 kernel: [ 368.357764] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.359926] GD IOV-len: 24
Apr 23 09:31:37 xh643 kernel: [ 368.360241] GD IOV-len: 512
Apr 23 09:31:37 xh643 kernel: [ 368.362402] GD IOV-len: 24
###

$ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32     10      6      6      7      5      6      5      5      4      2    475
Node 0, zone   Normal   1674    923    272    145     63     38     21      7      3      5  14790
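To relate the logged iov_len values to the buddyinfo numbers: as far as I understand, dma_alloc_coherent() ends up allocating whole pages in power-of-two blocks, so each request needs a free block of order get_order(iov_len) or larger. The helpers below are a hypothetical userspace sketch. order_for_size() only mirrors the idea of the kernel's get_order(), and zone_has_block() and zone_free_pages() are my own names, not kernel functions. The sketch assumes 4 KiB pages and the 11 order columns (0 to 10) shown in /proc/buddyinfo. It ignores watermarks, migrate types and anything Xen does underneath, so it is a rough sanity check only.

```c
#include <assert.h>

#define PAGE_BYTES   4096UL /* assumption: 4 KiB pages, as on x86-64 */
#define BUDDY_ORDERS 11     /* /proc/buddyinfo above lists orders 0..10 */

/* Smallest order n such that 2^n pages cover `size` bytes
 * (same idea as the kernel's get_order(), reimplemented here). */
static unsigned int order_for_size(unsigned long size)
{
    unsigned long pages = (size + PAGE_BYTES - 1) / PAGE_BYTES;
    unsigned int order = 0;

    while ((1UL << order) < pages)
        order++;
    return order;
}

/* Buddy allocator view: a request of `order` can be served by splitting
 * any free block of that order or larger. counts[k] is the number of
 * free blocks of order k, i.e. one column of a /proc/buddyinfo line. */
static int zone_has_block(const unsigned long counts[BUDDY_ORDERS],
                          unsigned int order)
{
    assert(order < BUDDY_ORDERS);
    for (unsigned int k = order; k < BUDDY_ORDERS; k++)
        if (counts[k] > 0)
            return 1;
    return 0;
}

/* Total free pages in a zone: sum over k of counts[k] * 2^k. */
static unsigned long zone_free_pages(const unsigned long counts[BUDDY_ORDERS])
{
    unsigned long total = 0;

    for (unsigned int k = 0; k < BUDDY_ORDERS; k++)
        total += counts[k] << k;
    return total;
}
```

Every iov_len in the log is at most 2048 bytes, so order_for_size() returns 0 for all of them, while a hypothetical 64 KiB request would need order 4. Feeding the DMA32 line {10, 6, 6, 7, 5, 6, 5, 5, 4, 2, 475} into zone_free_pages() gives 489782 pages (about 1.87 GiB), and zone_has_block() finds free blocks at every order. So in this "works" snapshot neither the request sizes nor the DMA32 fragmentation look like an obvious problem; the interesting comparison will be the same numbers once the error shows up again.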
I will follow up with the iov_len output once it stops working again.

>
> One wild guess - You are using Xen flavor, which will reserve less memory
> for Dom0 and there may be some way to increase dom0 memory. Can you tune
> that as well and see ? I am not sure how to do that in your case, but in
> Citrix we used to see such issue frequently compare to *default* Linux.
> Providing some tuning in grub increase the dom0 memory and that make
> things better compare to default settings.

We are using vanilla Xen with PVH (basically just "make xenconfig" in the kernel source directory), and our current setup assigns 4 GB of memory to dom0. I will set up a dom0 with more memory just to rule this out.
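Kashyap's point about 32-bit versus 64-bit coherent DMA masks can be made concrete with a small sketch. DMA_BIT_MASK() below copies the macro from the kernel's <linux/dma-mapping.h>; fits_dma_mask() is a hypothetical helper, not a kernel or megaraid_sas function, that performs the basic check a coherent buffer has to pass: the bus address of its last byte must not exceed the device's mask. With a 32-bit mask every such buffer has to come from the lowest 4 GiB of bus address space (on x86-64 that is roughly the range ZONE_DMA32 covers), so the allocations compete for a much smaller pool than with a 64-bit mask. How this interacts with Xen's address translation for dom0 is an assumption I have not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as DMA_BIT_MASK() in the kernel's <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true if a buffer of `size` bytes starting at bus
 * address `bus_addr` lies entirely at or below `mask`. Written as
 * "size - 1 <= mask - bus_addr" so that bus_addr + size cannot overflow. */
static bool fits_dma_mask(uint64_t bus_addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    if (bus_addr > mask)
        return false;
    return size - 1 <= mask - bus_addr;
}
```

For example, a 4096-byte buffer at 0xFFFFF000 still fits under DMA_BIT_MASK(32), the same buffer at 0xFFFFF800 does not, and a 2048-byte buffer just above 4 GiB (0x100000000) only fits under DMA_BIT_MASK(64).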