Re: Kernel panic w/ message request_threaded_irq -> qla2x00_request_irqs -> qla2x00_probe_one -> mod_timer

On Sun, 2019-04-28 at 12:11 -0400, TomK wrote:
> On 4/15/2019 10:26 PM, TomK wrote:
> > On 4/15/2019 3:35 PM, Laurence Oberman wrote: 
> > > On Mon, 2019-04-15 at 08:39 -0700, Bart Van Assche wrote: 
> > > > On Mon, 2019-04-15 at 08:55 -0400, Laurence Oberman wrote: 
> > > > > On Sun, 2019-04-14 at 23:25 -0400, TomK wrote: 
> > > > > > Hey All, 
> > > > > > 
> > > > > > I'm getting a kernel panic on a Gigabyte GA-890XA-UD3 motherboard
> > > > > > in which I've got a QLE2464 card running as a target (FC).  The
> > > > > > kernel has been crashing / panicking about once a week for the
> > > > > > last 1-2 months.  Before that, it was rock solid for 4-5 years.
> > > > > > I've upgraded to kernel 4.18.19 but that hasn't made much of a
> > > > > > difference.  Since the message includes qla2x00_request_irqs I
> > > > > > thought I would try here first.
> > > > > > 
> > > > > > Tried to get more info on this but:
> > > > > > 
> > > > > > 1) Keyboard doesn't work and locks up when the panic occurs.  No
> > > > > > USB ports work.  Tried the PS/2 port but nothing.
> > > > > > 
> > > > > > 2) Unable to capture a kdump.  Can't get to the kdump vmcore due
> > > > > > to 1).
> > > > > > 
> > > > > > The two screenshots are pretty much all I can capture.  Tried
> > > > > > things like clocksource=rtc in the kernel parms and disabling
> > > > > > hpet1, but apparently I haven't disabled it everywhere since it
> > > > > > still shows up.
> > > > > > 
> > > > > > Wondering if anyone recognizes these messages or has any idea
> > > > > > what could be the issue here?  Even a hint would be appreciated.
> > > > > > 
> > > > >  
> > > > > Hello Tom 
> > > > > I have had similar issues and reported them to Himanshu@Cavium.
> > > > > I have kept all my target servers at kernel 4.5, as it has been the
> > > > > only version that has always been stable.
> > > > > If your motherboard has an NMI (virtual or physical), set all of
> > > > > these in /etc/sysctl.conf
> > > > > Run sysctl -a;dracut -f and reboot
> > > > > 
> > > > > kernel.nmi_watchdog = 1 
> > > > > kernel.panic_on_io_nmi = 1 
> > > > > kernel.panic_on_unrecovered_nmi = 
> > > > > kernel.unknown_nmi_panic = 1 
> > > > > 
> > > > > When the issue shows up, trigger the virtual/physical NMI.
> > > > > 
> > > > > This assumes that generic kdump is properly set up, that
> > > > > dmesg | grep crash shows memory reserved by the crashkernel, and
> > > > > that you have tested kdump manually.
> > > > > 
> > > > > Another option is to use a USB serial port to capture the full log
> > > > > if you cannot get kdump to work.
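
A minimal Python sketch of the pre-flight check described above: it reads the four NMI sysctls from /proc/sys and looks for a crashkernel= reservation on the boot command line. The function names are hypothetical, and treating 1 as the expected value for kernel.panic_on_unrecovered_nmi is an assumption, since the list above leaves that value blank.

```python
from pathlib import Path

# The four NMI-related sysctls named in the advice above.
# Assumption: every one of them should read 1 once the settings are active.
NMI_SYSCTLS = (
    "kernel.nmi_watchdog",
    "kernel.panic_on_io_nmi",
    "kernel.panic_on_unrecovered_nmi",
    "kernel.unknown_nmi_panic",
)


def read_sysctl(name, proc_sys=Path("/proc/sys")):
    """Return the current value of a sysctl as a string, or None if unreadable."""
    path = proc_sys.joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


def check_nmi_sysctls(proc_sys=Path("/proc/sys")):
    """Map each sysctl name to True when it currently reads 1."""
    return {name: read_sysctl(name, proc_sys) == "1" for name in NMI_SYSCTLS}


def crashkernel_on_cmdline(cmdline):
    """True when the boot command line carries a crashkernel= reservation."""
    return any(tok.startswith("crashkernel=") for tok in cmdline.split())
```

On a live system this could be used as check_nmi_sysctls() together with crashkernel_on_cmdline(Path("/proc/cmdline").read_text()). It does not replace testing kdump manually; it only confirms the settings took effect after the reboot.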
> > > >  
> > > > That approach may provide further evidence about kernel bugs but it
> > > > is not guaranteed that that approach will lead to a solution. It
> > > > would help if either or both of you could do the following on a
> > > > test system:
> > > > * Check out branch qla2xxx-for-next of my kernel repo on github
> > > >   (https://github.com/bvanassche/linux/tree/qla2xxx-for-next).
> > > > * Enable lockdep and KASAN in the kernel config (CONFIG_PROVE_LOCKING
> > > >   and CONFIG_KASAN).
> > > > * Build and install that kernel.
> > > > * Run your favorite workload.
> > > > 
> > > > Please note that the qla2xxx-for-next branch is based on the v5.1-rc1
> > > > kernel and hence should not be installed on any production system.
> > > > 
> > > > Thanks, 
> > > > 
> > > > Bart. 
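
Before building, it may be worth confirming that both debug options really ended up enabled in the kernel config. The Python sketch below scans .config text for CONFIG_PROVE_LOCKING=y and CONFIG_KASAN=y; the helper names are hypothetical and it only recognises the plain KEY=y form.

```python
# Options requested in the list above.
REQUIRED = ("CONFIG_PROVE_LOCKING", "CONFIG_KASAN")


def enabled_options(config_text):
    """Return the set of options that are set to y in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_KASAN is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if value == "y":
            enabled.add(key)
    return enabled


def missing_debug_options(config_text, required=REQUIRED):
    """List the required options that are not enabled, in the order given."""
    enabled = enabled_options(config_text)
    return [opt for opt in required if opt not in enabled]
```

Feeding it the contents of the .config in the build tree should return an empty list when both options are on.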
> > >  
> > > Hello Bart 
> > > OK, I will get to this by Thursday; I won't be able to change the
> > > target server kernel until then.
> > > Regards 
> > > Laurence 
> > > 
> >  
> > Same.  I'll try this out closer to the weekend. 
> > 
> > Not an NMI motherboard.  This is a 9-10 year old AMD board meant as
> > a desktop or home server. 
> > 
> > I'll have to read more about the USB Serial port to capture further
> > info.  That's interesting. 
> > 
> > For the time being, I've disabled HPET in BIOS.  (It appears the
> > kernel boot parameter method wasn't enough.)
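
One way to confirm whether HPET is still registered as a clocksource after the BIOS change is to read the clocksource files in sysfs. This is a small Python sketch with hypothetical helper names; it only covers the clocksource side and says nothing about other places the kernel may still reference the HPET.

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_state(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def hpet_still_visible(current, available):
    """True when hpet is in use or still offered as a clocksource."""
    return current == "hpet" or "hpet" in available
```

Calling hpet_still_visible(*clocksource_state()) on the target box before and after the BIOS change shows whether the setting made a difference.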
> > 
> > 
> 
> Hey Guys,
> Did some of what you suggested, including the USB serial setup:
> 1) One of DB9 RS232 Serial Null Modem Cable F/F
> 2) Two of USB to RS232 Serial Port DB9 9 Pin Male
> However, when the kernel came down it took the USB support with it,
> so minicom went offline:
>  CTRL-A Z for help |115200 8N1 | NOR | Minicom 2.6.2  | VT102 |      Offline
> But I did enable full logging for the QLA module:
> echo 0x7fffffff > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
> Did all that, minus the Kernel v5.1-rc1 implementation, and this is
> what was picked up from the minicom USB to Serial capture before
> things went south:
> 1235905 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=512, cmd->sg_cnt=1, cmd->dma_data_direction=1 se_cmd[000000009c9ea758] qp 0
> 1235906 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=4096, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[0000000096ae11b7] qp 0
> 1235907 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=20480, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[000000001738f793] qp 0
> 1235908 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=20480, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[00000000e8160a90] qp 0
> 1235909 Detected MISCOMPARE for addr: 0000000033045258 buf: 00000000f9849912
> 1235910 Target/fileio: Send MISCOMPARE check condition and sense
> 1235911 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=512, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[00000000363ae214] qp 0
> 1235912 qla2xxx [0000:04:00.0]-e817: Skipping EXPLICIT_CONFORM and CTIO7_FLAGS_CONFORM_REQ for FCP READ w/ non GOOD status
> 1235913 qla2xxx [0000:04:00.0]-e874:2: qlt_free_cmd: se_cmd[000000001db805fd] ox_id 00c8
> 1235914 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00db
> 1235915 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00dc
> 1235916 qla2xxx [0000:04:00.0]-e874:2: qlt_free_cmd: se_cmd[00000000f67a701f] ox_id 00c9
> 1235917 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00dd
> 1235918 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00de
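
With a capture this large, it can help to summarise the e818 records and count MISCOMPARE events instead of reading them by eye. The Python sketch below assumes each record from the capture sits on a single line; the regular expression is written against the message text shown in the excerpt and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the e818 records shown in the capture. The field layout is taken
# from the excerpt above, not from any documented log format.
E818 = re.compile(
    r"-e818: is_send_status=(?P<status>\d+), "
    r"cmd->bufflen=(?P<bufflen>\d+), "
    r"cmd->sg_cnt=(?P<sg_cnt>\d+), "
    r"cmd->dma_data_direction=(?P<direction>\d+) "
    r"se_cmd\[(?P<se_cmd>[0-9a-f]+)\]"
)


def parse_e818(line):
    """Return the fields of one e818 record as a dict, or None if no match."""
    m = E818.search(line)
    if m is None:
        return None
    return {
        "bufflen": int(m["bufflen"]),
        "sg_cnt": int(m["sg_cnt"]),
        "direction": int(m["direction"]),
        "se_cmd": m["se_cmd"],
    }


def summarize(lines):
    """Count e818 records per DMA direction and count MISCOMPARE detections."""
    directions = Counter()
    miscompares = 0
    for line in lines:
        rec = parse_e818(line)
        if rec is not None:
            directions[rec["direction"]] += 1
        elif "Detected MISCOMPARE" in line:
            miscompares += 1
    return directions, miscompares
```

Running summarize() over the whole capture file gives a quick view of how often MISCOMPARE shows up relative to normal traffic. In the kernel's enum dma_data_direction, 1 is DMA_TO_DEVICE and 2 is DMA_FROM_DEVICE.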
> 
> On an earlier crash, captured the attached image.  This time there
> was nothing on the monitor and the keyboard didn't refresh it.  No
> signal. 
> When looking this up, the closest I could see online is the following:
> 
> https://target-devel.vger.kernel.narkive.com/XiM5Csx8/luns-become-unavailable-with-current-git-head
> They too run ESXi.
> To read the file I used the AnsiEsc plugin for VIM:
> https://www.vim.org/scripts/script.php?script_id=302
> This started to occur once I had a VMware based MySQL and PostgreSQL
> cluster configured.  It takes a few days for the issue to occur, so from
> that perspective it appears to be memory related.
> Firmware that I'm using is:
>     supported_classes   = "Class 3"
>     supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit"
>     symbolic_name       = "QLE2464 FW:v8.04.00 DVR:v10.00.00.05-k"
> Targetcli, rtslib and configshell versions I'm using are:
> 
> # rpm -aq|grep -Ei "targetcli|rtslib|configshell" 
> python-rtslib-3.0.pre4.9~g6fd0bbf-1.el6.noarch 
> python-configshell-1.1.fb4-1.el6.noarch 
> targetcli-3.0.pre4.5~ga125182-1.el6.noarch
> 
> 
> -- 
> Thx,
> TK.

I missed this email; I've been buried in customer cases.
I also still need to run some tests.
Sorry, reading now.



