Re: BUG in sctp crashes the system

On Fri 21-11-08 09:28:37, Vlad Yasevich wrote:
> Michal
> 
> This really smells like the corruption of the sctp_packet structure.
> The number of chunks printed out is 0, but the list appears to have multiple
> entries on it.

I am just wondering why it takes such a long time to trigger on my
system. What could be different (a code path?) that corrupts this structure?
Any idea how to trigger it faster? I have tried increasing the number
of servers and clients, but that doesn't seem to have made the crash
happen any faster...

> 
> Can you turn on CONFIG_DEBUG_LIST and maybe even turn on memory
> debugging as well?
> 
> Thanks
> -vlad
> 
> Michal Hocko wrote:
> > On Tue 18-11-08 09:04:58, Vlad Yasevich wrote:
> >> Michal Hocko wrote:
> >>> On Thu 06-11-08 08:48:45, Vlad Yasevich wrote:
> > [...]
> >>>> In earlier kernels there were a few bugs in the accept code paths that
> >>>> had to do with locking the newly created socket correctly as well as locking
> >>>> the port hash table during the migration of the ports.  Both of those
> >>>> contributed to crashes at odd points in time and sometimes even to stack and
> >>>> memory corruptions.
> >>>>
> >>>> I'll take a look at what's causing skb overflow in 2.6.28.
> >>> Is there any update (a patch to test)? This is starting to be critical
> >>> from our POV.
> >>> Do you have any ETA?
> >>> Is there some way we can help here?
> >>>
> >> Which version in particular is most critical?
> >>
> >> Just remember that 2.6.16 is very old, and there have been a lot of fixes
> >> since then that address critical issues.
> >>
> >> For 2.6.28, can you apply the attached patch and post the dmesg output? Also,
> >> if it's possible to capture a kdump, that would make things much easier.
> > 
> > I have tried the attached patch and let the machine crash with the
> > 2.6.28-rc5 kernel (4e14e833ac3b97a4aa8803eea49f899adc5bb5f4). The trace as
> > well as config are attached. Kdump vmcore and oldmem along with vmlinux
> > and System.map can be found at:
> > 
> > ftp.novell.com/outgoing/vmcore.2.6.28-rc5-sctp.gz
> > ftp.novell.com/outgoing/oldmem.2.6.28-rc5-sctp.gz
> > ftp.novell.com/outgoing/vmlinux-2.6.28-rc5-sctp.gz
> > ftp.novell.com/outgoing/System.map-2.6.28-rc5-sctp.gz
> > 
> > md5sums:
> > d43a09b384c6b45ffd0615fd2f3e63e7  vmcore.2.6.28-rc5-sctp
> > f0e327c1b58c84f0ed7006fc5b881bd8  oldmem.2.6.28-rc5-sctp
> > 70f86806415a266dccb13dae835b8d0e  vmlinux-2.6.28-rc5-sctp
> > 41bb6d07ec960557f8243eb98b244c9b  System.map-2.6.28-rc5-sctp
> > 
> > Unfortunately, I don't have timing information in the captured trace
> > (logs don't contain anything), so it is not clear how much time elapsed
> > between debug output added by the patch and the crash itself.
> > 
> > "sky2 lan: rx error, status 0x1160002 length 278" was logged at Nov 18
> > 16:59:25 (around an hour after the test started), while the crash
> > occurred around Nov 19 1:30.
> > /var/log/messages:
> > [...]
> > Nov 19 00:31:05 dhcp35 -- MARK --
> > Nov 19 00:51:05 dhcp35 -- MARK --
> > Nov 19 01:11:05 dhcp35 -- MARK --
> > Nov 19 01:31:05 dhcp35 -- MARK --
> > Nov 19 09:37:15 dhcp35 syslogd 1.5.0#5: restart.
> > 
> > 
> 

-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic
