Michal Hocko wrote:
> On Fri 21-11-08 09:28:37, Vlad Yasevich wrote:
>> Michal
>>
>> This really smells like the corruption of the sctp_packet structure.
>> The number of chunks printed out is 0, but the list appears to have multiple
>> entries on it.
>
> I am just wondering that it takes such a long time to trigger on my
> system. What can be different (code path?) that corrupts this structure?
> Any idea how to trigger it faster? I have tried to increase the number
> of servers and clients, but that doesn't seem like it made that crash
> faster...
>

Michal

Could you make your reproducer available? I'd like to see if it uses any
protocol extensions.

Also, can you provide the output of "sysctl -a | grep sctp"? This is also to
check whether any extensions are enabled.

Just trying to narrow down what to look for.

Thanks
-vlad

>> Can you turn on CONFIG_DEBUG_LIST and maybe even turn on memory
>> debugging as well.
>>
>> Thanks
>> -vlad
>>
>> Michal Hocko wrote:
>>> On Tue 18-11-08 09:04:58, Vlad Yasevich wrote:
>>>> Michal Hocko wrote:
>>>>> On Thu 06-11-08 08:48:45, Vlad Yasevich wrote:
>>> [...]
>>>>>> In the earlier kernels there were a few bugs in the accept code paths that
>>>>>> had to do with locking the newly created socket correctly as well as locking
>>>>>> the port hash table during the migration of the ports. Both of those
>>>>>> contributed to crashes at odd points in time and sometimes even to stack and
>>>>>> memory corruptions.
>>>>>>
>>>>>> I'll take a look at what's causing skb overflow in 2.6.28.
>>>>> Is there any update (patch to test)? This is starting to be critical
>>>>> from our POV.
>>>>> Do you have any ETA?
>>>>> Is there some way how to help here?
>>>>>
>>>> which version in particular is most critical?
>>>>
>>>> Just remember that 2.6.16 is very old and there have been a lot of fixes that
>>>> address critical issues.
>>>>
>>>> For 2.6.28, can you apply the attached patch and post dmesg output.
>>>> Also, if it's possible to capture a kdump, that would make things much easier.
>>> I have tried the attached patch and let the machine crash with the
>>> 2.6.28-rc5 kernel (4e14e833ac3b97a4aa8803eea49f899adc5bb5f4). Trace as
>>> well as config are attached. Kdump vmcore and oldmem along with vmlinux
>>> and System.map can be found at:
>>>
>>> ftp.novell.com/outgoing/vmcore.2.6.28-rc5-sctp.gz
>>> ftp.novell.com/outgoing/oldmem.2.6.28-rc5-sctp.gz
>>> ftp.novell.com/outgoing/vmlinux-2.6.28-rc5-sctp.gz
>>> ftp.novell.com/outgoing/System.map-2.6.28-rc5-sctp.gz
>>>
>>> md5sums:
>>> d43a09b384c6b45ffd0615fd2f3e63e7 vmcore.2.6.28-rc5-sctp
>>> f0e327c1b58c84f0ed7006fc5b881bd8 oldmem.2.6.28-rc5-sctp
>>> 70f86806415a266dccb13dae835b8d0e vmlinux-2.6.28-rc5-sctp
>>> 41bb6d07ec960557f8243eb98b244c9b System.map-2.6.28-rc5-sctp
>>>
>>> Unfortunately, I don't have timing information in the captured trace
>>> (logs don't contain anything), so it is not clear how much time elapsed
>>> between debug output added by the patch and the crash itself.
>>>
>>> "sky2 lan: rx error, status 0x1160002 length 278" was logged at Nov 18
>>> 16:59:25 (around an hour after the test started) while the crash
>>> occurred around Nov 19 1:30
>>> /var/log/messages:
>>> [...]
>>> Nov 19 00:31:05 dhcp35 -- MARK --
>>> Nov 19 00:51:05 dhcp35 -- MARK --
>>> Nov 19 01:11:05 dhcp35 -- MARK --
>>> Nov 19 01:31:05 dhcp35 -- MARK --
>>> Nov 19 09:37:15 dhcp35 syslogd 1.5.0#5: restart.