Re: BUG in sctp crashes sles10sp2 kernel

Hi Vlad,

On Thu, Dec 11, 2008 at 10:28:35AM -0500, Vlad Yasevich wrote:
> Michal Hocko wrote:
> > Hi Vlad,
> > 
> > I am starting this new thread because I am starting to believe that
> > sles10sp2 kernel (based on 2.6.16 upstream kernel) experiences a different
> > issue than the one we see in the upstream kernel (see below).
> > 
> > Karsten (CCing him) has found out the following:
> > "
> > OK I think the
> > KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc)) failed at
> > net/ipv4/af_inet.c (149)
> > 
> > is related to the main problem here. It says that at the time a socket
> > gets destroyed there is still some wmem allocated, which means there is
> > still a transmit skb in flight. Since sctp uses skb destructors to do the
> > memory accounting, this also means that after the socket is destroyed, the
> > destructor of this skb will access the already freed socket struct.
> > In some cases (if the memory is in use again and the pointers have already
> > been overwritten) this causes the crash in {sock_wfree+48} (which is a call
> > to sk->sk_write_space(sk);). Of course it can crash in any other place as
> > well, since the accounting may overwrite pointers in any other struct that
> > reuses this memory.
> > 
> > I instrumented some routines with extra debug output (e.g. inet_sock_destruct)
> > to see the amount of memory in sk->sk_wmem_alloc; it almost always shows:
> > 
> > Dec 11 12:31:16 gw kernel: inet_sock_destruct: sk(ffff810116960e00)->sk_wmem_alloc 496
> > Dec 11 12:31:17 gw kernel: inet_sock_destruct: sk(ffff8101144f1b00)->sk_wmem_alloc 496
> > Dec 11 12:31:18 gw kernel: inet_sock_destruct: sk(ffff8101144f1b00)->sk_wmem_alloc -496
> > Dec 11 12:31:20 gw kernel: inet_sock_destruct: sk(ffff81011d461a00)->sk_wmem_alloc 496
> > Dec 11 12:31:21 gw kernel: inet_sock_destruct: sk(ffff81011d460080)->sk_wmem_alloc 496
> > 
> > Note the -496: I think this is a case in which the same memory was
> > allocated again for a new socket struct, so the memory still held valid
> > pointers, and the destructor call for the old socket decremented the
> > counter of the new socket.
> > 
> > Do you agree with this analysis?
> > "
> > 
> > I am going through the git logs, but maybe you remember a fix in
> > this area.
> > 
> > If I understand correctly, 20c2df83d25c6a95affe6157a4c9cac4cf5ffaac
> > removes destructors from sctp completely, so the problem described above
> > should not happen upstream, right?
> > 
> 
> 
> Here are a few commits that you need to check on:
> 
> 61c9fed41638249f8b6ca5345064eb1beb50179f
> [SCTP]: A better solution to fix the race between sctp_peeloff() and sctp_rcv().
> 
> cfdeef3282705a4b872d3559c4e7d2561251363c
> [SCTP]: Unhash the endpoint in sctp_endpoint_free().
> 
> f26f7c480555812ca7c4037e0a50fa54afe2cb4a
> [SCTP]: Add bind hash locking to the migrate code
> 
> 
> All of the above commits address races in the SCTP code and are not in the base
> 2.6.16 kernel.
> 

Thanks for your input.

61c9fed41638249f8b6ca5345064eb1beb50179f
[SCTP]: A better solution to fix the race between sctp_peeloff() and sctp_rcv().

seems to fix this issue; I also applied the other patches.

I no longer get the "KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed ..." messages.

But now I am hitting the skb_overflow BUG.
With some extra debugging (based on your debug patch) I see:

Possible SKB overflow: packet size = 76, packet overhead = 32, packet chunk = 1/4, chunk len =1040 packet padding 0 nskb len 12 mtu = 1500

"packet chunk = 1/4" means that the first of 4 chunks in total causes the overflow.

At first I thought the padding might cause this, so I also printed that
value, but it is 0 in all traces.

I also applied
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b90a137d30a6322d76023d879d40fc31f3edf0a6

which sounds likely to fix this kind of problem, but it seems we do not
hit that case; the bug is still there.


-- 
Karsten Keil
SuSE Labs
ISDN and VOIP development
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html