On Mon, Dec 15, 2008 at 10:38:56AM -0500, Vlad Yasevich wrote: > Karsten Keil wrote: > > Hi Vlad, > > > > On Thu, Dec 11, 2008 at 10:28:35AM -0500, Vlad Yasevich wrote: > >> Michal Hocko wrote: > >>> Hi Vlad, > >>> > >>> I am starting this new thread because I am starting to believe that > >>> sles10sp2 kernel (based on 2.6.16 upstream kernel) experiences different > >>> issue than we can see in the upstream kernel (see bellow). > >>> > >>> Karsten (CCing him) has found out following: > >>> " > >>> OK I think the > >>> KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc)) failed at > >>> net/ipv4/af_inet.c (149) > >>> > >>> is related to the main problem here, it says that on the time a socket > >>> get destroyed here is still some wmem allocated. This mean here is still > >>> a transmit skb on the fly. Since sctp use skb destructors to do the > >>> memory accounting, this also means that after destroying the socket, the > >>> destructor of this skb will access the already freed socket struct, > >>> which will let in some cases (if the memory is in use again and the > >>> pointers are already overwritten) cause the crash with on > >>> {sock_wfree+48} (which is a call to sk->sk_write_space(sk);). Of course > >>> it can crash in every other place, since the accounting may overwrite > >>> pointers in any other struct, which reuse this memory. > >>> > >>> I instrument some routines with extra debug (eg. inet_sock_destruct) too > >>> see the amount of memory in sk->sk_wmem_alloc, it allmost show > >>> > >>> Dec 11 12:31:16 gw kernel: inet_sock_destruct: > >>> sk(ffff810116960e00)->sk_wmem_alloc 496 > >>> Dec 11 12:31:17 gw kernel: inet_sock_destruct: > >>> sk(ffff8101144f1b00)->sk_wmem_alloc 496 > >>> Dec 11 12:31:18 gw kernel: inet_sock_destruct: > >>> sk(ffff8101144f1b00)->sk_wmem_alloc -496 > >>> Dec 11 12:31:20 gw kernel: inet_sock_destruct: > >>> sk(ffff81011d461a00)->sk_wmem_alloc 496 > >>> Dec 11 12:31:21 gw kernel: inet_sock_destruct: > >>> sk(ffff81011d460080)->sk_wmem_alloc 496 > >>> > >>> Note the -496, I think this is a case in which the same memory was again > >>> allocated by a socket struct, so the memory still has valid pointers and > >>> so on the destructor call for the old socket it did decrement the memory > >>> on the new socket. > >>> > >>> Do you agree with this analysis ? > >>> " > >>> > >>> I am trying to go through git logs but maybe you remember some fix in > >>> this area. > >>> > >>> If I understand correctly, then 20c2df83d25c6a95affe6157a4c9cac4cf5ffaac > >>> removes destructors from sctp completely, so the previous should not > >>> happen in upstream, shouldn't it? > >>> > >> > >> Here are a few commits that you need to check on: > >> > >> 61c9fed41638249f8b6ca5345064eb1beb50179f > >> [SCTP]: A better solution to fix therace between sctp_peeloff() and sctp_rcv(). > >> > >> cfdeef3282705a4b872d3559c4e7d2561251363c > >> [SCTP]: Unhash the endpoint in sctp_endpoint_free(). > >> > >> f26f7c480555812ca7c4037e0a50fa54afe2cb4a > >> [SCTP]: Add bind hash locking to the migrate code > >> > >> > >> All of the above commits address races in the SCTP code and are not in the base > >> 2.6.16 kernel. > >> > > > > Thanks for your input. > > > > 61c9fed41638249f8b6ca5345064eb1beb50179f > > [SCTP]: A better solution to fix therace between sctp_peeloff() and sctp_rcv(). > > > > seems to fix this issue, I applied also the other patches. > > > > Now I do not get any longer the "KERNEL: assertion > > (!atomic_read(&sk->sk_wmem_alloc)) failed ..." messages. > > > > But now I run into the skb_overflow BUG. > > With some extra debug (based on your debug patch) I see: > > > > Possible SKB overflow: packet size = 76, packet overhead = 32, packet chunk = 1/4, chunk len =1040 packet padding 0 nskb len 12 mtu = 1500 > > > > packet chunk = 1/4 read as first chunk of total 4 chunks cause the overflow. > > OK. It appears that the list of chunk on the packets is not cleaned up > correctly. It could also be that in my prior debug patches I didn't reset > the number of chunks properly. No I think that the debug patch is correct. I already tried some things, so log "the history" of adding chunks in the packet struct, but it does not really help me, but maybe you see something: Possible SKB overflow: packet size = 1072, packet overhead = 32, packet chunk = 2/4, chunk len =1040 packet padding 0 nskb len 1052 mtu = 1500 chunk[1]: ps(36) ss(4) cs(4) ct(11) chunk[2]: ps(48) ss(16) cs(16) ct(3) chunk[3]: ps(1088) ss(1040) cs(1040) ct(0) chunk[4]: ps(1072) ss(1040) cs(1040) ct(0) skb_over_panic: text:ffffffff8824af48 len:2092 put:1040 head:ffff81006b8b8000 data:ffff81006b8b8074 tail:ffff81006b8b88a0 end:ffff81006b8b8500 dev:<NULL> ps packet size after adding this chunk ss chunk->skb->len cs chunk_len ct chunk type I attach the patch at the end, so you see what I did. All traces (with our 2.6.16) are similar to this one (which happend after the above printout). CPU 0 Modules linked in: sctp cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table ipv6 af_packet button battery ac apparmor loop dm_mod floppy r8169 parport_pc Pid: 0, comm: swapper Tainted: G U 2.6.16.60-0.33-smp #13 RIP: 0010:[<ffffffff8028667e>] <ffffffff8028667e>{skb_over_panic+77} RSP: 0018:ffffffff8042fb68 EFLAGS: 00010296 RAX: 000000000000009d RBX: 0000000000000410 RCX: 00000000000046b2 RDX: ffff81006b8b8490 RSI: 0000000000000296 RDI: ffffffff8035a99c RBP: ffff81010845e480 R08: 0000000000000000 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000000 R12: ffff81006bd87bc0 R13: ffff81006b8b8490 R14: ffff810069b3ad50 R15: ffff810069b3ac00 FS: 00002aba700166e0(0000) GS:ffffffff803d4000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00002ae33210c000 CR3: 0000000117424000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo ffffffff803ee000, task ffffffff80354320) Stack: ffff81006b8b88a0 ffff81006b8b8500 ffffffff803197b7 000000000000041c 00000000000005dc ffffffff8824af52 0000000000000286 ffff810069b3ad58 ffff810117a4f000 ffff810108042000 Call Trace: <IRQ> <ffffffff8824af52>{:sctp:sctp_packet_transmit+962} <ffffffff882437c7>{:sctp:sctp_outq_flush+1507} <ffffffff8013d450>{do_timer+61} <ffffffff8010f5f6>{main_timer_handler+641} <ffffffff801198b4>{smp_send_timer_broadcast_ipi+107} <ffffffff88243ab3>{:sctp:sctp_retransmit+116} <ffffffff8010b35e>{ret_from_intr+0} <ffffffff8823baf1>{:sctp:sctp_do_sm+1503} <ffffffff8014a88a>{ktime_get+14} <ffffffff8823c4c7>{:sctp:sctp_generate_t3_rtx_event+0} <ffffffff8823c53c>{:sctp:sctp_generate_t3_rtx_event+117} <ffffffff8013db23>{run_timer_softirq+352} <ffffffff80139774>{__do_softirq+85} <ffffffff8010c222>{call_softirq+30} <ffffffff8010d1a4>{do_softirq+44} <ffffffff80109d41>{default_idle+0} <ffffffff8010bb7c>{apic_timer_interrupt+132} <EOI> <ffffffff80109d41>{default_idle+0} <ffffffff80109d7d>{default_idle+60} <ffffffff80109e57>{cpu_idle+160} <ffffffff803f0832>{start_kernel+626} <ffffffff803f02a2>{_sinittext+674} > > Since this is the same bug that I am trying to diagnose in 2.6.27+, it now > boils down to the same problem. Either the packet is not cleaned up > correctly, or there is a race wrt packet chunk list. > Unfortunately I could not reproduce it with 2.6.27 here (I did run the test overnight, > 12 hour), with our 2.6.16 it triggers on my machine (Dual core AMD 64) in 5 -10 minutes. I also tried 2.6.25, here I could see a similar Oops after 3 hour. I had the idea, that maybe the corruption comes from concurent access to a chunk list, so I did instrument all list operation on the chunklist with special debug spinlocks (use try_lock and print a message if failed), the message was never printed, but it crashed with the same trace, so I think we can rule out this (I used seperate locks for each list_head). > I am currently somewhat off-line due to power issues at my house, but I'll > see if I can get a somewhat cleaned-up debug patch to you. > What a pity, I had some power problems (with leak current triggering a protection device) some weeks ago in my home as well (every few hour it shut down my computers the hard way), very annoying. So if you have any idea, what I can test or should log, I can do this. > -vlad > > > > First I was thinking that maybe the padding cause this, so I also print this > > value, but it is 0 in all traces. > > > > I also applied > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b90a137d30a6322d76023d879d40fc31f3edf0a6 > > > > which sound likely to fix such kind of problem, but it seems that we do not > > hit this, the bug is still here. > > > > -- Karsten Keil SuSE Labs ISDN and VOIP development SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)
Index: linux/include/net/sctp/structs.h =================================================================== --- linux.orig/include/net/sctp/structs.h +++ linux/include/net/sctp/structs.h @@ -735,6 +735,7 @@ typedef struct sctp_chunk *(sctp_packet_ /* This structure holds lists of chunks as we are assembling for * transmission. */ + struct sctp_packet { /* These are the SCTP header values (host order) for the packet. */ __u16 source_port; @@ -743,7 +744,15 @@ struct sctp_packet { /* This contains the payload chunks. */ struct list_head chunk_list; - + __u32 num_chunks; + __u32 num_pad; +#ifdef SCTP_CHUNK_TRACEXDEBUG +#define SCTP_CHUNK_MAXDEBUG 8 + int dbg_psize[SCTP_CHUNK_MAXDEBUG]; + int dbg_ssize[SCTP_CHUNK_MAXDEBUG]; + int dbg_clen[SCTP_CHUNK_MAXDEBUG]; + int dbg_ctype[SCTP_CHUNK_MAXDEBUG]; +#endif /* This is the overhead of the sctp and ip headers. */ size_t overhead; /* This is the total size of all chunks INCLUDING padding. */ Index: linux/net/sctp/output.c =================================================================== --- linux.orig/net/sctp/output.c +++ linux/net/sctp/output.c @@ -109,6 +109,8 @@ struct sctp_packet *sctp_packet_init(str packet->source_port = sport; packet->destination_port = dport; INIT_LIST_HEAD(&packet->chunk_list); + packet->num_chunks = 0; + packet->num_pad = 0; if (asoc) { struct sctp_sock *sp = sctp_sk(asoc->base.sk); overhead = sp->pf->af->net_header_len; @@ -280,6 +282,15 @@ append: /* It is OK to send this chunk. */ list_add_tail(&chunk->list, &packet->chunk_list); packet->size += chunk_len; +#ifdef SCTP_CHUNK_TRACEXDEBUG + if (packet->num_chunks < SCTP_CHUNK_MAXDEBUG) { + packet->dbg_psize[packet->num_chunks] = packet->size; + packet->dbg_ssize[packet->num_chunks] = chunk->skb->len; + packet->dbg_clen[packet->num_chunks] = chunk_len; + packet->dbg_ctype[packet->num_chunks] = chunk->chunk_hdr->type; + } +#endif + packet->num_chunks += 1; chunk->transport = packet->transport; finish: return retval; @@ -302,6 +313,7 @@ int sctp_packet_transmit(struct sctp_pac int err = 0; int padding; /* How much padding do we need? */ __u8 has_data = 0; + int nr = 0; struct dst_entry *dst; SCTP_DEBUG_PRINTK("%s: packet:%p\n", __FUNCTION__, packet); @@ -374,6 +386,7 @@ int sctp_packet_transmit(struct sctp_pac SCTP_DEBUG_PRINTK("***sctp_transmit_packet***\n"); list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) { list_del_init(&chunk->list); + nr++; if (sctp_chunk_is_data(chunk)) { if (!chunk->has_tsn) { @@ -401,7 +414,23 @@ int sctp_packet_transmit(struct sctp_pac padding = WORD_ROUND(chunk->skb->len) - chunk->skb->len; if (padding) memset(skb_put(chunk->skb, padding), 0, padding); - + packet->num_pad += padding; + if ((nskb->tail + chunk->skb->len) > nskb->end) { + int i; + printk(KERN_EMERG "Possible SKB overflow: packet size = %u, packet overhead = %u, packet chunk = %d/%d, chunk len =%u packet padding %d nskb len %d mtu = %u\n", + packet->size, packet->overhead, nr, packet->num_chunks, chunk->skb->len, packet->num_pad, nskb->len, asoc?asoc->pathmtu:tp->pathmtu); +#ifdef SKB_SOURCE_DEBUG + printk(KERN_EMERG "chunk type %d skb was from %s:%d\n", chunk->chunk_hdr->type, chunk->skb->_file, chunk->skb->_line); +#endif +#ifdef SCTP_CHUNK_TRACEXDEBUG + for (i = 0; i < packet->num_chunks; i++) { + if (i >= SCTP_CHUNK_MAXDEBUG) + break; + printk(KERN_EMERG "chunk[%d]: ps(%d) ss(%d) cs(%d) ct(%d)\n", i+ 1, + packet->dbg_psize[i], packet->dbg_ssize[i], packet->dbg_clen[i], packet->dbg_ctype[i]); + } +#endif + } crc32 = sctp_update_copy_cksum(skb_put(nskb, chunk->skb->len), chunk->skb->data, chunk->skb->len, crc32);