On Tue, 2013-02-05 at 10:56 -0500, Vlad Yasevich wrote: > On 02/04/2013 06:47 PM, Bob Montgomery wrote: > > > And another: > > > > Under normal operation, an event that fills a hole in the lobby will > > result in a list of events (the new one and sequential ones that had > > been waiting in the lobby) being sent to sctp_ulpq_tail_event(). Then > > we do this: > > /* Check if the user wishes to receive this event. */ > > if (!sctp_ulpevent_is_enabled(event, &sctp_sk(sk)->subscribe)) > > goto out_free; > > > > In out_free, we do > > sctp_queue_purge_ulpevents(skb_list); > > > > > > So if the first event was a notification that we don't subscribe to, > > but the remaining 100 were data, do we really throw out all the > > other data with it?? > > No and for 2 reasons. > 1. sctp_ulpevent_is_enabled only checks for notification events, not > DATA. > 2. Notification events aren't ordered and are always singular. > > So, you will either have all data in the list or a singular notification > that you don't subscribe to. > > -vlad Thanks. That's the kind of insight I was referring to :-) I'll take that worry off my list. We have two different hangs: I believe the first one is caused by this code under these conditions: /* If able to free enough room, accept this chunk. */ if (chunk && (freed >= needed)) { __u32 tsn; tsn = ntohl(chunk->subh.data_hdr->tsn); sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn); sctp_ulpq_tail_data(ulpq, chunk, gfp); sctp_ulpq_partial_delivery(ulpq, chunk, gfp); } If the reasm queue looks like this after we renege to make room for LAST N+4: FIRST N MIDDLE N+1 MIDDLE N+2 MIDDLE N+3 <-- cumulative_tsn_ack <-- base_tsn would be N+4 FIRST N+10 then the call to sctp_ulpq_tail_data(ulpq, chunk, gfp); will reasm and deliver N,N+1,N+2,N+3,N+4 moving the cumulative_ack up to N+4 and leaving N+10 as the only item in the reasm queue. Then we call sctp_ulpq_partial_delivery(ulpq, chunk, gfp); chunk is ignored (then why even there??) and we'll partial deliver FIRST N+10, which is scary because it's ahead of cumulative tsn_ack. And then we set pd_mode = 1. Now FIRST N+5 arrives and is put in the reasm queue by sctp_ulpq_reasm(). But we're in pd_mode = 1 so we call sctp_ulpq_retrieve_partial since N+5 is what we're looking for. But sctp_ulpq_retrieve_partial() will bail when it sees N+5 (A FIRST_FRAG) because it is only looking for MIDDLE_FRAG or LAST_FRAG in the first slot of the reasm queue. And now I think we're stuck forever. ============================================== The second hang does not involve the reasm queue. It occurs on a test where all the events are non-fragmented. The final state of the ulpq lobby is this: SSN X SSN X+1 SSN X+2 SSN X+3 ... SSN X+last And the Next expected SSN value in ssnmap is X+1. So we're waiting on X+1, but X is the first item in the queue. I think that is caused under these conditions: Say the lobby queue had: ssn 10 ssn 10 (duplicate) ssn 11 ssn 12 and we're waiting on ssn 9... call sctp_ulpq_order with event ssn 9: ssn_next is incremented to 10 call sctp_ulpq_retrieve_ordered() start down the list and find 10. ssn_next is incremented to 11. grab ssn 10 off the queue, add to event_list and go around. find 10 again and it's != new ssn_next(11), so break. Now we're hung forever. We built a module with a BUG statement on putting a duplicate into the lobby and hit it. The duplicate event was at the end of a group of sequential events, followed by a gap and then another group of sequential events. Coincidentally (or not), at the time the duplicate was sent to the lobby, it was represented by a lone bit in a word of the tsnmap: ... ssn = 0x30d, tsn = 0x5505020f, ssn = 0x30e, <<<<<<< About to insert this one again tsn = 0x55050210, Big actual gap ssn = 0x378, tsn = 0x5505027a, ssn = 0x379, tsn = 0x5505027b, ... tsn_map = 0xffff8807aa430b80, base_tsn = 0x550501d0, crash-6.0.8bobm> p/x 0x210-0x1d1 $8 = 0x3f So 63 (0x3f) + 1 = 64 bits set, then 106 (0x6a) - 1 = 105 bits clear, then 12 (0xc) + 1 = 13 bits set. crash-6.0.5bobm> rd 0xffff8807aa430b80 8 ffff8807aa430b80: fffffffffffffffe 0000000000000001 ................ ffff8807aa430b90: 007ffc0000000000 0000000000000000 ................ fffffffffffffffe 1 off, 63 on 0000000000000001 1 on , 63 off 007ffc0000000000 42 off, 13 on The lone bit in the second word describes tsn 0x55050210, our duplicate. It might be a 1 in 64 coincidence that our duplicate was the LSB of an otherwise empty 64-bit map word(?) or it might be some map manipulation off by 1 error that maybe cleared the bit of an event we already had, and let the state machine pass another copy?? We need to capture another example of this duplicate entry. More later, Bob Montgomery -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html