On 07/12/2013 10:56 AM, Jan Beulich wrote: >>>> On 12.07.13 at 10:32, Wei Liu <wei.liu2@xxxxxxxxxx> wrote: >> On Mon, Jul 08, 2013 at 03:20:26PM +0100, Jan Beulich wrote: >>>>>> On 08.07.13 at 11:59, "Jan Beulich" <JBeulich@xxxxxxxx> wrote: >>>>>>> On 05.07.13 at 16:53, Wei Liu <wei.liu2@xxxxxxxxxx> wrote: >>>>> skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; >>>>> skb->len += skb->data_len; >>>>> } >>>>> >>>>> handle_incoming_packet(); >>>>> >>>>> You seem to be altering the behavior of the original code, because in >>>>> your patch the skb->len is incremented before use, while in the original >>>>> code (which calls skb_headlen in handle_incoming_packet) the skb->len is >>>>> correctly set. >>>> >>>> Right. So I basically need to keep skb->len up-to-date along with >>>> ->data_len. Just handed a patch to Dion with that done; I'll defer >>>> sending a v2 for the upstream code until I know the change works >>>> for our kernel. >>> >>> Okay, so with that done (see below) Dion is now seeing the >>> WARN_ON_ONCE(delta < len) in skb_try_coalesce() triggering. Of >>> course, with it having crashed before, it's hard to tell whether the >>> triggering now is an effect of the patch, or just got unmasked by it. >>> >> >> I just ported your below patch to upstream kernel and I didn't see the >> WARN_ON_ONCE. I only did iperf and netperf tests. >> >> If the work load to trigger this bug is simple enough I can give it a >> shot... > > I'm meanwhile relatively convinced that the warning isn't an effect > of the patch (final verification pending); I intend to submit v2 as > soon as 3.11-rc1 is out. > > Jan > I have been doing some testing and focussed on xennet_fill_frags. I enhanced a bit the detection of the skb->data_len > nr_frags * PAGE_SIZE condition (as suggested by Jan) by storing rx->status and nr_frags as function of each dequeued skb in the while loop. There is something wrong with the part +if (nr_frags == MAX_SKB_FRAGS) { + unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to; + + BUG_ON(pull_to <= skb_headlen(skb)); + __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); + nr_frags = shinfo->nr_frags; +} +BUG_ON(nr_frags >= MAX_SKB_FRAGS); When nr_frags reaches MAX_SKB_FRAGS (and this happens), nr_frags is "reset" to shinfo->nr_frags. In fact I see the nr_frags set to 0 the first time nr_frags reaches MAX_SKB_FRAGS. The first problem with this is of course that the "skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags" calculation following the "i = xennet_fill_frags(np, skb, &tmpq)" in neif_poll leads to a wrong result. At the end the skb has ->truesize way smaller than ->len. The second problem is that the BUG_ON(nr_frags >= MAX_SKB_FRAGS) shall not occur, since "shinfo->nr_frags" afaict, is not updated in between. Furthermore, I cannot figure out why, when control enters xennet_fill_frags(), shinfo->nr_frags equals 1, and a little later when nr_frags reaches MAX_SKB_FRAGS, it is 0. Here is the (printk) result of a trial: > 1 2013-07-13T04:27:30.757241+02:00 doma kernel: [ 41.759528] netfront: Too many frags > 2 2013-07-13T04:27:39.629261+02:00 doma kernel: [ 50.628482] 17 == 17 (shinfo->nr_frags=0) > 3 2013-07-13T04:27:39.629285+02:00 doma kernel: [ 50.628486] j=0,rx_trace[j]=48,nr_frags_trace[j]=1 > 4 2013-07-13T04:27:39.629287+02:00 doma kernel: [ 50.628488] j=1,rx_trace[j]=1000,nr_frags_trace[j]=2 > 5 2013-07-13T04:27:39.629289+02:00 doma kernel: [ 50.628489] j=2,rx_trace[j]=1000,nr_frags_trace[j]=3 > 6 2013-07-13T04:27:39.629291+02:00 doma kernel: [ 50.628491] j=3,rx_trace[j]=1000,nr_frags_trace[j]=4 > 7 2013-07-13T04:27:39.629293+02:00 doma kernel: [ 50.628492] j=4,rx_trace[j]=1000,nr_frags_trace[j]=5 > 8 2013-07-13T04:27:39.629294+02:00 doma kernel: [ 50.628493] j=5,rx_trace[j]=1000,nr_frags_trace[j]=6 > 9 2013-07-13T04:27:39.629296+02:00 doma kernel: [ 50.628494] j=6,rx_trace[j]=1000,nr_frags_trace[j]=7 > 10 2013-07-13T04:27:39.629298+02:00 doma kernel: [ 50.628496] j=7,rx_trace[j]=1000,nr_frags_trace[j]=8 > 11 2013-07-13T04:27:39.629300+02:00 doma kernel: [ 50.628497] j=8,rx_trace[j]=1000,nr_frags_trace[j]=9 > 12 2013-07-13T04:27:39.629302+02:00 doma kernel: [ 50.628498] j=9,rx_trace[j]=1000,nr_frags_trace[j]=10 > 13 2013-07-13T04:27:39.629304+02:00 doma kernel: [ 50.628499] j=10,rx_trace[j]=1000,nr_frags_trace[j]=11 > 14 2013-07-13T04:27:39.629305+02:00 doma kernel: [ 50.628503] j=11,rx_trace[j]=1000,nr_frags_trace[j]=12 > 15 2013-07-13T04:27:39.629307+02:00 doma kernel: [ 50.628504] j=12,rx_trace[j]=1000,nr_frags_trace[j]=13 > 16 2013-07-13T04:27:39.629328+02:00 doma kernel: [ 50.628506] j=13,rx_trace[j]=1000,nr_frags_trace[j]=14 > 17 2013-07-13T04:27:39.629330+02:00 doma kernel: [ 50.628507] j=14,rx_trace[j]=1000,nr_frags_trace[j]=15 > 18 2013-07-13T04:27:39.629332+02:00 doma kernel: [ 50.628508] j=15,rx_trace[j]=1000,nr_frags_trace[j]=16 > 19 2013-07-13T04:27:39.629335+02:00 doma kernel: [ 50.628510] j=16,rx_trace[j]=e40,nr_frags_trace[j]=0 > 20 2013-07-13T04:27:39.629336+02:00 doma kernel: [ 50.628511] xennet::xennet_fill_frags (fe88,feca,500,1,e40,42,42) line 2 comes from if (nr_frags == MAX_SKB_FRAGS) { unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to; BUG_ON(pull_to <= skb_headlen(skb)); __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); + printk("%u == %lu (shinfo->nr_frags=%u)\n",//temp + nr_frags,MAX_SKB_FRAGS, shinfo->nr_frags);//tmp nr_frags = shinfo->nr_frags; } Lines 3 to 19 show the dumping I added at the end of the while loop +if (skb->data_len > nr_frags * PAGE_SIZE) {//tmp + for (j=0; j<i; ++j)//tmp + printk("j=%u,rx_trace[j]=%x,nr_frags_trace[j]=%u\n", //tmp + j,rx_trace[j],nr_frags_trace[j]);//tmp + printk("xennet::xennet_fill_frags (%x,%x,%x,%u,%x,%x,%x)\n",//temp + skb->data_len, skb->len, skb->truesize, nr_frags,//tmp + rx->status, len0, data_len0);//temp +} For gathering the data I added a couple of variables on the stack struct skb_shared_info *shinfo = skb_shinfo(skb); int nr_frags = shinfo->nr_frags; RING_IDX cons = np->rx.rsp_cons; struct sk_buff *nskb; +unsigned rx_trace[MAX_SKB_FRAGS+1], i=0,j, len0=0, data_len0=0;//tmp +int nr_frags_trace[MAX_SKB_FRAGS+1];//tmp and in the while loop __skb_fill_page_desc(skb, nr_frags, skb_frag_page(skb_shinfo(nskb)->frags), rx->offset, rx->status); +if (i==0) {//tmp + len0=skb->len;//tmp + data_len0=skb->data_len;//tmp +}//tmp +rx_trace[i] = rx->status;//tmp skb->len += rx->status; skb->data_len += rx->status; skb_shinfo(nskb)->nr_frags = 0; kfree_skb(nskb); +nr_frags_trace[i++] = nr_frags;//tmp nr_frags++; As a coincidence I noticed that there has to be a subtle condition related to speed of the while loop required to reach the state where nr_frags becomes MAX_SKB_FRAGS. A bit more delay in the while and it does not occur anymore. E.g. first I had the gathering ordered like +if (i==0) {//tmp + len0=skb->len;//tmp + data_len0=skb->data_len;//tmp +}//tmp +rx_trace[i] = rx->status;//tmp +nr_frags_trace[i++] = nr_frags;//tmp skb->len += rx->status; skb->data_len += rx->status; skb_shinfo(nskb)->nr_frags = 0; kfree_skb(nskb); nr_frags++; and I was unable to find nr_frags reaching MAX_SKB_FRAGS within a finite testing time. I thought this may be worthwhile to share this. As a workload to trigger this I use simply dom0-t:~ # scp bigrand.bin doma:/dev/null and with this on my system the transfer rate settles around 45 MB/s limited by a saturated core which runs the user land ssh process. Dion -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html