On 09.03.2016 16:15, John David Anglin wrote:
> On 2016-03-09 9:43 AM, Ming Lei wrote:
>>> We've provided all the information you asked for, what's the next step
>>> on this, or do we have to unwind the bio splitting code with reverts
>>> until it starts working?
>> John, Helge, and I did discuss the problem for a while privately, and it
>> looks related to the compiler. Last time, I sent one patch which can make the
>> issue disappear, but the main change just involves the below:
>>
>> struct bio_vec {
>>         struct page     *bv_page;
>> -       unsigned int    bv_len;
>> +       unsigned int    bv_seg:8;
>> +       unsigned int    bv_len:24;
>>         unsigned int    bv_offset;
>> };
>>
>> Maybe John and Helge have some update recently?
>>
>> The logic in blk_bio_segment_split() is correct, and it does respect the max
>> segment size limit.
> Helge has found that tagging blk_bio_segment_split() with "__attribute__ ((optimize("O0")))"
> makes the issue disappear. The bug remains if one just adds bv_len to the struct without the
> bit fields. Maybe the problem is evident from the following output, which I sent to Ming and
> Helge last weekend?
>
> blk_rq_map_sg: merge bug: 3 2, extra_len 0, dma_drain 0
> check_bvec: dump bvec for 000000007e4efdc0(f:24490000, t:1)
>   0: 0 4096 246503 000000007e4a4f00(0, 94208, 1)
>   1: 0 4096 246504 000000007e4a4f00(0, 94208, 1)
>   2: 0 4096 246505 000000007e4a4f00(0, 94208, 1)
>   3: 0 4096 246506 000000007e4a4f00(0, 94208, 1)
>   4: 0 4096 246538 000000007e4a4f00(0, 94208, 2)
>   5: 0 4096 246539 000000007e4a4f00(0, 94208, 2)
>   6: 0 4096 246540 000000007e4a4f00(0, 94208, 2)
>   7: 0 4096 246541 000000007e4a4f00(0, 94208, 2)
>   8: 0 4096 246542 000000007e4a4f00(0, 94208, 2)
>   9: 0 4096 246543 000000007e4a4f00(0, 94208, 2)
>  10: 0 4096 246544 000000007e4a4f00(0, 94208, 2)
>  11: 0 4096 246545 000000007e4a4f00(0, 94208, 2)
>  12: 0 4096 246546 000000007e4a4f00(0, 94208, 2)
>  13: 0 4096 246547 000000007e4a4f00(0, 94208, 2)
>  14: 0 4096 246548 000000007e4a4f00(0, 94208, 2)
>  15: 0 4096 246549 000000007e4a4f00(0, 94208, 2)
>  16: 0 4096 246550 000000007e4a4f00(0, 94208, 2)
>  17: 0 4096 246551 000000007e4a4f00(0, 94208, 2)
>  18: 0 4096 246552 000000007e4a4f00(0, 94208, 2)
>  19: 0 4096 246553 000000007e4a4f00(0, 94208, 2)
>  20: 0 4096 246554 000000007e4a4f00(0, 94208, 2)
>  21: 0 4096 246555 000000007e4a4f00(0, 94208, 2)
>  22: 0 4096 246556 000000007e4a4f00(0, 94208, 2)
> Kernel panic - not syncing: bad block merge
>
> It seems segment 1 is too small and segment 2 too big?
>
> The general plan is to disable inlining (maybe move blk_bio_segment_split() to a separate
> function) to try to figure out what is miscompiled.

Right.
I just succeeded in reproducing the bug after moving blk_bio_segment_split()
into its own file (and with "extern" instead of "static" in blk-merge.c).
When compiled with -O2 it still crashes.
So the next step is to analyze what gcc does wrong when compiling this function.
That should be easier now that we have a smaller reproducer.

Helge

> As you say, this is probably a GCC bug.
> However, it's likely a middle-end or optimization
> bug in the common GCC code.
>
> Dave