On Wed, Mar 9, 2016 at 11:15 PM, John David Anglin <dave.anglin@xxxxxxxx> wrote: > On 2016-03-09 9:43 AM, Ming Lei wrote: >>> >>> We've provided all the information you asked for, what's the next step >>> >on this, or do we have to unwind the bio splitting code with reverts >>> >until it starts working? >> >> John, Helge, and I did discuss the problem for a while privately, and >> looks >> it is related with compiler. Last time, I sent one patch which can make >> the >> issue disappear, but the main change is just invovled with the below: >> >> struct bio_vec { >> struct page *bv_page; >> - unsigned int bv_len; >> + unsigned int bv_seg:8; >> + unsigned int bv_len:24; >> unsigned int bv_offset; >> }; >> >> Maybe John and Helge have some update recently? >> >> The logic in blk_bio_segment_split() is correct, and it does respect the >> max >> segment size limit. > > Helge has found that tagging blk_bio_segment_split() with "__attribute__ > ((optimize("O0")))" > makes the issue disappear. The bug remains if one just adds bv_len to the > struct without the > bit fields. Maybe problem is evident from following output which I sent to > Ming and Helge > last weekend? > > blk_rq_map_sg: merge bug: 3 2, extra_len 0, dma_drain 0 > check_bvec: dump bvec for 000000007e4efdc0(f:24490000, t:1) > 0: 0 4096 246503 000000007e4a4f00(0, 94208, 1) > 1: 0 4096 246504 000000007e4a4f00(0, 94208, 1) > 2: 0 4096 246505 000000007e4a4f00(0, 94208, 1) > 3: 0 4096 246506 000000007e4a4f00(0, 94208, 1) The above 4 io vectors belong to one same segment since they are contineous physically from the 3rd column of PFN, but the vector 4(the below one) isn't contineous with above 3 vectors, so the following one starts the 2nd segment. > 4: 0 4096 246538 000000007e4a4f00(0, 94208, 2) > 5: 0 4096 246539 000000007e4a4f00(0, 94208, 2) > 6: 0 4096 246540 000000007e4a4f00(0, 94208, 2) > 7: 0 4096 246541 000000007e4a4f00(0, 94208, 2) > 8: 0 4096 246542 000000007e4a4f00(0, 94208, 2) > 9: 0 4096 246543 000000007e4a4f00(0, 94208, 2) > 10: 0 4096 246544 000000007e4a4f00(0, 94208, 2) > 11: 0 4096 246545 000000007e4a4f00(0, 94208, 2) > 12: 0 4096 246546 000000007e4a4f00(0, 94208, 2) > 13: 0 4096 246547 000000007e4a4f00(0, 94208, 2) > 14: 0 4096 246548 000000007e4a4f00(0, 94208, 2) > 15: 0 4096 246549 000000007e4a4f00(0, 94208, 2) > 16: 0 4096 246550 000000007e4a4f00(0, 94208, 2) > 17: 0 4096 246551 000000007e4a4f00(0, 94208, 2) > 18: 0 4096 246552 000000007e4a4f00(0, 94208, 2) > 19: 0 4096 246553 000000007e4a4f00(0, 94208, 2) The above 16 vectors are contineous physically, but the segment size becomes 64K now, so blk_bio_segment_split() should have seen that and started to split the bio. > 20: 0 4096 246554 000000007e4a4f00(0, 94208, 2) > 21: 0 4096 246555 000000007e4a4f00(0, 94208, 2) > 22: 0 4096 246556 000000007e4a4f00(0, 94208, 2) Unfortunately the bio isn't splitted at all, that means the following code is run incorrectly: if (seg_size + bv.bv_len > queue_max_segment_size(q)) goto new_segment; seg_size should be 64K, and bv.bv_len should be 4K, so the sum between the two should be bigger than 64K(queue_max_segment_size(q)). Unfortunately, the code is optimized as run incorrectly. > Kernel panic - not syncing: bad block merge > > It seems segment 1 is too small and segment 2 too big? segment 1 is correct, and segment 2 becomes too big, but __blk_segment_map_sg() runs correctly and figured out this bio has 3 segments, so causes the oops. > > The general plan is to disable inlining (maybe move blk_bio_segment_split() > to a separate > function) to try to figure out what is miscompiled. > > As you say, this is probably a GCC bug. However, it's likely a middle-end > or optimization > bug in the common GCC code. > > Dave > > > -- > John David Anglin dave.anglin@xxxxxxxx > -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html