From: Danny Shih <dannyshih@xxxxxxxxxxxx>

We found that split bios can be handled out of order when a large bio is
split by blk_queue_split() and then split again inside a stacking block
device, such as an md device, because of its chunk-size boundary limit.

After splitting the original bio, a stacking block device normally uses
submit_bio_noacct() to add the remaining bio to the tail of
current->bio_list. So when the bio is split the first time, its last part
is added to bio_list; when it is split the second time, its middle part is
added to bio_list. As a result, the middle part ends up behind the last
part.

For example, take a RAID0 md device with max_sectors_kb = 2 KB and
chunk_size = 1 KB:

1. A read bio arrives at the md device, asking to read 0-7 KB.

2. blk_queue_split() splits the bio into (0-1) and (2-7), and sends (2-7)
   back to the md device.
     current->bio_list = bio_list_on_stack[0]: (md 2-7)

3. RAID0 splits bio (0-1) into (0) and (1), since the chunk size is 1 KB,
   and sends (1) back to the md device.
     bio_list_on_stack[0]: (md 2-7) -> (md 1)

4. RAID0 remaps (0) and sends it to the lower layer device.
     bio_list_on_stack[0]: (md 2-7) -> (md 1) -> (lower 0)

5. __submit_bio_noacct() sorts the bios so that the lower bio is handled
   first.
     bio_list_on_stack[0]: (lower 0) -> (md 2-7) -> (md 1)
   It then pops (lower 0) and moves bio_list_on_stack[0] to
   bio_list_on_stack[1].
     bio_list_on_stack[1]: (md 2-7) -> (md 1)

6. After handling the lower bio, it handles (md 2-7) first:
   blk_queue_split() splits it into (2-3) and (4-7) and sends (4-7) back.
     bio_list_on_stack[0]: (md 4-7)
     bio_list_on_stack[1]: (md 1)

7. RAID0 splits bio (2-3) into (2) and (3) and sends (3) back.
     bio_list_on_stack[0]: (md 4-7) -> (md 3)
     bio_list_on_stack[1]: (md 1)

...

In the end, the split bios are handled in this order:

  0 -> 2 -> 4 -> 6 -> 7 -> 5 -> 3 -> 1

Reversing the order of same-queue bios when sorting in
__submit_bio_noacct() would solve this issue, but it might have too broad
an impact.
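The walkthrough above can be reproduced with a small userspace model of the
list handling in __submit_bio_noacct(). This is only a sketch under stated
assumptions, not kernel code: struct bio and struct bio_list here are
simplified stand-ins, the helpers simulate(), submit_one() and resubmit()
are hypothetical, bio_queue_enter(), bio chaining and completion are
ignored, and every lower-level bio is assumed to cover exactly one KB. The
remainder_at_head switch mimics the head insertion proposed by the patches
below; with it off, split remainders go to the tail as submit_bio_noacct()
does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BIOS   32
#define MD_MAX_KB  2   /* md device max_sectors_kb */
#define CHUNK_KB   1   /* RAID0 chunk size */

enum dev { MD, LOWER };                        /* which queue a bio targets */
struct bio { enum dev dev; int start, end; };  /* inclusive range in KB */
struct bio_list { struct bio b[MAX_BIOS]; int n; };  /* simplified stand-in */

static void list_add(struct bio_list *l, struct bio b)
{
	assert(l->n < MAX_BIOS);
	l->b[l->n++] = b;
}

static void list_add_head(struct bio_list *l, struct bio b)
{
	assert(l->n < MAX_BIOS);
	memmove(&l->b[1], &l->b[0], l->n * sizeof(b));
	l->b[0] = b;
	l->n++;
}

static bool list_pop(struct bio_list *l, struct bio *out)
{
	if (l->n == 0)
		return false;
	*out = l->b[0];
	l->n--;
	memmove(&l->b[0], &l->b[1], l->n * sizeof(*out));
	return true;
}

static void list_merge(struct bio_list *dst, const struct bio_list *src)
{
	for (int i = 0; i < src->n; i++)
		list_add(dst, src->b[i]);
}

static struct bio_list *cur;   /* models current->bio_list[0] */
static bool at_head;           /* hypothetical switch: remainders to head? */
static int *order, norder;     /* order in which the lower device sees KBs */

/* Hypothetical helper: send a split remainder back to the md device. */
static void resubmit(struct bio b)
{
	if (at_head)
		list_add_head(cur, b);   /* behaviour proposed by this series */
	else
		list_add(cur, b);        /* tail insertion, as today */
}

/* Models ->submit_bio for both the md device and the lower device. */
static void submit_one(struct bio b)
{
	if (b.dev == LOWER) {
		order[norder++] = b.start;   /* lower bios are 1 KB in this model */
		return;
	}
	if (b.end - b.start + 1 > MD_MAX_KB) {   /* blk_queue_split() */
		resubmit((struct bio){ MD, b.start + MD_MAX_KB, b.end });
		b.end = b.start + MD_MAX_KB - 1;
	}
	if (b.end - b.start + 1 > CHUNK_KB) {    /* RAID0 chunk boundary */
		resubmit((struct bio){ MD, b.start + CHUNK_KB, b.end });
		b.end = b.start + CHUNK_KB - 1;
	}
	list_add(cur, (struct bio){ LOWER, b.start, b.end });  /* remap down */
}

/* Models the loop in __submit_bio_noacct() for a 0-7 KB read; returns the
 * number of lower-level bios and writes their start offsets to out[]. */
int simulate(bool remainder_at_head, int *out)
{
	struct bio_list stack[2] = { 0 };
	struct bio bio = { MD, 0, 7 };

	cur = &stack[0];
	at_head = remainder_at_head;
	order = out;
	norder = 0;
	do {
		struct bio_list lower = { 0 }, same = { 0 };
		enum dev q = bio.dev;
		struct bio b;

		stack[1] = stack[0];   /* fresh list for bios submitted below */
		stack[0].n = 0;
		submit_one(bio);
		while (list_pop(&stack[0], &b))   /* sort by queue */
			list_add(b.dev == q ? &same : &lower, b);
		list_merge(&stack[0], &lower);    /* lowest level first */
		list_merge(&stack[0], &same);
		list_merge(&stack[0], &stack[1]);
	} while (list_pop(&stack[0], &bio));
	return norder;
}
```

In this model, simulate(false, buf) returns 8 and fills buf with
0 2 4 6 7 5 3 1, matching the order reported above, while
simulate(true, buf) fills it with 0 1 2 3 4 5 6 7: putting each remainder
at the head keeps it ahead of the remainders from earlier, larger splits.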
So we provide an alternative version of submit_bio_noacct(), named
submit_bio_noacct_add_head(), for cases that need to add the bio to the
head of current->bio_list, and replace submit_bio_noacct() with
submit_bio_noacct_add_head() in the block device layers wherever a bio is
split and the remainder is sent back to the same device.

Danny Shih (4):
  block: introduce submit_bio_noacct_add_head
  block: use submit_bio_noacct_add_head for split bio sending back
  dm: use submit_bio_noacct_add_head for split bio sending back
  md: use submit_bio_noacct_add_head for split bio sending back

 block/blk-core.c       | 44 +++++++++++++++++++++++++++++++++-----------
 block/blk-merge.c      |  2 +-
 block/bounce.c         |  2 +-
 drivers/md/dm.c        |  2 +-
 drivers/md/md-linear.c |  2 +-
 drivers/md/raid0.c     |  4 ++--
 drivers/md/raid1.c     |  4 ++--
 drivers/md/raid10.c    |  4 ++--
 drivers/md/raid5.c     |  2 +-
 include/linux/blkdev.h |  1 +
 10 files changed, 45 insertions(+), 22 deletions(-)

-- 
2.7.4