Hi, There was report at Algolia's blog website regarding some problems using software raid and trim in SSD. https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/ It turns out that there is misunderstanding between raid driver and scsi/ata driver. The raid driver lets split bios share bio vector of source bio. Usually, there is no problem, because the raid layer ensures that the source bio is not freed before the split bios. But, in case of trim, there are some problems. The scsi/ata needs some payloads that include start address and size of device to trim. So, the scsi/ata driver allocates a page and stores that pointer on bio->bi_io_vec->bv_page. (sd_setup_discard_cmnd) Because split bios share the source bio's bi_io_vec, the pointer to the allocated page in scsi/ata driver is overwritten. It leads to memory leakage and data corruption because the overwritten pointer has wrong address and size to trim on device. So, we add some codes that make sure not to share bio vector if the source bio is discard. The linear and raid10 also have same problem because they're using similar scheme to split the source bio. This problem exists at the very first time when the trim is enabled on raid0. Before the new bio_split() (20d0189b), we can patch more easily by modifying the condition to split bio vector in bio_split function from (bi->bi_vec != 0) to ((bi->bi_vec != 0) || (bi->bi_rw & REQ_DISCARD)). Thank you. Seunguk Shin. [?0001-PATCH-raid0-data-corruption-when-using-trim.patch?] >From ca7dbe01fcd2ef2f8cea1a38de5aca5c866c585d Mon Sep 17 00:00:00 2001 From: Seunguk Shin <seunguk.shin@xxxxxxxxxxx> Date: Sat, 18 Jul 2015 20:13:44 +0900 Subject: [PATCH] [PATCH] raid0: data corruption when using trim When we are using software raid and tirm, there is data corruption. The raid driver lets split bios share bio vector of source bio. The scsi/ata driver allocates a page and stores that pointer on bio->bi_io_vec->bv_page (sd_setup_discard_cmnd) because the scsi/ata needs some payloads that include start address and size of device to trim. Because split bios share the source bio's bi_io_vec, the pointer to the allocated page in scsi/ata driver is overwritten. This patch splits bio vector if bio is discard. --- block/bio.c | 6 ------ drivers/md/raid0.c | 13 +++++++++++++ include/linux/bio.h | 6 ++++++ 3 files changed, 19 insertions(+), 6 deletions(-) diff --git a/block/bio.c b/block/bio.c index 2a00d34..df7589d 100644 --- a/block/bio.c +++ b/block/bio.c @@ -32,12 +32,6 @@ #include <trace/events/block.h> /* - * Test patch to inline a certain number of bi_io_vec's inside the bio - * itself, to shrink a bio data allocation from two mempool calls to one - */ -#define BIO_INLINE_VECS 4 - -/* * if you change this list, also change bvec_alloc or things will * break badly! cannot be bigger than what you can fit into an * unsigned short diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c index efb654e..fd1318b 100644 --- a/drivers/md/raid0.c +++ b/drivers/md/raid0.c @@ -529,6 +529,19 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio) if (sectors < bio_sectors(bio)) { split = bio_split(bio, sectors, GFP_NOIO, fs_bio_set); + if (unlikely(split->bi_rw & REQ_DISCARD)) { + struct bio_vec *bvl = NULL; + unsigned long idx = BIO_POOL_NONE; + + bvl = bvec_alloc(GFP_NOIO, BIO_INLINE_VECS + 1, + &idx, fs_bio_set->bvec_pool); + split->bi_flags |= 1 << BIO_OWNS_VEC; + split->bi_flags &= ~(BIO_POOL_NONE << + BIO_POOL_OFFSET); + split->bi_flags |= idx << BIO_POOL_OFFSET; + split->bi_max_vecs = BIO_INLINE_VECS + 1; + split->bi_io_vec = bvl; + } bio_chain(split, bio); } else { split = bio; diff --git a/include/linux/bio.h b/include/linux/bio.h index 5e963a6..8344bb5 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -40,6 +40,12 @@ #define BIO_BUG_ON #endif +/* + * Test patch to inline a certain number of bi_io_vec's inside the bio + * itself, to shrink a bio data allocation from two mempool calls to one + */ +#define BIO_INLINE_VECS 4 + #define BIO_MAX_PAGES 256 #define BIO_MAX_SIZE (BIO_MAX_PAGES << PAGE_CACHE_SHIFT) #define BIO_MAX_SECTORS (BIO_MAX_SIZE >> 9) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html