> On 5/13/21 7:15 AM, Theodore Ts'o wrote: > > On Thu, May 13, 2021 at 06:42:22PM +0900, Changheun Lee wrote: > >> > >> Problem might be casued by exhausting of memory. And memory exhausting > >> would be caused by setting of small bio_max_size. Actually it was not > >> reproduced in my VM environment at first. But, I reproduced same problem > >> when bio_max_size is set with 8KB forced. Too many bio allocation would > >> be occurred by setting of 8KB bio_max_size. > > > > Hmm... I'm not sure how to align your diagnosis with the symptoms in > > the bug report. If we were limited by memory, that should slow down > > the I/O, but we should still be making forward progress, no? And a > > forced reboot should not result in data corruption, unless maybe there > > was a missing check for a failed memory allocation, causing data to be > > written to the wrong location, a missing error check leading to the > > block or file system layer not noticing that a write had failed > > (although again, memory exhaustion should not lead to failed writes; > > it might slow us down, sure, but if writes are being failed, something > > is Badly Going Wrong --- things like writes to the swap device or > > writes by the page cleaner must succeed, or else Things Would Go Bad > > In A Hurry). > > After the LUKS data corruption issue was reported I decided to take a > look at the dm-crypt code. In that code I found the following: > > static void clone_init(struct dm_crypt_io *io, struct bio *clone) > { > struct crypt_config *cc = io->cc; > > clone->bi_private = io; > clone->bi_end_io = crypt_endio; > bio_set_dev(clone, cc->dev->bdev); > clone->bi_opf = io->base_bio->bi_opf; > } > [ ... ] > static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size) > { > [ ... ] > clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, &cc->bs); > [ ... ] > clone_init(io, clone); > [ ... ] > for (i = 0; i < nr_iovecs; i++) { > [ ... ] > bio_add_page(clone, page, len, 0); > > remaining_size -= len; > } > [ ... ] > } > > My interpretation is that crypt_alloc_buffer() allocates a bio, > associates it with the underlying device and clones a bio. The input bio > may have a size up to UINT_MAX while the new limit for the size of the > cloned bio is max_sectors * 512. That causes bio_add_page() to fail if > the input bio is larger than max_sectors * 512, hence the data > corruption. Please note that this is a guess only and that I'm not > familiar with the dm-crypt code. > > Bart. We already had problems with too large bios in dm-crypt and we fixed it by adding this piece of code: /* * Check if bio is too large, split as needed. */ if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_VECS << PAGE_SHIFT)) && (bio_data_dir(bio) == WRITE || cc->on_disk_tag_size)) dm_accept_partial_bio(bio, ((BIO_MAX_VECS << PAGE_SHIFT) >> SECTOR_SHIFT)); It will ask the device mapper to split the bio if it is too large. So, crypt_alloc_buffer can't receive a bio that is larger than BIO_MAX_VECS << PAGE_SHIFT. Mikulas