On Tue, Nov 21, 2023 at 01:55:29AM -0700, Wu Bo wrote:
> We found an issue in the Android OTA scenario where many BIOs have to go
> through FEC even though the data under dm-verity is 100% complete and
> uncorrupted.
>
> Android OTA stacks many dm block layers, from upper to lower:
> dm-verity
> dm-snapshot
> dm-origin & dm-cow
> dm-linear
> ufs
>
> The dm tables have to change twice during the Android OTA merging process.
> While a table is being changed, dm-snapshot is suspended for a while.
> During this interval, we found that many readahead IOs are submitted to
> dm-verity from the filesystem. The kverity workers then become busy with
> FEC processing, which takes too long to finish the dm-verity IO and causes
> the system to get stuck.
>
> We added some debug logs and found that each readahead IO needs around 10s
> to finish when this happens, because of the following IO amplification:
>
> dm-snapshot suspend
> erofs_readahead                          // 300+ IOs are submitted
> 	dm_submit_bio (dm_verity)
> 		dm_submit_bio (dm_snapshot)
> 		bio returns EIO
> 		bio got nothing, it's empty
> 	verity_end_io
> 	verity_verify_io
> 	forloop range(0, io->n_blocks)   // each io->n_blocks ~= 20
> 		verity_fec_decode
> 		fec_decode_rsb
> 		fec_read_bufs
> 		forloop range(0, v->fec->rsn)   // v->fec->rsn = 253
> 			new_read
> 			submit_bio (dm_snapshot)
> 		end loop
> 	end loop
> dm-snapshot resume
>
> Readahead BIOs get nothing back while dm-snapshot is suspended, so all of
> them go through FEC. Each readahead BIO needs about io->n_blocks ~= 20
> block verifications. Each block needs FEC, and every FEC decode issues
> v->fec->rsn = 253 reads. So during the suspend interval (~200ms), 300
> readahead BIOs generate 300*20*253 IOs on dm-snapshot.
>
> Since readahead IO is not required by user space, I think the better fix
> is to skip FEC for it and pass the error up to the upper layer to handle.
>
> Signed-off-by: Wu Bo <bo.wu@xxxxxxxx>
> ---
>  drivers/md/dm-verity-target.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
> index 42b2483eb08c..d242e50ec869 100644
> --- a/drivers/md/dm-verity-target.c
> +++ b/drivers/md/dm-verity-target.c
> @@ -668,7 +668,9 @@ static void verity_end_io(struct bio *bio)
>  
>  	verity_fec_init_io(io);
>  	if (bio->bi_status &&
> -	    (!verity_fec_is_enabled(io->v) || verity_is_system_shutting_down())) {
> +	    (!verity_fec_is_enabled(io->v) ||
> +	     verity_is_system_shutting_down() ||
> +	     (bio->bi_opf & REQ_RAHEAD))) {
>  		verity_finish_io(io, bio->bi_status);
>  		return;
>  	}

Thanks, this seems reasonable to me.

As with your previous patch: what commit introduced this issue? To me this
looks like a longstanding issue, maybe dating back to the original addition
of FEC support to dm-verity by commit a739ff3f543a ("dm verity: add support
for forward error correction"); do you agree?

Can you please add Fixes and "Cc stable" tags to your patch?

Thanks!

- Eric