On 05/27/15 18:50, Junichi Nomura wrote:
> On 05/27/15 17:21, Christoph Hellwig wrote:
>> On Tue, May 26, 2015 at 06:20:43AM +0000, Junichi Nomura wrote:
>>> Not completing bios is not sufficient.
>>> If you advance the bi_iter to the end, you need to somehow rewind it
>>> or the re-submission will be incomplete, that would end up as a data
>>> corruption...

Less critical than the data corruption issue, I'm also worried about
the partial completion case.

For a successful partial completion, the current code completes a bio
before the request is fully completed. With your patch, bios are not
completed until the request is fully completed.

A related concern is partial failure. In the case of a bad sector, for
example, the current code fails the I/O for that particular sector while
the other sectors in the request succeed. If you make request completion
an all-or-nothing model, that will be a regression for such a case.

I'm not sure how much impact the removal of partial completion has in
the real world. If partial completion is that negligible, I think it
should be handled that way in all cases, instead of special-casing
REQ_CLONE.

>> Can you explain which particular case you're worried about?
>
> General path failure case.
>
> On retrying, another clone is created but bios it points to
> are already advanced to the end with your patch.
> So they look like bios with no remaining segments.
> Lower driver may successfully completes such a resubmitted
> clone *without doing actual I/O*.
> Then written data will be lost / read data will be bogus.
>
> Can you test this scenario with your patch?
>   1. Set up a multipath device with fail-over mode
>   2. Write something to the multipath device.
>      After the clone request is sent to the primary path
>      and before the data goes to the disk,
>      down the primary path
>      (e.g. echo offline > /sys/block/sdXX/device/state)
>   3. (dm-mpath will retry from the secondary path and
>      the write will eventually succeed)
>   4. Verify if the written data is really on the disk

I made a small script so that people can play with it.
The script sets up a tcm_loop multipath device and runs an fio
verification test while bringing paths up and down quickly.

When your patch is applied, fio reports a verification failure within
a minute, like this:

  # ./stress-mp.sh
  ..
  test1: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=2
  fio-2.2.8-16-g68d9
  Starting 1 process
  meta: verify failed at file /dev/mapper/mp offset 477626368, length 524288
         received data dumped as mp.477626368.received
         expected data dumped as mp.477626368.expected
  fio: pid=13560, err=84/file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character
  test1: (groupid=0, jobs=1): err=84 (file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=13560: Thu May 28 01:54:56 2015

-- 
Jun'ichi Nomura, NEC Corporation
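
For readers who want to see the failure mode from the quoted scenario without a multipath setup, here is a minimal user-space sketch in Python. It is only a toy model, not kernel code: `Bio`, `submit` and `write_with_failover` are hypothetical names, the `done` field merely plays the role of bi_iter, and the assumption that the iterator is already advanced when the path failure is noticed is mine, made to mirror the description above.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    """Toy stand-in for a bio: a payload plus an iterator position."""
    data: bytes
    done: int = 0  # bytes already consumed; plays the role of bi_iter

    def remaining(self) -> int:
        return len(self.data) - self.done


def submit(bio: Bio, disk: bytearray, offset: int, path_ok: bool) -> bool:
    """Model a lower driver handling a clone that points at `bio`.

    Assumption of this sketch: the iterator is advanced to the end even
    when the path fails, and a request with nothing remaining completes
    successfully without doing any I/O.
    """
    n = bio.remaining()
    if n == 0:
        return True  # nothing left to transfer: "success" with no I/O
    start = bio.done
    bio.done += n  # iterator advanced to the end
    if not path_ok:
        return False  # path failed; the data never reached the disk
    disk[offset + start:offset + start + n] = bio.data[start:start + n]
    return True


def write_with_failover(bio: Bio, disk: bytearray, offset: int,
                        rewind: bool) -> bool:
    """Send to a failing primary path, then retry on the secondary."""
    saved = bio.done
    if submit(bio, disk, offset, path_ok=False):
        return True
    if rewind:
        bio.done = saved  # restore the iterator before re-submission
    return submit(bio, disk, offset, path_ok=True)
```

In this model, calling write_with_failover() with rewind=False reports success while the disk buffer stays untouched, which is the lost-write case; with rewind=True the retry actually transfers the data.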
Attachment: stress-mp.sh
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel