On 16/11/2024 03:50, Yu Kuai wrote:
Hi,
在 2024/11/16 2:19, Song Liu 写道:
On Tue, Nov 12, 2024 at 4:43 AM John Garry <john.g.garry@xxxxxxxxxx>
wrote:
Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.
For an attempt to atomic write to a region which has bad blocks, error
the write as we just cannot do this. It is unlikely to find devices
which
support atomic writes and bad blocks.
Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
drivers/md/raid10.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
Reviewed-by: Yu Kuai <yukuai3@xxxxxxxxxx>
One nit below:
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8c7f5daa073a..a3936a67e1e8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1255,6 +1255,7 @@ static void raid10_write_one_disk(struct mddev
*mddev, struct r10bio *r10_bio,
const enum req_op op = bio_op(bio);
const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC;
const blk_opf_t do_fua = bio->bi_opf & REQ_FUA;
+ const blk_opf_t do_atomic = bio->bi_opf & REQ_ATOMIC;
unsigned long flags;
struct r10conf *conf = mddev->private;
struct md_rdev *rdev;
@@ -1273,7 +1274,7 @@ static void raid10_write_one_disk(struct mddev
*mddev, struct r10bio *r10_bio,
mbio->bi_iter.bi_sector = (r10_bio->devs[n_copy].addr +
choose_data_offset(r10_bio, rdev));
mbio->bi_end_io = raid10_end_write_request;
- mbio->bi_opf = op | do_sync | do_fua;
+ mbio->bi_opf = op | do_sync | do_fua | do_atomic;
if (!replacement && test_bit(FailFast,
&conf->mirrors[devnum].rdev-
>flags)
&& enough(conf, devnum))
@@ -1468,7 +1469,15 @@ static void raid10_write_request(struct mddev
*mddev, struct bio *bio,
continue;
}
if (is_bad) {
- int good_sectors = first_bad -
dev_sector;
+ int good_sectors;
+
+ if (bio->bi_opf & REQ_ATOMIC) {
+ /* We just cannot atomically
write this ... */
Maybe mention that we can if there is at least one disk without any BB,
it's just benefit does not worth the complexity. And return the special
error code to let user retry without atomic write.
ok
+ error = -EFAULT;
Is EFAULT the right error code here? I think we should return something
covered by blk_errors?
sure, so maybe explicitly use BLK_STS_IOERR / EIO, which is what we
generally use in raid drivers when we cannot read/write - ok?
The error code is passed to bio by:
bio->bi_status = errno_to_blk_status(error);
So, this is fine.
Other than this, 4/5 and 5/5 look good to me.
Thanks,
John