Hi,
On 2023/09/22 6:03, Roman Mamedov wrote:
On Thu, 21 Sep 2023 17:45:24 -0400
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
I just verified that 6.5.0 does have this DM core fix (needed to
prevent excessive splitting of discard IO.. which could cause fstrim
to take longer for a DM device), but again 6.5.0 has this fix so it
isn't relevant:
be04c14a1bd2 dm: use op specific max_sectors when splitting abnormal io
Given your use of 'writemostly' I'm inferring you're using lvm2's
raid1 that uses MD raid1 code in terms of the dm-raid target.
Discards (more generic term for fstrim) are considered writes, so
writemostly really shouldn't matter... but I know that there have been
issues with MD's writemostly code (identified by others relatively
recently).
All said: hopefully someone more MD oriented can review your report
and help you further.
Mike
I've reported that write-mostly TRIM gets split into 1MB pieces, which can be
an order of magnitude slower on some SSDs: https://www.spinics.net/lists/raid/msg72471.html
Looks like I missed the report.
Based on code review, it's very clear where the discard bio is split:
raid1_write_request
 for (i = 0; i < disks; i++)
  if (rdev && test_bit(WriteMostly, &rdev->flags))
   write_behind = true
 if (write_behind && bitmap)
  max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * (PAGE_SIZE >> 9))
  // io size is 512 * (256 * (4k >> 9)) = 1M
 if (max_sectors < bio_sectors(bio))
  bio_split
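To make the 1M figure concrete, here is a small userspace sketch of the arithmetic. It is not kernel code: the SKETCH_* constants and both function names are made up for illustration, and it assumes 4 KiB pages and BIO_MAX_VECS = 256, as in the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, taken from the comment in the trace above. */
#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_BIO_MAX_VECS  256u
#define SKETCH_SECTOR_SHIFT  9

/* Write-behind cap in 512-byte sectors: BIO_MAX_VECS * (PAGE_SIZE >> 9). */
static uint64_t write_behind_cap_sectors(void)
{
	return (uint64_t)SKETCH_BIO_MAX_VECS *
	       (SKETCH_PAGE_SIZE >> SKETCH_SECTOR_SHIFT);
}

/* Number of pieces a bio of `sectors` becomes when capped at `cap` sectors. */
static uint64_t split_pieces(uint64_t sectors, uint64_t cap)
{
	assert(cap > 0);
	return (sectors + cap - 1) / cap; /* round up */
}
```

With these assumptions the cap is 2048 sectors (1 MiB), so a single 1 GiB discard (2097152 sectors) would reach the member disks as split_pieces(2097152, 2048) = 1024 separate requests.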
Roman and Kirill, can you test the following patch?
Thanks,
Kuai
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4b30a1742162..4963f864ef99 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1345,6 +1345,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	int first_clone;
 	int max_sectors;
 	bool write_behind = false;
+	bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
 
 	if (mddev_is_clustered(mddev) &&
 	     md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1405,7 +1406,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		 * write-mostly, which means we could allocate write behind
 		 * bio later.
 		 */
-		if (rdev && test_bit(WriteMostly, &rdev->flags))
+		if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
 			write_behind = true;
 
 		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
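For anyone who wants to see the intended effect of the patch in isolation, below is a rough userspace model of the patched check. struct sketch_rdev and wants_write_behind() are hypothetical stand-ins, not kernel types or functions; the model only covers the write_behind decision and nothing else in raid1_write_request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device state the loop looks at. */
struct sketch_rdev {
	bool present;      /* models rdev != NULL */
	bool write_mostly; /* models test_bit(WriteMostly, &rdev->flags) */
};

/* Models the patched condition: discards never turn on write-behind. */
static bool wants_write_behind(const struct sketch_rdev *devs, int disks,
			       bool is_discard)
{
	assert(disks >= 0);
	for (int i = 0; i < disks; i++) {
		if (!is_discard && devs[i].present && devs[i].write_mostly)
			return true;
	}
	return false;
}
```

In this model, an array with one write-mostly member still asks for write-behind on a normal write, but not on a discard, so the 1M max_sectors cap should no longer be applied to discard bios.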
Nobody cared to reply, investigate or fix.
Maybe your system hasn't frozen either; it is just taking its time processing all
the tiny split requests.