Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 22, 2017 at 5:41 AM, Martin K. Petersen
<martin.petersen@xxxxxxxxxx> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryomov@xxxxxxxxx> writes:
>
> Ilya,
>
> Ilya> could you please explain blk_integrity_revalidate() and
> Ilya> its GENHD_FL_UP check in particular?  We have the queue,
> Ilya> bi->profile can't be NULL after blk_integrity_register(), and
> Ilya> since the latter "must" be used for registering the profile with
> Ilya> the block layer, wouldn't the following be sufficient for
> Ilya> blk_integrity users?
>
> IIrc, the FL_UP check fixed a registration problem in the nvme driver.

Did it have something to do with a NULL disk->queue?

>
> The rationale behind revalidate was that we need to handle devices which
> lose the integrity capability at runtime (i.e. a integrity-enabled DM
> device is extended with a non-cable drive forcing the feature to be
> turned off). The clearing of the integrity profile is more important in
> that case than zapping the stable pages flag. But that was the original
> reason for not just ORing BDI_CAP_STABLE_WRITES.
>
> I don't have a huge problem with keeping stable pages on if a device
> suddenly stops being integrity capable. However, I'd like to understand
> your use case a bit better.

Well, blk_integrity_revalidate() doesn't clear the profile, it just
clears the stable pages flag.  Whoever calls blk_integrity_unregister()
to clear the profile can also clear the stable pages flag -- why not
let blk_integrity_unregister() clear the flag like I suggested?

>
> Ilya> The alternative seems to be to set up a bogus
> Ilya> blk_integrity_profile (nop_profile won't do -- this one would have
> Ilya> to be truly bogus w/ NULL *_fn) under BLK_DEV_INTEGRITY ifdefs and
> Ilya> hope that nothing breaks.
>
> Can you point me to the relevant code on your end?

This code doesn't exist yet -- it's just something I thought I'd have
to do if my patch or something along those lines isn't accepted.  There
is the nop_profile with stub *_fn callbacks, which is used by nvme and
nvdimm to make bio_integrity_enabled() return true, so that per-interval
buffers are allocated in bio_integrity_prep().  "I want some space for
per-interval metadata, but no integrity checksums" is use case there.

In the rbd case, we don't want that.  We just want to set the stable
pages flag and make sure it's not reset by blk_integrity code.  So, if
blk_integrity_revalidate() isn't fixed, rbd will need to set up a bogus
profile with NULL *_fn callbacks to make bio_integrity_enabled() return
false.  This is obviously fragile, because blk_get_integrity() will no
longer return NULL...

We are setting BDI_CAP_STABLE_WRITES in rbd_init_disk(), see commit
bae818ee1577 ("rbd: require stable pages if message data CRCs are
enabled").  The CRCs are generated in write_partial_message_data() and
verified in read_partial_msg_data().

I'm attaching the patch I had in mind.

Thanks,

                Ilya
From 24a0cf8bc437a65b3986e9ab25cf2f05e6bbd5d0 Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Date: Mon, 20 Feb 2017 18:50:50 +0100
Subject: [PATCH] block: get rid of blk_integrity_revalidate()

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which clears the stable pages
bit if no blk_integrity profile is registered.  It's called from
revalidate_disk() and rescan_partitions(), making it impossible to set
BDI_CAP_STABLE_WRITES for drivers that support partitions and don't use
blk_integrity.  One such driver is rbd, where the ceph messenger is
responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
---
 block/blk-integrity.c     | 19 ++-----------------
 block/partition-generic.c |  1 -
 fs/block_dev.c            |  1 -
 include/linux/genhd.h     |  2 --
 4 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d69c5c79f98e..319f2e4f4a8b 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -417,7 +417,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
 	bi->tuple_size = template->tuple_size;
 	bi->tag_size = template->tag_size;
 
-	blk_integrity_revalidate(disk);
+	disk->queue->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
 }
 EXPORT_SYMBOL(blk_integrity_register);
 
@@ -430,26 +430,11 @@ EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-	blk_integrity_revalidate(disk);
+	disk->queue->backing_dev_info.capabilities &= ~BDI_CAP_STABLE_WRITES;
 	memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
 
-void blk_integrity_revalidate(struct gendisk *disk)
-{
-	struct blk_integrity *bi = &disk->queue->integrity;
-
-	if (!(disk->flags & GENHD_FL_UP))
-		return;
-
-	if (bi->profile)
-		disk->queue->backing_dev_info.capabilities |=
-			BDI_CAP_STABLE_WRITES;
-	else
-		disk->queue->backing_dev_info.capabilities &=
-			~BDI_CAP_STABLE_WRITES;
-}
-
 void blk_integrity_add(struct gendisk *disk)
 {
 	if (kobject_init_and_add(&disk->integrity_kobj, &integrity_ktype,
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 7afb9907821f..0171a2faad68 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -497,7 +497,6 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 
 	if (disk->fops->revalidate_disk)
 		disk->fops->revalidate_disk(disk);
-	blk_integrity_revalidate(disk);
 	check_disk_size_change(disk, bdev);
 	bdev->bd_invalidated = 0;
 	if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3c47614a4b32..b94e2a4974a1 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1422,7 +1422,6 @@ int revalidate_disk(struct gendisk *disk)
 
 	if (disk->fops->revalidate_disk)
 		ret = disk->fops->revalidate_disk(disk);
-	blk_integrity_revalidate(disk);
 	bdev = bdget_disk(disk, 0);
 	if (!bdev)
 		return ret;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 76f39754e7b0..76d6a1cd4153 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -722,11 +722,9 @@ static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 extern void blk_integrity_add(struct gendisk *);
 extern void blk_integrity_del(struct gendisk *);
-extern void blk_integrity_revalidate(struct gendisk *);
 #else	/* CONFIG_BLK_DEV_INTEGRITY */
 static inline void blk_integrity_add(struct gendisk *disk) { }
 static inline void blk_integrity_del(struct gendisk *disk) { }
-static inline void blk_integrity_revalidate(struct gendisk *disk) { }
 #endif	/* CONFIG_BLK_DEV_INTEGRITY */
 
 #else /* CONFIG_BLOCK */
-- 
2.4.3


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux