Re: [PATCH v2 5/6] dm: fix dm_blk_report_zones

Benjamin Marzinski <bmarzins@xxxxxxxxxx> · Tue, 18 Mar 2025 18:10:01 -0400

On Tue, Mar 18, 2025 at 07:57:21AM +0100, Christoph Hellwig wrote:
> On Mon, Mar 17, 2025 at 12:45:09AM -0400, Benjamin Marzinski wrote:
> > __bind(). Otherwise the zone resources could change while accessing
> > them. Finally, it is possible that md->zone_revalidate_map will change
> > while calling this function. Only read it once, so that we are always
> > using the same value. Otherwise we might miss a call to
> > dm_put_live_table().
> 
> This checking for calling context is pretty ugly.  Can you just use
> a separate report zones helper or at least a clearly documented flag for it?

Not sure how that would work. The goal is to keep another process,
calling something like blkdev_report_zones_ioctl(), from being able to
call its report_zones_cb, while DM is in blk_revalidate_disk_zones()
which needs to use that same disk->fops->report_zones() interface in
order to call blk_revalidate_zone_cb(). We need some way to distinguish
between the callers. We could export blk_revalidate_zone_cb() and have
dm_blk_report_zones() check if it was called with that report_zones_cb.
That seems just as ugly to me. But if you like that better, fine.
Otherwise I don't see how we can distinguish between the call from
blk_revalidate_disk_zones() and a call from something like
blkdev_report_zones_ioctl(). Am I not understanding your suggestion?

Allowing the blkdev_report_zones_ioctl() call to go ahead using
md->zone_revalidate_map runs into three problems.

1. It's running while the device is switching tables, and there's no
guarantee that DM won't encounter an error and have to fail back. So it
could report information for this unused table. We could just say that
this is what you get from trying to grab the zone information of a
device while it's switching tables. Fine.

2. Without this patch, it's possible that while
blkdev_report_zones_ioctl() is still running its report_zones_cb, DM
fails switching tables and frees the new table that's being used by
blkdev_report_zones_ioctl(), causing use-after-free errors. However,
this is solvable by calling srcu_read_lock(&md->io_barrier) to make sure
that we're in a SRCU read section while using md->zone_revalidate_map.
Making that chage should make DM as safe as any other zoned device,
which brings me to...

3. On any zoned device, not just DM, what's stopping
one process from syncing the zone write plug offsets:
blkdev_report_zones_ioctl() -> blkdev_report_zones() ->
various.report_zones() -> disk_report_zones_cb() ->
disk_zone_wplug_sync_wp_offset()

While another process is running into problems while dealing with the
gendisk's zone configuration changing:

blk_revalidate_disk_zones() -> disk_free_zone_resources()

I don't see any synchronizing between these two call paths. Am I missing
something that stops this? Can this only happen for DM devices, for some
reason? If this can't happen, I'm fine with just adding a
srcu_read_lock() to dm_blk_report_zones() and calling it good.  If it
can happen, then we should fix that.

-Ben