On 3/19/25 7:10 AM, Benjamin Marzinski wrote: > On Tue, Mar 18, 2025 at 07:57:21AM +0100, Christoph Hellwig wrote: >> On Mon, Mar 17, 2025 at 12:45:09AM -0400, Benjamin Marzinski wrote: >>> __bind(). Otherwise the zone resources could change while accessing >>> them. Finally, it is possible that md->zone_revalidate_map will change >>> while calling this function. Only read it once, so that we are always >>> using the same value. Otherwise we might miss a call to >>> dm_put_live_table(). >> >> This checking for calling context is pretty ugly. Can you just use >> a separate report zones helper or at least a clearly documented flag for it? > > Not sure how that would work. The goal is to keep another process, > calling something like blkdev_report_zones_ioctl(), from being able to > call its report_zones_cb, while DM is in blk_revalidate_disk_zones() > which needs to use that same disk->fops->report_zones() interface in > order to call blk_revalidate_zone_cb(). We need some way to distinguish > between the callers. We could export blk_revalidate_zone_cb() and have > dm_blk_report_zones() check if it was called with that report_zones_cb. > That seems just as ugly to me. But if you like that better, fine. > Otherwise I don't see how we can distinguish between the call from > blk_revalidate_disk_zones() and a call from something like > blkdev_report_zones_ioctl(). Am I not understanding your suggestion? > > Allowing the blkdev_report_zones_ioctl() call to go ahead using > md->zone_revalidate_map runs into three problems. > > 1. It's running while the device is switching tables, and there's no > guarantee that DM won't encounter an error and have to fail back. So it > could report information for this unused table. We could just say that > this is what you get from trying to grab the zone information of a > device while it's switching tables. Fine. > > 2. Without this patch, it's possible that while > blkdev_report_zones_ioctl() is still running its report_zones_cb, DM > fails switching tables and frees the new table that's being used by > blkdev_report_zones_ioctl(), causing use-after-free errors. However, > this is solvable by calling srcu_read_lock(&md->io_barrier) to make sure > that we're in a SRCU read section while using md->zone_revalidate_map. > Making that chage should make DM as safe as any other zoned device, > which brings me to... > > 3. On any zoned device, not just DM, what's stopping > one process from syncing the zone write plug offsets: > blkdev_report_zones_ioctl() -> blkdev_report_zones() -> > various.report_zones() -> disk_report_zones_cb() -> > disk_zone_wplug_sync_wp_offset() disk_zone_wplug_sync_wp_offset() is a nop for any zone write plug that does not have BLK_ZONE_WPLUG_NEED_WP_UPDATE set. And that flag is set only for zones that had a write error. Given that recovering from write errors on zoned device requires to: 1) stop writing to the zone, 2) Do a report zone (which will sync the wp offset) 3) Restart writing there is no write IOs going on for the zone that is being reported, for a well behaved user. If the user is not well behaved, it will get errors/out of sync wp, until the user behaves :) So I would not worry too much about this. As we discussed, the table switching should be allowed only for the wildcard target (dm-error) and that's it. We should not allow table switching otherwise. And given that inserting dm-error does not cause any change to the zone configuration, I do not see why we need to revalidate zones. > > While another process is running into problems while dealing with the > gendisk's zone configuration changing: > > blk_revalidate_disk_zones() -> disk_free_zone_resources() We should not allow the zone configuration to change. That is impossible to deal with at the user level. > I don't see any synchronizing between these two call paths. Am I missing > something that stops this? Can this only happen for DM devices, for some > reason? If this can't happen, I'm fine with just adding a > srcu_read_lock() to dm_blk_report_zones() and calling it good. If it > can happen, then we should fix that. I think it can happen today but no-one ever had this issue because no-one has tried to switch a dm-table on a zoned drive. And I really think we should prevent that from being done, except for dm-error since that's used for fstests. > > -Ben > -- Damien Le Moal Western Digital Research