On 3/26/24 06:53, Bart Van Assche wrote: > On 3/24/24 21:44, Damien Le Moal wrote: >> +/* >> + * Per-zone write plug. >> + */ >> +struct blk_zone_wplug { >> + struct hlist_node node; >> + struct list_head err; >> + atomic_t ref; >> + spinlock_t lock; >> + unsigned int flags; >> + unsigned int zone_no; >> + unsigned int wp_offset; >> + struct bio_list bio_list; >> + struct work_struct bio_work; >> +}; > > Please document what 'lock' protects. Please also document the unit of > wp_offset. > > Since there is an atomic reference count in this data structure, why is > the flag BLK_ZONE_WPLUG_FREEING required? Can that flag be replaced by > checking whether or not 'ref' is zero? Nope, we cannot. The reason is that BIO issuing and zone reset/finish can be concurrently processed and we need to be ready for a user doing really stupid things like resetting or finishing a zone while BIOs for that zone are being issued. When zone reset/finish is processed, the plug is removed from the hash table, but disk_get_zone_wplug_locked() may still get a reference to it because we do not have the plug locked yet. Hence the flag, to prevent reusing the plug for the reset/finished zone that was already removed from the hash table. This is mentioned with a comment in disk_get_zone_wplug_locked(): /* * Check that a BIO completion or a zone reset or finish * operation has not already flagged the zone write plug for * freeing and dropped its reference count. In such case, we * need to get a new plug so start over from the beginning. */ The reference count dropping to 0 will then be the trigger for actually freeing the plug, after all in-flight or plugged BIOs are completed (most likely failed). >> -void disk_free_zone_bitmaps(struct gendisk *disk) >> +static bool disk_insert_zone_wplug(struct gendisk *disk, >> + struct blk_zone_wplug *zwplug) >> +{ >> + struct blk_zone_wplug *zwplg; >> + unsigned long flags; >> + unsigned int idx = >> + hash_32(zwplug->zone_no, disk->zone_wplugs_hash_bits); >> + >> + /* >> + * Add the new zone write plug to the hash table, but carefully as we >> + * are racing with other submission context, so we may already have a >> + * zone write plug for the same zone. >> + */ >> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags); >> + hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) { >> + if (zwplg->zone_no == zwplug->zone_no) { >> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); >> + return false; >> + } >> + } >> + hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]); >> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); >> + >> + return true; >> +} > > Since this function inserts an element into disk->zone_wplugs_hash[], > can it happen that another thread removes that element from the hash > list before this function returns? No, that cannot happen. Both insertion and deletion of plugs in the hash table are serialized with disk->zone_wplugs_lock. See disk_remove_zone_wplug(). -- Damien Le Moal Western Digital Research