On 5/2/24 15:01, Hannes Reinecke wrote: >> +static inline void disk_zone_wplug_set_error(struct gendisk *disk, >> + struct blk_zone_wplug *zwplug) >> +{ >> + unsigned long flags; >> + >> + if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) >> + return; >> + > > I still get nervous when I see an unprotected flag being set. > Especially in code which is known to race with error handling. > Wouldn't it be better to check the flag under the lock or at > least use 'test_and_set_bit' here? It is protected: this is always called with the zone write plug spinlock being locked. > >> + /* >> + * At this point, we already have a reference on the zone write plug. >> + * However, since we are going to add the plug to the disk zone write >> + * plugs work list, increase its reference count. This reference will >> + * be dropped in disk_zone_wplugs_work() once the error state is >> + * handled, or in disk_zone_wplug_clear_error() if the zone is reset or >> + * finished. >> + */ >> + zwplug->flags |= BLK_ZONE_WPLUG_ERROR; > > And that is even worse. We might have been interrupted between these > two lines, invalidating the first check. Nope: zone write plug spinlock is locked. > > Please consider using 'test_and_set_bit()' here. > >> + atomic_inc(&zwplug->ref); >> + >> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags); >> + list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list); >> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); >> +} >> + >> +static inline void disk_zone_wplug_clear_error(struct gendisk *disk, >> + struct blk_zone_wplug *zwplug) >> +{ >> + unsigned long flags; >> + >> + if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR)) >> + return; >> + >> + /* >> + * We are racing with the error handling work which drops the reference >> + * on the zone write plug after handling the error state. So remove the >> + * plug from the error list and drop its reference count only if the >> + * error handling has not yet started, that is, if the zone write plug >> + * is still listed. >> + */ >> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags); >> + if (!list_empty(&zwplug->link)) { >> + list_del_init(&zwplug->link); >> + zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR; >> + disk_put_zone_wplug(zwplug); >> + } >> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); >> +} >> + > > Similar comments to above: you are clearing the flag under the lock, > but don't check or set under the lock... And similar comment: this is called with the zone write plug spinlock held. So no race with the flag handling. What is racy is the error handling because we can only hold disk->zone_wplugs_lock at first, and have to release that lock before we take the zone write plug spinlock (otherwise we would have lock order inversion). And the error hanlding needs to do a report zone, so no zone write plug spinlock either, and in the meantime, the user may do a zone reset or reset all... Hence the trickery here to look at if the error handling work already took the plug out of the list for processing or not. If it has, then the error handling will do what is needed with the error flag. -- Damien Le Moal Western Digital Research