Re: [PATCH v2] target: core: remove from tmr_list at lun unlink

On Fri, Apr 16, 2021 at 12:21:46PM +0300, Dmitry Bogdanov wrote:
> @@ -719,8 +726,16 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
>  	if (!lun)
>  		return;
>  
> +	target_remove_from_state_list(cmd);
> +	target_remove_from_tmr_list(cmd);
> +
>  	if (cmpxchg(&cmd->lun_ref_active, true, false))
>  		percpu_ref_put(&lun->lun_ref);
> +
> +	/*
> +	 * Clear struct se_cmd->se_lun before the handoff to FE.
> +	 */
> +	cmd->se_lun = NULL;
>  }

Sadly, we have just found out that this code races with
core_tmr_drain_tmr_list(). If a LUN RESET comes in while some ABORT
TASK TMRs are still outstanding, the following sequence is possible:

  1. During LUN RESET processing core_tmr_drain_tmr_list() is called
  2. At the same time, transport_lun_remove_cmd() is called during ABORT
     TASK processing
  3. core_tmr_drain_tmr_list() acquires &dev->se_tmr_lock and moves the
     TMRs to the on-stack drain_tmr_list
  4. core_tmr_drain_tmr_list() releases &dev->se_tmr_lock and starts
     working on the drain_tmr_list
  5. At the same moment target_remove_from_tmr_list() is called
  6. It acquires &dev->se_tmr_lock, removes TMR from the list by
     list_del_init() and releases &dev->se_tmr_lock
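
To make steps 3 to 6 concrete, here is a single-threaded userspace replay
in C. It is only a sketch under simplifying assumptions: struct list_head
and its helpers are minimal re-implementations of the kernel ones, struct
tmr and replay_steps_3_to_6() are hypothetical, and a plain bool plays the
role of &dev->se_tmr_lock. It also assumes the walk caches the next entry
before handling the current one, as list_for_each_entry_safe() does. Both
paths take the lock, but the drainer no longer holds it while walking
drain_tmr_list, so the remover can unlink the entry the walk has already
cached as "next".

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink e and make it point to itself, like the kernel helper. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-in for struct se_tmr_req: only the linkage matters. */
struct tmr {
	struct list_head tmr_list;
};

/* A plain flag models &dev->se_tmr_lock; the steps run in a fixed order. */
static bool se_tmr_lock_held;

struct replay_result {
	bool remover_got_lock;        /* step 6 found the lock free */
	bool cached_next_self_linked; /* drainer's cached next points to itself */
};

static struct replay_result replay_steps_3_to_6(void)
{
	struct tmr t[3];
	struct list_head dev_tmr_list, drain_tmr_list;
	struct list_head *cur, *next;
	struct replay_result r;

	INIT_LIST_HEAD(&dev_tmr_list);
	INIT_LIST_HEAD(&drain_tmr_list);
	for (int i = 0; i < 3; i++)
		list_add_tail(&t[i].tmr_list, &dev_tmr_list);

	/* Step 3: under the lock, move every TMR to the on-stack list. */
	se_tmr_lock_held = true;
	while (!list_empty(&dev_tmr_list)) {
		struct list_head *e = dev_tmr_list.next;
		list_del_init(e);
		list_add_tail(e, &drain_tmr_list);
	}

	/* Step 4: drop the lock, then start the walk, caching the next entry. */
	se_tmr_lock_held = false;
	cur = drain_tmr_list.next; /* t[0] */
	next = cur->next;          /* t[1] */

	/* Steps 5-6: the remover finds the lock free and unlinks t[1]. */
	r.remover_got_lock = !se_tmr_lock_held;
	se_tmr_lock_held = true;
	list_del_init(&t[1].tmr_list);
	se_tmr_lock_held = false;

	r.cached_next_self_linked = next == &t[1].tmr_list && next->next == next;
	return r;
}
```

Both fields of the result come back true: the lock gives no exclusion
against the unlocked walk, and the cached entry is now self-linked.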

What happens next is this:

  [  391.438244] LUN_RESET:  releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11
  [  391.438246] LUN_RESET:  releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11

The same TMR is pulled out of drain_tmr_list twice. This happens
because nothing prevents the list traversal in core_tmr_drain_tmr_list()
and the element removal in target_remove_from_tmr_list() from running
concurrently: drain_tmr_list is walked without &dev->se_tmr_lock held.
So list_del_init() in target_remove_from_tmr_list() calls
INIT_LIST_HEAD(), tmr_p->next now points to tmr_p itself, and the
traversal visits the same TMR again.
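
A small userspace model shows how a self-linked entry turns into a double
release and leaves later TMRs unreached. This is a hypothetical sketch, not
the kernel code: it assumes the drain loop walks the list the way
list_for_each_entry_safe() does (next cached before the body runs), and
walk_with_race(), struct tmr and MAX_TMRS are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink e and make it point to itself, like the kernel helper. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-in for struct se_tmr_req. */
struct tmr {
	int id;
	struct list_head tmr_list;
};

/* Recover the struct tmr that embeds a list node (like container_of). */
static struct tmr *to_tmr(struct list_head *e)
{
	return (struct tmr *)((char *)e - offsetof(struct tmr, tmr_list));
}

#define MAX_TMRS 8

/*
 * Builds drain_tmr_list with n TMRs (ids 0..n-1) and walks it with the next
 * pointer cached before the body runs. While the body handles TMR race_at,
 * a "concurrent" list_del_init() of TMR victim is injected (step 6).
 * Visited ids go to visits[]; the walk is capped at max_visits because in
 * this model nothing stops it from spinning on a self-linked entry.
 * Returns the number of visits.
 */
static int walk_with_race(int n, int race_at, int victim,
			  int *visits, int max_visits)
{
	struct tmr t[MAX_TMRS];
	struct list_head drain_tmr_list;
	struct list_head *cur, *next;
	int count = 0;

	assert(n > 0 && n <= MAX_TMRS);
	assert(victim >= 0 && victim < n);
	INIT_LIST_HEAD(&drain_tmr_list);
	for (int i = 0; i < n; i++) {
		t[i].id = i;
		list_add_tail(&t[i].tmr_list, &drain_tmr_list);
	}

	for (cur = drain_tmr_list.next, next = cur->next;
	     cur != &drain_tmr_list && count < max_visits;
	     cur = next, next = cur->next) {
		list_del_init(cur);                /* drainer unlinks its TMR */
		visits[count++] = to_tmr(cur)->id; /* "releasing TMR <id>" */
		if (to_tmr(cur)->id == race_at)
			list_del_init(&t[victim].tmr_list); /* racing removal */
	}
	return count;
}
```

walk_with_race(3, 0, 1, visits, 3) records 0, 1, 1: TMR 1 is visited
twice and TMR 2 is never reached. With a single TMR, or when the racing
removal hits an entry other than the cached next one, every TMR is visited
at most once in this model.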

Hence the following warnings:

  [  391.438300] WARNING: CPU: 12 PID: 20064 at ../drivers/target/target_core_transport.c:2785
  ...
  [  391.438448] WARNING: CPU: 12 PID: 20064 at ../lib/refcount.c:28 refcount_warn_saturate+0x224/0x240

This issue also prevents the remaining TMRs from being released,
resulting in a stuck session. It does not always happen, since
drain_tmr_list sometimes contains only one element, but it is still
possible.


