Re: [PATCH] target: iscsi: fix hard lockup when executing a compare-and-write command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/21/22 3:27 AM, Maurizio Lombardi wrote:
> While handling an I/O completion for the compare portion of a
> COMPARE_AND_WRITE command, it may happen that the
> compare_and_write_callback function submits new bio structs
> while still in softirq context.
> 
> low level drivers like md raid5 do not expect their make_request
> call to be used in softirq context, they call into schedule() and
> create a deadlocked system.
> 
>  __schedule at ffffffff873a0807
>  schedule at ffffffff873a0cc5
>  raid5_get_active_stripe at ffffffffc0875744 [raid456]
>  raid5_make_request at ffffffffc0875a50 [raid456]
>  md_handle_request at ffffffff8713b9f9
>  md_make_request at ffffffff8713bacb
>  generic_make_request at ffffffff86e6f14b
>  submit_bio at ffffffff86e6f27c
>  iblock_submit_bios at ffffffffc0b4e4dc [target_core_iblock]
>  iblock_execute_rw at ffffffffc0b4f3ce [target_core_iblock]
>  __target_execute_cmd at ffffffffc1090079 [target_core_mod]
>  compare_and_write_callback at ffffffffc1093602 [target_core_mod]
>  target_cmd_interrupted at ffffffffc108d1ec [target_core_mod]
>  target_complete_cmd_with_sense at ffffffffc108d27c [target_core_mod]
>  iblock_complete_cmd at ffffffffc0b4e23a [target_core_iblock]
>  dm_io_dec_pending at ffffffffc00db29e [dm_mod]
>  clone_endio at ffffffffc00dbf07 [dm_mod]
>  raid5_align_endio at ffffffffc086d6c2 [raid456]
>  blk_update_request at ffffffff86e6d950
>  scsi_end_request at ffffffff87063d48
>  scsi_io_completion at ffffffff87063ee8
>  blk_complete_reqs at ffffffff86e77b05
>  __softirqentry_text_start at ffffffff876000d7
> 
> This problem appears to be an issue between target_cmd_interrupted()
> and compare_and_write_callback(). target_cmd_interrupted() calls the
> se_cmd's transport_complete_callback function pointer if the se_cmd
> is being stopped or aborted, and CMD_T_ABORTED was set on the se_cmd.
> 
> When calling compare_and_write_callback(), the success parameter
> was set to false. target_cmd_interrupted() seems to expect this
> means the callback will do cleanup that does not require a process
> context. But compare_and_write_callback() ignores the parameter if
> there was I/O done for the compare part of COMPARE_AND_WRITE.
> 
> Since there was data, the function continued on, passed the compare,
> and issued a write while ignoring the value of the success parameter.
> The submit of a bio for the write portion of the COMPARE_AND_WRITE
> then causes schedule to be unsafely called from the softirq context.
> 
> Fix the bug in compare_and_write_callback by jumping
> to the out label if success == "false",  after checking if
> we have been called by transport_generic_request_failure();
> The command is being aborted or stopped so there is no
> need to submit the write bio for the write part
> of the COMPARE_AND_WRITE command.
> 
> Signed-off-by: Maurizio Lombardi <mlombard@xxxxxxxxxx>
> ---
>  drivers/target/target_core_sbc.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
> index 1e3216de1e04..80d7a4419c4c 100644
> --- a/drivers/target/target_core_sbc.c
> +++ b/drivers/target/target_core_sbc.c
> @@ -454,12 +454,22 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd, bool succes
>  	sense_reason_t ret = TCM_NO_SENSE;
>  	int i;
>  
> -	/*
> -	 * Handle early failure in transport_generic_request_failure(),
> -	 * which will not have taken ->caw_sem yet..
> -	 */
> -	if (!success && (!cmd->t_data_sg || !cmd->t_bidi_data_sg))
> -		return TCM_NO_SENSE;
> +	if (!success) {
> +		/*
> +		 * Handle early failure in transport_generic_request_failure(),
> +		 * which will not have taken ->caw_sem yet..
> +		 */
> +		if (!cmd->t_data_sg || !cmd->t_bidi_data_sg)
> +			return TCM_NO_SENSE;
> +
> +		/*
> +		 * The command has been stopped or aborted so
> +		 * we don't have to perform the write operation.
> +		 */
> +		WARN_ON(!(cmd->transport_state &
> +			(CMD_T_ABORTED | CMD_T_STOP)));
> +		goto out;
> +	}

Instead of having the "bool success" arg then the callback figuring out the
context/state it's being called from, would it be nicer to have the caller
tell us. Change the bool to a:

enum target_callback_state {
	TARGET_CALLBACK_STATE_SUCCESS,
	TARGET_CALLBACK_STATE_ABORTED,
	TARGET_CALLBACK_STATE_FAILED,
};

?



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux