Re: [PATCH RESEND] scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock

Hannes Reinecke <hare@xxxxxxx> · Wed, 16 Aug 2023 19:10:03 +0200

On 8/16/23 17:55, Chengfeng Ye wrote:
&fip->ctlr_lock is acquired, through a long call chain, by the ISR
fnic_isr_msix_wq_copy() in hard irq context. Process context code
acquiring the lock must therefore disable irqs; otherwise a
deadlock can occur if the irq preempts execution while the
lock is held in process context on the same CPU.

[ISR]
fnic_isr_msix_wq_copy()
  -> fnic_wq_copy_cmpl_handler()
  -> fnic_fcpio_cmpl_handler()
  -> fnic_fcpio_flogi_reg_cmpl_handler()
  -> fnic_flush_tx()
  -> fnic_send_frame()
  -> fcoe_ctlr_els_send()
  -> spin_lock_bh(&fip->ctlr_lock)

[Process Context]
1. fcoe_ctlr_timer_work()
  -> fcoe_ctlr_flogi_send()
  -> spin_lock_bh(&fip->ctlr_lock)

2. fcoe_ctlr_recv_work()
  -> fcoe_ctlr_recv_handler()
  -> fcoe_ctlr_recv_els()
  -> fcoe_ctlr_announce()
  -> spin_lock_bh(&fip->ctlr_lock)

3. fcoe_ctlr_recv_work()
  -> fcoe_ctlr_recv_handler()
  -> fcoe_ctlr_recv_els()
  -> fcoe_ctlr_flogi_retry()
  -> spin_lock_bh(&fip->ctlr_lock)

4. -> fcoe_xmit()
  -> fcoe_ctlr_els_send()
  -> spin_lock_bh(&fip->ctlr_lock)

spin_lock_bh() is not enough: it disables only softirqs, and
fnic_isr_msix_wq_copy() is a hardirq handler.

These flaws were found by an experimental static analysis tool I am
developing for irq-related deadlocks.

The patch fixes the potential deadlocks by using spin_lock_irqsave()
to disable hard irqs.

Signed-off-by: Chengfeng Ye <dg573847474@xxxxxxxxx>
---
  drivers/scsi/fcoe/fcoe_ctlr.c | 20 ++++++++++++--------
  1 file changed, 12 insertions(+), 8 deletions(-)
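
For readers following the reasoning in the quoted description, here is a
minimal userspace sketch of the scenario, not kernel code. It models one
CPU, one lock standing in for fip->ctlr_lock, and a hard irq that arrives
while process context holds the lock. Every name in it (cpu_model,
lock_kind, isr_runs, raise_irq, process_context_section) is hypothetical
and exists only for illustration; the model assumes that spin_lock_bh()
leaves hard irqs enabled and that spin_lock_irqsave() masks them until the
matching unlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. None of these names are kernel APIs. */
enum lock_kind { LOCK_BH, LOCK_IRQSAVE };

struct cpu_model {
    bool lock_held;    /* stands in for fip->ctlr_lock */
    bool hardirq_off;  /* true while hard irqs are masked on this CPU */
    bool irq_pending;  /* irq raised while masked, delivered after unmask */
};

/*
 * Models the ISR path that ends with fcoe_ctlr_els_send() taking the lock.
 * Returns true if the handler would spin forever: the holder is the
 * interrupted context on the same CPU, so it can never release the lock.
 */
static bool isr_runs(struct cpu_model *cpu)
{
    if (cpu->lock_held)
        return true;
    cpu->lock_held = true;   /* take the lock, do the work ... */
    cpu->lock_held = false;  /* ... and release it */
    return false;
}

/* The device raises its interrupt. Returns true on deadlock. */
static bool raise_irq(struct cpu_model *cpu)
{
    if (cpu->hardirq_off) {
        cpu->irq_pending = true;  /* deferred until irqs are restored */
        return false;
    }
    return isr_runs(cpu);
}

/*
 * A process-context critical section with an irq arriving in the middle.
 * Returns true if the model deadlocks.
 */
bool process_context_section(enum lock_kind kind)
{
    struct cpu_model cpu = { false, false, false };

    /* Lock: only the irqsave variant masks hard irqs in this model. */
    cpu.hardirq_off = (kind == LOCK_IRQSAVE);
    assert(!cpu.lock_held);
    cpu.lock_held = true;

    if (raise_irq(&cpu))
        return true;  /* ISR spins on a lock its own CPU holds */

    /* Unlock, restore irqs, then deliver anything that was deferred. */
    cpu.lock_held = false;
    cpu.hardirq_off = false;
    if (cpu.irq_pending) {
        cpu.irq_pending = false;
        return isr_runs(&cpu);
    }
    return false;
}
```

In this model, process_context_section(LOCK_BH) reports a deadlock, while
process_context_section(LOCK_IRQSAVE) does not, because the interrupt is
held back until the lock has been dropped. The sketch ignores other CPUs,
softirqs, and nesting, so it shows only the same-CPU case the description
is concerned with.
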

Reviewed-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman