Re: [PATCH 1/2] target: core: fix race during ACL removal

Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> · Wed, 27 Jul 2022 09:58:42 +0300



On Tue, Jul 26, 2022 at 04:40:16PM -0500, Mike Christie wrote:
> 
> On 7/20/22 6:28 AM, Dmitry Bogdanov wrote:
> > Under huge load there is a possibility of race condition in updating
> > se_dev_entry object in ACL removal procedure.
> >
> >  NIP [c0080000154093d0] transport_lookup_cmd_lun+0x1f8/0x3d0 [target_core_mod]
> >  LR [c00800001542ab34] target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
> >  Call Trace:
> >    target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
> >    target_submit_cmd+0x44/0x60 [target_core_mod]
> >    tcm_qla2xxx_handle_cmd+0x88/0xe0 [tcm_qla2xxx]
> >    qlt_do_work+0x2e4/0x3d0 [qla2xxx]
> >    process_one_work+0x298/0x5c
> >
> > Despite usage of RCU primitives with deve->se_lun pointer, it has not
> > became dereference-safe. Because deve->se_lun is updated not
> > synchronized with a reader. That change might be in a release function
> > called by synchronize_rcu(). But, in fact, there is no sence to NULL that
> > pointer for deleting deve. So a better solution is to remove that NULLing.
> >
> Hey,
> 
> For the line above about there being no reason to NULL the pointer are you
> saying:
> 
> 
> We should have had a NULL check like:
> 
> transport_lookup_cmd_lun:
> 
> .....
> 
> se_lun = rcu_dereference(deve->se_lun);
> if (!se_lun) {
>         rcu_read_unlock();
>         return TCM_NON_EXISTENT_LUN;
> }
> 
> and it would have prevented the crash.
> 
> But we don't need that or the se_lun RCU use, because the hlist_del_rcu and
> kfree_rcu calls in core_disable_device_list_for_node will make sure that
> transport_lookup_cmd_lun either finds a se_dev_entry and while in use will
> never be freed from under us, or transport_lookup_cmd_lun will never see a
> se_dev_entry.
> 
> 
> If that's what you meant then I agree.
Yes, that is exactly how I thought. All access to deve->se_lun is
already under rcu_read_lock. And either deve->se_lun is always valid or
deve is not valid itself and will not be found in the list_for_*.
Will add more details in the commit message.