On Tue, Jul 26, 2022 at 04:40:16PM -0500, Mike Christie wrote: > > On 7/20/22 6:28 AM, Dmitry Bogdanov wrote: > > Under huge load there is a possibility of race condition in updating > > se_dev_entry object in ACL removal procedure. > > > > NIP [c0080000154093d0] transport_lookup_cmd_lun+0x1f8/0x3d0 [target_core_mod] > > LR [c00800001542ab34] target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod] > > Call Trace: > > target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod] > > target_submit_cmd+0x44/0x60 [target_core_mod] > > tcm_qla2xxx_handle_cmd+0x88/0xe0 [tcm_qla2xxx] > > qlt_do_work+0x2e4/0x3d0 [qla2xxx] > > process_one_work+0x298/0x5c > > > > Despite usage of RCU primitives with deve->se_lun pointer, it has not > > became dereference-safe. Because deve->se_lun is updated not > > synchronized with a reader. That change might be in a release function > > called by synchronize_rcu(). But, in fact, there is no sence to NULL that > > pointer for deleting deve. So a better solution is to remove that NULLing. > > > Hey, > > For the line above about there being no reason to NULL the pointer are you > saying: > > > We should have had a NULL check like: > > transport_lookup_cmd_lun: > > ..... > > se_lun = rcu_dereference(deve->se_lun); > if (!se_lun) { > rcu_read_unlock(); > return TCM_NON_EXISTENT_LUN; > } > > and it would have prevented the crash. > > But we don't need that or the se_lun RCU use, because the hlist_del_rcu and > kfree_rcu calls in core_disable_device_list_for_node will make sure that > transport_lookup_cmd_lun either finds a se_dev_entry and while in use will > never be freed from under us, or transport_lookup_cmd_lun will never see a > se_dev_entry. > > > If that's what you meant then I agree. Yes, that is exactly how I thought. All access to deve->se_lun is already under rcu_read_lock. And either deve->se_lun is always valid or deve is not valid itself and will not be found in the list_for_*. Will add more details in the commit message.