From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx> When transport_clear_lun_ref() is shutting down a se_lun via configfs with new I/O in-flight, it's possible to trigger a NULL pointer dereference in transport_lookup_cmd_lun() due to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD checking before incrementing lun->lun_ref.count after lun->lun_ref has switched to atomic_t mode. This results in a NULL pointer dereference as LUN shutdown code in core_tpg_remove_lun() continues running after the existing ->release() -> core_tpg_lun_ref_release() callback completes, and clears the RCU protected se_lun->lun_se_dev pointer. During the OOPs, the state of lun->lun_ref in the process which triggered the NULL pointer dereference looks like the following on v4.1.y stable code: struct se_lun { lun_link_magic = 4294932337, lun_status = TRANSPORT_LUN_STATUS_FREE, ..... lun_se_dev = 0x0, lun_sep = 0x0, ..... lun_ref = { count = { counter = 1 }, percpu_count_ptr = 3, release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>, confirm_switch = 0x0, force_atomic = false, rcu = { next = 0xffff88154fa1a5d0, func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu> } } } To address this bug, use percpu_ref_tryget_live() to ensure once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref has switched to atomic_t, all new I/Os will fail to obtain a new lun->lun_ref reference. Also use an explicit percpu_ref_kill_and_confirm() callback to block on ->lun_ref_comp to allow the first stage and associated RCU grace period to complete, and then block on ->lun_ref_shutdown waiting for the final percpu_ref_put() to drop the last reference via transport_lun_remove_cmd() before continuing with core_tpg_remove_lun() shutdown. Reported-by: Rob Millner <rlm@xxxxxxxxxxxxx> Tested-by: Rob Millner <rlm@xxxxxxxxxxxxx> Cc: Rob Millner <rlm@xxxxxxxxxxxxx> Tested-by: Vaibhav Tandon <vst@xxxxxxxxx> Cc: Vaibhav Tandon <vst@xxxxxxxxx> Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx> --- drivers/target/target_core_device.c | 10 ++++++++-- drivers/target/target_core_tpg.c | 3 ++- drivers/target/target_core_transport.c | 31 ++++++++++++++++++++++++++++++- include/target/target_core_base.h | 1 + 4 files changed, 41 insertions(+), 4 deletions(-)
I have seen this and have tested this with our custom kernel. So this looks good from me! -Bryant