On Thu, Apr 8, 2021 at 3:50 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > On Thu, Apr 08, 2021 at 03:45:45PM +0200, Jinpu Wang wrote: > > On Thu, Apr 8, 2021 at 2:04 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > > On Tue, Apr 06, 2021 at 10:55:59AM +0200, Gioh Kim wrote: > > > > On Thu, Apr 1, 2021 at 8:44 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > > > > > > On Thu, Mar 25, 2021 at 04:32:58PM +0100, Gioh Kim wrote: > > > > > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > > > > > > index 42f49208b8f7..1519191d7154 100644 > > > > > > +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c > > > > > > @@ -808,6 +808,9 @@ static struct rtrs_clt_sess *get_next_path_min_inflight(struct path_it *it) > > > > > > int inflight; > > > > > > > > > > > > list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) { > > > > > > + if (unlikely(READ_ONCE(sess->state) != RTRS_CLT_CONNECTED)) > > > > > > + continue; > > > > > > > > > > There is no way this could be right, a READ_ONCE can't guarentee that > > > > > a following load is not going to happen without races. > > > > > > > > > > You need locking. > > > > > > > > Hi Jason, > > > > > > > > rtrs_clt_request() calls rcu_read_lock() before calling > > > > get_next_path_min_inflight(). > > > > And rtrs_clt_change_state_from_to(), that changes the sess->state, > > > > calls spin_lock_irq() before changing it. > > > > I think that is enough, isn't it? > > > > > > Why would that be enough? > > > > > > Under RCU this check is racy and effetively does nothing. > > > > > > This is an OK usage of RCU: > > > > > > list_del_rcu(&sess->s.entry); > > > > > > /* Make sure everybody observes path removal. */ > > > synchronize_rcu(); > > > > > > And you could say that observing the sess in the list is required, but > > > checking state is pointless. > > > > > > Jason > > Hi Jason > > > > Sending IO via disconnected session is not a problem, it will just get > > an error. It's only about if in the meantime user delete the path > > while IO running, and session is freed. > > But the session can toggle to !connected immediately after the test > above as well, so I still don't see what this is accomplishing. > > Jason The original problem was - get_next_path_min_inflight() checks sess->state == CONNECTED - rtrs_clt_remove_path_from_sysfs() set sess->state = RTRS_CLT_DEAD - rtrs_clt_remove_path_from_sysfs() -> rtrs_clt_destroy_sess_files() -> kobject_put(sess->stats->kobj_stats) -> free sess->stats - get_next_path_min_inflight() read sess->stats->inflight So the patch adds checking sess->state in get_next_path_min_inflight(). I think the same problem happens after rtrs_clt_request(). - rtrs_clt_request() checks sess->state == CONNECTED - rtrs_clt_remove_path_from_sysfs() set sess->state = RTRS_CLT_DEAD - rtrs_clt_remove_path_from_sysfs() -> rtrs_clt_destroy_sess_files() -> kobject_put(sess->stats->kobj_stats) -> free sess->stats - rtrs_clt_request()->rtrs_clt_read_req() reads sess->stats->inflight I think it might be a solution to change the rtrs_clt_remove_path_from_sysfs as below. It changes the order: first remove the session from list and then destroy sess->stat memory. @@ -2900,8 +2900,8 @@ int rtrs_clt_remove_path_from_sysfs(struct rtrs_clt_sess *sess, } while (!changed && old_state != RTRS_CLT_DEAD); if (likely(changed)) { - rtrs_clt_destroy_sess_files(sess, sysfs_self); rtrs_clt_remove_path_from_arr(sess); + rtrs_clt_destroy_sess_files(sess, sysfs_self); kobject_put(&sess->kobj); }