On Tue, Jul 27, 2021 at 07:44:52AM -0500, Bob Pearson wrote:
> On 7/27/21 7:02 AM, Leon Romanovsky wrote:
> > On Tue, Jul 27, 2021 at 06:31:59AM -0500, Bob Pearson wrote:
> >> On 7/27/21 6:30 AM, Leon Romanovsky wrote:
> >>> On Mon, Jul 26, 2021 at 04:58:16PM -0500, Bob Pearson wrote:
> >>>> Currently several rxe objects hold references to PDs which are ref-
> >>>> counted. This replicates work already done by RDMA core which takes
> >>>> references to PDs in the ib objects which are contained in the rxe
> >>>> objects. This patch removes struct rxe_pd from rxe objects and removes
> >>>> reference counting for PDs except for PD alloc and PD dealloc. It also
> >>>> adds inline extractor routines which return PDs from the PDs in the
> >>>> ib objects. The names of these are made consistent including a rxe_
> >>>> prefix.
> >>>>
> >>>> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> >>>> ---
> >>>>  drivers/infiniband/sw/rxe/rxe_comp.c  |  4 ++--
> >>>>  drivers/infiniband/sw/rxe/rxe_loc.h   |  4 ++--
> >>>>  drivers/infiniband/sw/rxe/rxe_mr.c    |  8 +++----
> >>>>  drivers/infiniband/sw/rxe/rxe_mw.c    | 31 +++++++++++---------------
> >>>>  drivers/infiniband/sw/rxe/rxe_qp.c    |  9 +--------
> >>>>  drivers/infiniband/sw/rxe/rxe_req.c   |  2 +-
> >>>>  drivers/infiniband/sw/rxe/rxe_resp.c  |  4 ++--
> >>>>  drivers/infiniband/sw/rxe/rxe_verbs.c | 26 ++++++----------------
> >>>>  drivers/infiniband/sw/rxe/rxe_verbs.h | 24 +++++++++++++++------
> >>>>  9 files changed, 48 insertions(+), 64 deletions(-)
> >>>
> >>> Last time when I looked on it, I came to conclusion that all RXE
> >>> references can be dropped.
> >>>
> >>> Thanks
> >>>
> >> This is a step in that direction. There are more coming.
> >
> > Glad to hear, thank you for your work.
> >
> >> Regards,
> >>
> >> Bob
>
> The other ones I can immediately get rid of are AHs, CQs and SRQs.
>
> The ones I think may require ref counting are MRs, MWs, QPs, and
> XRCSRQs. Each of these gets looked up from information in RoCE packets:
> rkeys, qpns, and srq_nums. For reliable transports on slow networks this
> can require that these objects hang around for a while, and the user has
> no visibility into this unless there is a completion event waiting,
> which is not the case for e.g. reads, writes or atomics on the target
> side. There can be races between destroying these objects and messages
> completing, causing a kernel oops. Do you know another way to address
> these cases?

IMHO, for everything that was converted to the general allocation
scheme, it is safe to drop the RXE internal counting.

Thanks

>
> Bob
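
For illustration only, the race Bob describes (a packet handler looking
up an object by number while the owner destroys it) is usually closed by
taking a reference at lookup time, and only if the count has not already
reached zero. The following is a minimal userspace sketch of that
pattern, not rxe code: struct obj, the table, and all function names are
hypothetical, and a real driver would also need locking or RCU around the
table access, which this single-threaded sketch omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object found by number, e.g. a QP by qpn. */
struct obj {
    atomic_int refcnt;   /* 0 means teardown has started */
    unsigned int num;    /* lookup key carried in the packet */
    bool freed;          /* set on final put; demo only, real code would free */
};

#define TABLE_SIZE 8
static struct obj *table[TABLE_SIZE];   /* assumption: no locking shown */

/* Take a reference only if the count is still non-zero. */
static bool obj_get_unless_zero(struct obj *o)
{
    int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the last holder releases the object. */
static void obj_put(struct obj *o)
{
    int prev = atomic_fetch_sub(&o->refcnt, 1);

    assert(prev > 0);
    if (prev == 1)
        o->freed = true;
}

/* Packet path: find the object and pin it for the duration of the work. */
static struct obj *obj_lookup(unsigned int num)
{
    struct obj *o = table[num % TABLE_SIZE];

    if (o && o->num == num && obj_get_unless_zero(o))
        return o;
    return NULL;
}

/* Destroy path: unpublish first, then drop the creator's reference. */
static void obj_destroy(struct obj *o)
{
    table[o->num % TABLE_SIZE] = NULL;
    obj_put(o);
}
```

With this shape, a destroy that runs while a lookup still holds the
object only unpublishes it; the release happens on the last obj_put(),
so an in-flight message never touches freed memory.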