On 25/07/2022 10:16, Guoqing Jiang wrote: > > > On 7/25/22 9:13 AM, lizhijian@xxxxxxxxxxx wrote: >> On 22/07/2022 06:19, Bob Pearson wrote: >>> On 7/20/22 05:50, Haris Iqbal wrote: >>>> On Wed, Jul 20, 2022 at 12:22 PM Li Zhijian<lizhijian@xxxxxxxxxxx> wrote: >>>>> Below 2 commits will be reverted: >>>>> 8ff5f5d9d8cf ("RDMA/rxe: Prevent double freeing rxe_map_set()") >>>>> 647bf13ce944 ("RDMA/rxe: Create duplicate mapping tables for FMRs") >>>>> >>>>> The community has a few bug reports which pointed this commit at last. >>>>> Some proposals are raised up in the meantime but all of them have no >>>>> follow-up operation. >>>>> >>>>> The previous commit led the map_set of FMR to be not avaliable any more if >>>>> the MR is registered again after invalidating. Although the mentioned >>>>> patch try to fix a potential race in building/accessing the same table >>>>> for fast memory regions, it broke rnbd etc ULPs. Since the latter could >>>>> be worse, revert this patch. >>>>> >>>>> With previous commit, it's observed that a same MR in rnbd server will >>>>> trigger below code path: >>>> Looks Good. I tested the patch against rdma for-next and it solves the >>>> problem mentioned in the commit. >>>> One small nitpick. It should be rtrs, and not rnbd in the commit message. >>>> >>>> Feel free to add my, >>>> >>>> Tested-by: Md Haris Iqbal<haris.iqbal@xxxxxxxxx> >>>> >>>>> -> rxe_mr_init_fast() >>>>> |-> alloc map_set() # map_set is uninitialized >>>>> |...-> rxe_map_mr_sg() # build the map_set >>>>> |-> rxe_mr_set_page() >>>>> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means >>>>> # we can access host memory(such rxe_mr_copy) >>>>> |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID >>>>> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE, >>>>> # but map_set was not built again >>>>> |...-> rxe_mr_copy() # kernel crash due to access wild addresses >>>>> # that lookup from the map_set >>>>> >>> Where is the use case for this? All the FMR examples I am aware of call rxe_map_mr_sg() >>> between each reg_fast_mr/invalidate_mr() sequence. I am not familiar with rtrs. >>> What is it? >> it would happen when we are creating a rnbd connection. > > To be accurate, it is rtrs connection. Thanks for the reminder again. > >> modprobe rnbd_server >> modprobe rnbd_client >> >> echo "sessname=foo path=ip:<server-ip> device_path=/dev/nvme0n1" > /sys/devices/virtual/rnbd-client/ctl/map_device >> >> >> I have tested blktests and above rnbd case, they works fine. >> Further more, your "[PATCH RFC] RDMA/rxe: Allow re-registration of FMRs" does'n fix the above rnbd use case. > > Thanks for the effort! I believe rnbd/rtrs over rxe had been broken for a while, can we agree > the problem need to be fixed? > Sure, obviously it's a regression for rnbd/rtrs. But we didn't pay much attention on it before. > Thanks, > Guoqing