Re: [PATCH rdma-rc 2/4] IB/mlx5: Retry cache population when resource is temporarily unavailable

Leon Romanovsky <leon@xxxxxxxxxx> · Tue, 27 Nov 2018 21:46:37 +0200

On Tue, Nov 27, 2018 at 12:24:47PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 27, 2018 at 08:37:27AM +0200, Leon Romanovsky wrote:
> > On Mon, Nov 26, 2018 at 08:49:00PM +0000, Jason Gunthorpe wrote:
> > > On Sun, Nov 25, 2018 at 08:34:24PM +0200, Leon Romanovsky wrote:
> > > > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> > > > index 9b195d65a13e..f600623ce3f7 100644
> > > > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > > > @@ -480,7 +480,8 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, int entry)
> > > >  			if (err && err != -EAGAIN)
> > > >  				return ERR_PTR(err);
> > > >
> > > > -			wait_for_completion(&ent->compl);
> > > > +			wait_for_completion_timeout(&ent->compl,
> > > > +						    msecs_to_jiffies(20));
> > >
> > > I think we should revisit this when the threading here is fixed, as I
> > > remarked in the prior ODP patches.
> > >
> > > Adding a delay like this seems like it will result in overfilling the
> > > cache in some situations, and is incredibly ugly.
> >
> >
> > Jason,
> >
> > wait_for_completion_timeout() is not a delay but a way to limit wait if
> > completion is not arrived to something shorter than MAX_SCHEDULE_TIMEOUT,
> > which is ULONG_MAX. In some rare situations, it will lead to complete
> > halted system.
>
> No, it is being used as a delay here.
>
> If add_keys returns success then a completion should be guarenteed to
> be generated and no sleep should be performed.
>
> If it returns failure then no completion should be expected, and
> completion waiting should not be done. So this degrades into a screwy
> way to write msleep.
>
> The whole scheme around this is just wrong. Insertion and extraction
> from the cache should be more carefully controlled an cache extraction
> should never do something silly like return EAGAIN.
>
> If the number of mailbox requests is to be limited then that should be
> done directly, and the number of required MRs should be tracked so
> that more mailboxes can be issued when request space becomes
> available.
>
> This is the same problem the other patch was tip-toeing around. This
> code is wrongly designed and that is causing all these bugs.

Jason,

It doesn't matter how many bugs we have in this area and we wil
definitely invest time to rewrite this part of code, but it can't be
for -rc, due to proximity of the coming merge window and due to the
expected size of the change. Right now, you left the system to stuck
forever (ULONG_MAX).

Please take a look on the actual implementation of wait_for_completion
and wait_for_completion_timeout and see that we actually decreased
delay.

Thanks

>
> Jason
Attachment:
signature.asc

Description: PGP signature