Re: [PATCH rdma-rc 2/4] IB/mlx5: Retry cache population when resource is temporarily unavailable

Jason Gunthorpe <jgg@xxxxxxxx> · Tue, 27 Nov 2018 12:55:16 -0700

On Tue, Nov 27, 2018 at 09:46:37PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 27, 2018 at 12:24:47PM -0700, Jason Gunthorpe wrote:
> > On Tue, Nov 27, 2018 at 08:37:27AM +0200, Leon Romanovsky wrote:
> > > On Mon, Nov 26, 2018 at 08:49:00PM +0000, Jason Gunthorpe wrote:
> > > > On Sun, Nov 25, 2018 at 08:34:24PM +0200, Leon Romanovsky wrote:
> > > > > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> > > > > index 9b195d65a13e..f600623ce3f7 100644
> > > > > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > > > > @@ -480,7 +480,8 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, int entry)
> > > > >  			if (err && err != -EAGAIN)
> > > > >  				return ERR_PTR(err);
> > > > >
> > > > > -			wait_for_completion(&ent->compl);
> > > > > +			wait_for_completion_timeout(&ent->compl,
> > > > > +						    msecs_to_jiffies(20));
> > > >
> > > > I think we should revisit this when the threading here is fixed, as I
> > > > remarked in the prior ODP patches.
> > > >
> > > > Adding a delay like this seems like it will result in overfilling the
> > > > cache in some situations, and is incredibly ugly.
> > >
> > >
> > > Jason,
> > >
> > > wait_for_completion_timeout() is not a delay but a way to limit wait if
> > > completion is not arrived to something shorter than MAX_SCHEDULE_TIMEOUT,
> > > which is ULONG_MAX. In some rare situations, it will lead to complete
> > > halted system.
> >
> > No, it is being used as a delay here.
> >
> > If add_keys returns success then a completion should be guarenteed to
> > be generated and no sleep should be performed.
> >
> > If it returns failure then no completion should be expected, and
> > completion waiting should not be done. So this degrades into a screwy
> > way to write msleep.
> >
> > The whole scheme around this is just wrong. Insertion and extraction
> > from the cache should be more carefully controlled an cache extraction
> > should never do something silly like return EAGAIN.
> >
> > If the number of mailbox requests is to be limited then that should be
> > done directly, and the number of required MRs should be tracked so
> > that more mailboxes can be issued when request space becomes
> > available.
> >
> > This is the same problem the other patch was tip-toeing around. This
> > code is wrongly designed and that is causing all these bugs.
> 
> Jason,
> 
> It doesn't matter how many bugs we have in this area and we wil
> definitely invest time to rewrite this part of code, but it can't be
> for -rc, due to proximity of the coming merge window and due to the
> expected size of the change. Right now, you left the system to stuck
> forever (ULONG_MAX).

The proper trivial fix is to not call wait_for_completion if a
completion is not expected, ie look for the EGAIN path.

Otherwise this causes false failures of things that *were* guarenteed
to get completions and that causes a different sort of bug.

Jason