On Fri, May 31, 2024 at 10:21:39AM +0530, Anand Khoje wrote:
> 
> On 5/30/24 22:44, Leon Romanovsky wrote:
> > On Wed, May 22, 2024 at 09:02:56AM +0530, Anand Khoje wrote:
> > > In a non-FLR context, the CX-5 at times requests the release of ~8 million
> > > device pages. This needs a huge number of cmd mailboxes, which are to be
> > > released once the pages are reclaimed. Releasing that many cmd mailboxes
> > > consumes CPU time running into many seconds, which on non-preemptible
> > > kernels leads to critical processes starving on that CPU's run queue. To
> > > alleviate this, this patch relinquishes the CPU periodically, but
> > > conditionally.
> > > 
> > > Orabug: 36275016
> > > 
> > > Signed-off-by: Anand Khoje <anand.a.khoje@xxxxxxxxxx>
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> > > index 9c21bce..9fbf25d 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> > > @@ -1336,16 +1336,23 @@ static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
> > >  	return ERR_PTR(err);
> > >  }
> > >  
> > > +#define RESCHED_MSEC 2
> > >  static void mlx5_free_cmd_msg(struct mlx5_core_dev *dev,
> > >  			      struct mlx5_cmd_msg *msg)
> > >  {
> > >  	struct mlx5_cmd_mailbox *head = msg->next;
> > >  	struct mlx5_cmd_mailbox *next;
> > > +	unsigned long start_time = jiffies;
> > >  
> > >  	while (head) {
> > >  		next = head->next;
> > >  		free_cmd_box(dev, head);
> > 
> > Did you consider making this function asynchronous and parallel?
> > 
> > Thanks
> 
> Hi Leon,
> 
> Thanks for reviewing this patch.
> 
> Here, all page-related methods (give_pages/reclaim_pages/release_all_pages)
> are executed in a worker thread through pages_work_handler().
> 
> Doesn't that mean it is already asynchronous?
You didn't provide any performance data, so I can't say whether it is related
to the work handlers. For example, we can be in this loop when
mlx5_cmd_disable() is called, which leads to synchronous calls to
dma_pool_free() while holding the spinlock. Also, pages_work_handler() runs on
a single-threaded workqueue, so it is not asynchronous.

> 
> When the worker thread, in this case processing reclaim_pages(), takes a
> long time, it starves other processes on the CPU it is running on. Oracle
> UEK being a non-preemptible kernel, those starved processes do not get the
> CPU until the worker relinquishes it. This applies even to processes that
> are time critical and high priority. When starved of the CPU for a long
> time, these processes trigger a kernel panic.

Please add the kernel panic and perf data to your commit message.

> 
> Hence, this patch implements a time-based relinquishing of the CPU using
> cond_resched().
> 
> Shay Dori had a suggestion to tune the time (which we have made 2 msec) to
> reduce too-frequent context switching and find a balance in the processing
> of these mailbox objects. I am presently running some tests on the basis of
> this suggestion.

You will have better results if you parallelize the page release.

Thanks

> 
> Thanks,
> 
> Anand

> > >  		head = next;
> > > +		if (time_after(jiffies, start_time + msecs_to_jiffies(RESCHED_MSEC))) {
> > > +			mlx5_core_warn_rl(dev, "Spent more than %d msecs, yielding CPU\n", RESCHED_MSEC);
> > > +			cond_resched();
> > > +			start_time = jiffies;
> > > +		}
> > >  	}
> > >  
> > >  	kfree(msg);
> > >  }
> > > -- 
> > > 1.8.3.1