On Thu, Oct 20, 2022 at 01:24:54AM -0700, Aru wrote:
> On 10/18/22 12:47 AM, Leon Romanovsky wrote:
> > On Fri, Oct 14, 2022 at 12:12:36PM -0700, Aru wrote:
> > > Hi Leon,
> > >
> > > Thank you for reviewing the patch.
> > >
> > > The method you mentioned disables the dump permanently for the kernel.
> > > We thought the vendor might have enabled it for their consumption when
> > > needed. Hence we made it dynamic, so that it can be enabled/disabled at
> > > run time.
> > >
> > > Especially in a production environment, having the option to turn this
> > > log on/off at runtime will be helpful.
> > While you are interested in turning this specific warning on/off, your
> > change will effectively hide all syndromes, as it is unlikely that anyone
> > runs in production with debug prints enabled.
> >
> > -	mlx5_ib_warn(dev, "dump error cqe\n");
> > +	mlx5_ib_dbg(dev, "dump error cqe\n");
> >
> > Something like this will do the trick without interfering with the others.
> >
> > diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> > index 457f57b088c6..966206085eb3 100644
> > --- a/drivers/infiniband/hw/mlx5/cq.c
> > +++ b/drivers/infiniband/hw/mlx5/cq.c
> > @@ -267,10 +267,29 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
> >  		wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
> >  }
> > -static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
> > +static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe,
> > +		     struct ib_wc *wc, int dump)
> >  {
> > -	mlx5_ib_warn(dev, "dump error cqe\n");
> > -	mlx5_dump_err_cqe(dev->mdev, cqe);
> > +	const char *level;
> > +
> > +	if (!dump)
> > +		return;
> > +
> > +	mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
> > +		     ib_wc_status_msg(wc->status));
> > +
> > +	if (dump == 1) {
> > +		mlx5_ib_warn(dev, "dump error cqe\n");
> > +		level = KERN_WARNING;
> > +	}
> > +
> > +	if (dump == 2) {
> > +		mlx5_ib_dbg(dev, "dump error cqe\n");
> > +		level = KERN_DEBUG;
> > +	}
> > +
> > +	print_hex_dump(level, "", DUMP_PREFIX_OFFSET, 16, 1, cqe, sizeof(*cqe),
> > +		       false);
> >  }
> Hi Leon,
>
> Thank you for the reply and for your suggested method of handling this debug
> logging.
>
> We set 'dump=2' for the syndromes applicable to our scenario:
> MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR and
> MLX5_CQE_SYNDROME_LOCAL_PROT_ERR. We verified this code change: by default,
> the dump_cqe output is not printed to syslog until the log level is changed
> to KERN_DEBUG. This works as expected.
>
> I will send out another email with the patch using your method.
>
> Is it fine with you if I add your name in the 'Suggested-by' field in the
> new patch?

Whatever works for you.

Thanks