On 24/04/2020 17:59, Jason Gunthorpe wrote: > On Mon, Apr 20, 2020 at 09:22:12AM +0300, Gal Pressman wrote: >> Add a new stat that counts mmap failures, which might help when >> debugging different issues. >> >> Reviewed-by: Firas JahJah <firasj@xxxxxxxxxx> >> Reviewed-by: Yossi Leybovich <sleybo@xxxxxxxxxx> >> Signed-off-by: Gal Pressman <galpress@xxxxxxxxxx> >> drivers/infiniband/hw/efa/efa.h | 3 ++- >> drivers/infiniband/hw/efa/efa_verbs.c | 9 +++++++-- >> 2 files changed, 9 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h >> index aa7396a1588a..77c9ff798117 100644 >> +++ b/drivers/infiniband/hw/efa/efa.h >> @@ -1,6 +1,6 @@ >> /* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */ >> /* >> - * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved. >> + * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved. >> */ >> >> #ifndef _EFA_H_ >> @@ -40,6 +40,7 @@ struct efa_sw_stats { >> atomic64_t reg_mr_err; >> atomic64_t alloc_ucontext_err; >> atomic64_t create_ah_err; >> + atomic64_t mmap_err; >> }; >> >> /* Don't use anything other than atomic64 */ >> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c >> index b555845d6c14..75eef1ec2474 100644 >> +++ b/drivers/infiniband/hw/efa/efa_verbs.c >> @@ -44,7 +44,8 @@ struct efa_user_mmap_entry { >> op(EFA_CREATE_CQ_ERR, "create_cq_err") \ >> op(EFA_REG_MR_ERR, "reg_mr_err") \ >> op(EFA_ALLOC_UCONTEXT_ERR, "alloc_ucontext_err") \ >> - op(EFA_CREATE_AH_ERR, "create_ah_err") >> + op(EFA_CREATE_AH_ERR, "create_ah_err") \ >> + op(EFA_MMAP_ERR, "mmap_err") >> >> #define EFA_STATS_ENUM(ename, name) ename, >> #define EFA_STATS_STR(ename, name) [ename] = name, >> @@ -1569,6 +1570,7 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext, >> ibdev_dbg(&dev->ibdev, >> "pgoff[%#lx] does not have valid entry\n", >> vma->vm_pgoff); >> + atomic64_inc(&dev->stats.sw_stats.mmap_err); >> return -EINVAL; >> } >> entry = to_emmap(rdma_entry); >> @@ -1604,12 +1606,14 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext, >> err = -EINVAL; >> } >> >> - if (err) >> + if (err) { >> ibdev_dbg( >> &dev->ibdev, >> "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n", >> entry->address, rdma_entry->npages * PAGE_SIZE, >> entry->mmap_flag, err); >> + atomic64_inc(&dev->stats.sw_stats.mmap_err); > > Really? Isn't this something that is only possible with a buggy > rdma-core provider? Why count it? Though unlikely, it could happen, otherwise this error flow wouldn't exist in the first place. If for some reason a customer app steps on a bug we're not aware of, this counter could serve as a red flag.