Re: dumping queue state

On Wed, Dec 20, 2017 at 09:53:33AM -0600, Steve Wise wrote:
> Hey,
>
> I have a need to provide tools for customers to gather runtime state for an
> rdma device.   Say, when an application is stuck waiting for some completion
> or other rdma event.   This includes hw/fw state of course, and equally as
> important, rdma object sw state.  Is debugfs the correct way to export this
> sw state?  The data is quite large potentially; each QP, its structures, the
> dma queue memory, etc.  Ditto for CQs.  Also MR state, etc etc.  It seems
> that would be overloading debugfs to me.  Currently the hw/fw state is being
> gathered via ethtool dump commands (--get-dump, --register-dump,
> --eeprom-dump).  I am considering using the ethtool --get-dump method for
> the low level driver to also include dumping the rdma queue state for the
> device.   Is that a reasonable approach?

In this cycle, I'm going to submit the RDMA resource tracking feature,
which will provide the infrastructure to do this in RDMA.

The kernel code is based on RCU and is not final, because my call to
synchronize_rcu is not effective:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/restrack-rcu

There is a supplementary part in the rdma tool (iproute2), which presents
global information and per-QP information, with options to filter.

It is at an initial stage.

+ /mnt/iproute2/rdma/rdma res
1: mlx5_0: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
2: mlx5_1: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
3: mlx5_2: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
4: mlx5_3: curr/max: pd 2/16777216 cq 3/16777216 qp 2/262144
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show mlx5_4
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4
DEV/PORT  LQPN       TYPE  STATE  PID        COMM
mlx5_4/-  8          UD    RESET  0          [ipoib-verbs]
mlx5_4/1  7          UD    RTS    0          [mlx5-gsi]
mlx5_4/1  1          GSI   RTS    0          [rdma-mad]
mlx5_4/1  0          SMI   RTS    0          [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/
DEV/PORT  LQPN       TYPE  STATE  PID        COMM
mlx5_4/-  8          UD    RESET  0          [ipoib-verbs]
mlx5_4/1  7          UD    RTS    0          [mlx5-gsi]
mlx5_4/1  1          GSI   RTS    0          [rdma-mad]
mlx5_4/1  0          SMI   RTS    0          [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/0
Wrong device name
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1
DEV/PORT  LQPN       TYPE  STATE  PID        COMM
mlx5_4/1  7          UD    RTS    0          [mlx5-gsi]
mlx5_4/1  1          GSI   RTS    0          [rdma-mad]
mlx5_4/1  0          SMI   RTS    0          [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/-
DEV/PORT  LQPN       TYPE  STATE  PID        COMM
mlx5_4/-  8          UD    RESET  0          [ipoib-verbs]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/- -d
DEV/PORT  LQPN       RQPN       TYPE  STATE  PID        COMM            SQ-PSN     RQ-PSN     PATH-MIG
mlx5_4/-  8          ---        UD    RESET  0          [ipoib-verbs]   0          ---        ---
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm
DEV/PORT  LQPN       PID        COMM
mlx5_4/1  7          0          [mlx5-gsi]
mlx5_4/1  1          0          [rdma-mad]
mlx5_4/1  0          0          [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm -d
DEV/PORT  LQPN       PID        COMM
mlx5_4/1  7          0          [mlx5-gsi]
mlx5_4/1  1          0          [rdma-mad]
mlx5_4/1  0          0          [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm pid 0-2000
DEV/PORT  LQPN       PID        COMM
mlx5_4/1  7          0          [mlx5-gsi]
mlx5_4/1  1          0          [rdma-mad]
mlx5_4/1  0          0          [rdma-mad]
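
For collecting this from scripts, the table output above could be
post-processed. What follows is only a rough sketch, not part of the
patches: it assumes the column layout stays as shown above (which may well
change, since the tool is at an initial stage) and that no field contains
whitespace. The helper names parse_qp_table and not_in_state are made up
for illustration, and SAMPLE is copied from the "res show qp link mlx5_4"
output.

```python
# Sketch: parse the whitespace-separated table printed by
# "rdma res show qp" and list QPs that are not in a given state.
# Assumption: the first non-empty line is the header and no value
# contains spaces (true for the sample output in this mail).

SAMPLE = """\
DEV/PORT  LQPN       TYPE  STATE  PID        COMM
mlx5_4/-  8          UD    RESET  0          [ipoib-verbs]
mlx5_4/1  7          UD    RTS    0          [mlx5-gsi]
mlx5_4/1  1          GSI   RTS    0          [rdma-mad]
mlx5_4/1  0          SMI   RTS    0          [rdma-mad]
"""


def parse_qp_table(text):
    """Return one dict per QP row, keyed by the header column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        # Skip lines that do not match the header, e.g. error messages.
        if len(fields) != len(header):
            continue
        rows.append(dict(zip(header, fields)))
    return rows


def not_in_state(rows, state="RTS"):
    """Return the rows whose STATE column differs from `state`."""
    return [r for r in rows if r.get("STATE") != state]


if __name__ == "__main__":
    for r in not_in_state(parse_qp_table(SAMPLE)):
        print(r["DEV/PORT"], r["LQPN"], r["STATE"], r["COMM"])
```

On the sample above this prints only the ipoib-verbs QP (LQPN 8), which is
in RESET. All values are kept as strings, so fields shown as "---" in the
-d output would pass through unchanged.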

>
> Any thoughts/suggestions?

Care to try?

>
> Thanks in advance,
>
> Steve.
