On Wed, Dec 20, 2017 at 09:53:33AM -0600, Steve Wise wrote:
> Hey,
>
> I have a need to provide tools for customers to gather runtime state for an
> rdma device. Say, when an application is stuck waiting for some completion
> or other rdma event. This includes hw/fw state of course, and equally as
> important, rdma object sw state. Is debugfs the correct way to export this
> sw state? The data is quite large potentially; each QP, its structures, the
> dma queue memory, etc. Ditto for CQs. Also MR state, etc etc. It seems
> that would be overloading debugfs to me. Currently the hw/fw state is being
> gathered via ethtool dump commands (--get-dump, --register-dump,
> --eeprom-dump). I am considering using the ethtool --get-dump method for
> the low level driver to also include dumping the rdma queue state for the
> device. Is that a reasonable approach?

In this cycle, I'm going to submit the RDMA resource tracking feature, which
will provide the infrastructure to do this in RDMA.

The kernel code is based on RCU and is not final, because my call to
synchronize_rcu is not effective:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/restrack-rcu

There is a supplementary part in RDMAtool, which presents global information
and QP information, with options to filter. It is at an initial stage.

+ /mnt/iproute2/rdma/rdma res
1: mlx5_0: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
2: mlx5_1: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
3: mlx5_2: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
4: mlx5_3: curr/max: pd 2/16777216 cq 3/16777216 qp 2/262144
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show mlx5_4
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4
DEV/PORT  LQPN  TYPE  STATE  PID  COMM
mlx5_4/-  8     UD    RESET  0    [ipoib-verbs]
mlx5_4/1  7     UD    RTS    0    [mlx5-gsi]
mlx5_4/1  1     GSI   RTS    0    [rdma-mad]
mlx5_4/1  0     SMI   RTS    0    [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/
DEV/PORT  LQPN  TYPE  STATE  PID  COMM
mlx5_4/-  8     UD    RESET  0    [ipoib-verbs]
mlx5_4/1  7     UD    RTS    0    [mlx5-gsi]
mlx5_4/1  1     GSI   RTS    0    [rdma-mad]
mlx5_4/1  0     SMI   RTS    0    [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/0
Wrong device name
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1
DEV/PORT  LQPN  TYPE  STATE  PID  COMM
mlx5_4/1  7     UD    RTS    0    [mlx5-gsi]
mlx5_4/1  1     GSI   RTS    0    [rdma-mad]
mlx5_4/1  0     SMI   RTS    0    [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/-
DEV/PORT  LQPN  TYPE  STATE  PID  COMM
mlx5_4/-  8     UD    RESET  0    [ipoib-verbs]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/- -d
DEV/PORT  LQPN  RQPN  TYPE  STATE  PID  COMM           SQ-PSN  RQ-PSN  PATH-MIG
mlx5_4/-  8     ---   UD    RESET  0    [ipoib-verbs]  0       ---     ---
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm
DEV/PORT  LQPN  PID  COMM
mlx5_4/1  7     0    [mlx5-gsi]
mlx5_4/1  1     0    [rdma-mad]
mlx5_4/1  0     0    [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm -d
DEV/PORT  LQPN  PID  COMM
mlx5_4/1  7     0    [mlx5-gsi]
mlx5_4/1  1     0    [rdma-mad]
mlx5_4/1  0     0    [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm pid 0-2000
DEV/PORT  LQPN  PID  COMM
mlx5_4/1  7     0    [mlx5-gsi]
mlx5_4/1  1     0    [rdma-mad]
mlx5_4/1  0     0    [rdma-mad]

>
> Any thoughts/suggestions?

Care to try?

>
> Thanks in advance,
>
> Steve.
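
For anyone who wants to script against the output above while the tool is still at an initial stage, here is a minimal sketch in Python. It assumes the text layout stays exactly as in the samples in this mail (a "curr/max:" summary line per device, and whitespace-separated QP tables whose header starts with DEV/PORT). The function names parse_res_summary and parse_qp_table are hypothetical helpers made up for illustration; they are not part of the tool, and the format itself is not final.

```python
import re

# Assumed line shape, copied from the sample output in this mail:
#   "5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144"
SUMMARY_RE = re.compile(r"^(\d+):\s+(\S+):\s+curr/max:\s+(.*)$")


def parse_res_summary(text):
    """Return {device: {resource: (curr, max)}} from summary text."""
    devices = {}
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a summary line
        _index, dev, rest = match.groups()
        tokens = rest.split()
        counters = {}
        # Tokens alternate: resource name, then "curr/max".
        for name, pair in zip(tokens[0::2], tokens[1::2]):
            curr, limit = pair.split("/")
            counters[name] = (int(curr), int(limit))
        devices[dev] = counters
    return devices


def parse_qp_table(text):
    """Return one dict per QP row, keyed by the column names in the header."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "DEV/PORT":
            header = fields  # a new table starts here
            continue
        # Keep only rows that line up with the current header; this also
        # drops messages such as "Wrong device name".
        if header is not None and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


if __name__ == "__main__":
    summary = "5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144"
    table = (
        "DEV/PORT LQPN TYPE STATE PID COMM\n"
        "mlx5_4/- 8 UD RESET 0 [ipoib-verbs]\n"
        "mlx5_4/1 7 UD RTS 0 [mlx5-gsi]\n"
    )
    print(parse_res_summary(summary))
    print(parse_qp_table(table))
```

The table parser splits on whitespace, so it relies on COMM values containing no spaces, which holds for the samples here but is an assumption in general. All values in the QP rows are kept as strings because placeholders such as "---" appear in numeric columns.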