> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Doug Ledford
> Sent: Tuesday, May 17, 2016 11:01 AM
> To: Christoph Lameter
> Cc: linux-rdma@xxxxxxxxxxxxxxx; Mark Bloch; Jason Gunthorpe; Steve Wise; Majd Dibbiny; alonvi@xxxxxxxxxxxx
> Subject: Re: [PATCH 1/3] ib core: Make device counter infrastructure dynamic
>
> On 05/17/2016 10:19 AM, Christoph Lameter wrote:
> >
> > On Mon, 16 May 2016, Doug Ledford wrote:
> >>
> >> Thanks, this looks good now. When the other two patches come through
> >
> > The patch can stand on its own and there has been the expectation
> > expressed by Mellanox that they want to see this merged first. Guess this
> > is to reduce the amount of rewrite they would have to do if things change.
> > Then also the team from Mellanox can directly merge the driver changes
> > without my involvement.
> >
>
> OK. There are comments from Jason outstanding, and I found one thing
> that I missed in my earlier reviews. I think we need to refactor how we
> pull out the stats, or at least consider doing so.
> In particular, look at how many stats the cxgb3 driver fills in:
>
> +	stats->dirname = "iw_stats";
> +	stats->name = names;
> +
> +	stats->value[IPINRECEIVES] = ((u64)m.ipInReceive_hi << 32) + m.ipInReceive_lo;
> +	stats->value[IPINHDRERRORS] = ((u64)m.ipInHdrErrors_hi << 32) + m.ipInHdrErrors_lo;
> +	stats->value[IPINADDRERRORS] = ((u64)m.ipInAddrErrors_hi << 32) + m.ipInAddrErrors_lo;
> +	stats->value[IPINUNKNOWNPROTOS] = ((u64)m.ipInUnknownProtos_hi << 32) + m.ipInUnknownProtos_lo;
> +	stats->value[IPINDISCARDS] = ((u64)m.ipInDiscards_hi << 32) + m.ipInDiscards_lo;
> +	stats->value[IPINDELIVERS] = ((u64)m.ipInDelivers_hi << 32) + m.ipInDelivers_lo;
> +	stats->value[IPOUTREQUESTS] = ((u64)m.ipOutRequests_hi << 32) + m.ipOutRequests_lo;
> +	stats->value[IPOUTDISCARDS] = ((u64)m.ipOutDiscards_hi << 32) + m.ipOutDiscards_lo;
> +	stats->value[IPOUTNOROUTES] = ((u64)m.ipOutNoRoutes_hi << 32) + m.ipOutNoRoutes_lo;
> +	stats->value[IPREASMTIMEOUT] = m.ipReasmTimeout;
> +	stats->value[IPREASMREQDS] = m.ipReasmReqds;
> +	stats->value[IPREASMOKS] = m.ipReasmOKs;
> +	stats->value[IPREASMFAILS] = m.ipReasmFails;
> +	stats->value[TCPACTIVEOPENS] = m.tcpActiveOpens;
> +	stats->value[TCPPASSIVEOPENS] = m.tcpPassiveOpens;
> +	stats->value[TCPATTEMPTFAILS] = m.tcpAttemptFails;
> +	stats->value[TCPESTABRESETS] = m.tcpEstabResets;
> +	stats->value[TCPCURRESTAB] = m.tcpCurrEstab;
> +	stats->value[TCPINSEGS] = ((u64)m.tcpInSegs_hi << 32) + m.tcpInSegs_lo;
> +	stats->value[TCPOUTSEGS] = ((u64)m.tcpOutSegs_hi << 32) + m.tcpOutSegs_lo;
> +	stats->value[TCPRETRANSSEGS] = ((u64)m.tcpRetransSeg_hi << 32) + m.tcpRetransSeg_lo;
> +	stats->value[TCPINERRS] = ((u64)m.tcpInErrs_hi << 32) + m.tcpInErrs_lo;
> +	stats->value[TCPOUTRSTS] = m.tcpOutRsts;
> +	stats->value[TCPRTOMIN] = m.tcpRtoMin;
> +	stats->value[TCPRTOMAX] = m.tcpRtoMax;
>
> That's a lot of copies, and shifts, and everything else.
> Then look at what it does to get them:
>
> 	ret = dev->rdev.t3cdev_p->ctl(dev->rdev.t3cdev_p, RDMA_GET_MIB, &m);
>
> I didn't dig too deep, but that looks suspiciously like it might be an
> actual mailbox command to the card. That can be rather expensive.
>

It is not a mailbox command, but indirect register reads (i.e. a write_reg + read_reg operation). See cxgb_rdma_ctl(RDMA_GET_MIB)->t3_tp_get_mib_stats()->t3_read_indirect().

> Then look at how we get the stats to print them to user space:
>
> +static ssize_t show_protocol_stats(struct ib_device *dev, int index,
> +				   u8 port, char *buf)
> +{
> +	struct rdma_protocol_stats stats = {0};
> +	ssize_t ret;
> +
> +	ret = dev->get_protocol_stats(dev, &stats, port);
> +	if (ret)
> +		return ret;
> +
> +	return sprintf(buf, "%llu\n", stats.value[index]);
> +}
>
> In a nutshell, we go through the effort of a suspected mailbox command,
> then we fill in all of the stats including all of the copies and shifts
> and everything else, then we print out precisely one and only one stat
> before we throw the rest of them away. If someone goes into the stats
> directory for a card and does cat * or for i in *; do echo -ne "$i:\t";
> cat $i; done, then we will issue 25 mailbox commands, and fill out all
> 25 stats structs 25 times, just to print out one complete set of stats.
> For cxgb4 this isn't so bad, it's only got 4 items. But the longer the
> list gets, the worse this is because it makes our efficiency of
> operation O(n^2). Since we can't break out mailbox commands to only
> provide part of the data, I think we need to consider using a cached
> struct for each device. If the cached data is less than a certain age
> on subsequent reads, we use the cached data. If it's too old, we
> discard it and get new data.