On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote: > > On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <jacob.e.keller@xxxxxxxxx> wrote: >> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote: >>>> The Intel ice drivers has the following Tx timestamp statistics: >>>> >>>> tx_hwtstamp_skipped - indicates when we get a Tx timestamp request but >>>> are unable to fulfill it. >>>> tx_hwtstamp_timeouts - indicates we had a Tx timestamp skb waiting for a >>>> timestamp from hardware but it didn't get received within some internal >>>> time limit. >>> >>> This is interesting. In mlx5 land, the only case where we are unable to >>> fulfill a hwtstamp is when the timestamp information is lost or late. >>> >> >> For ice, the timestamps are captured in the PHY and stored in a block of >> registers with limited slots. The driver tracks the available slots and >> uses one when a Tx timestamp request comes in. >> >> So we have "skipped" because its possible to request too many timestamps >> at once and fill up all the slots before the first timestamp is reported >> back. >> >>> lost for us means that the timestamp never arrived within some internal >>> time limit that our device will supposedly never be able to deliver >>> timestamp information after that point. >>> >> >> That is more or less the equivalent we have for timeout. >> >>> late for us is that we got hardware timestamp information delivered >>> after that internal time limit. We are able to track this by using >>> identifiers in our completion events and we only release references to >>> these identifiers upon delivery (never delivering leaks the references. >>> Enough build up leads to a recovery flow). The theory for us is that >>> late timestamp information arrival after that period of time should not >>> happen. However the truth is that it does happen and we want our driver >>> implementation to be resilient to this case rather than trusting the >>> time interval. >>> >> >> In our case, because of how the slots work, once we "timeout" a slot, it >> could get re-used. We set the timeout to be pretty ridiculous (1 second) >> to ensure that if we do timeout its almost certainly because hardware >> never timestamped the packet. > > We were thinking about that design as well. We use a 1 second timeout to > be safe. > > #define MLX5E_PTP_TS_CQE_UNDELIVERED_TIMEOUT (1 * NSEC_PER_SEC) > > Our device does not do any bookkeeping internally to prevent a > completion event with timestamp information from arriving after 1 > second. Some internal folks have said it shouldn't be possible, but we > did not want to take any chances and built a model that is resilient to > timestamp deliveries after any period of time even after consuming the > skb without appending timestamp information. If no other vendor finds > this useful, we could roll this up into the error counter and leave the > late counter as vendor specific. I do not want to introduce too many > counters that are hard to understand. We document the device specific > counters on top of introducing them in the code base already. > > https://docs.kernel.org/networking/device_drivers/ethernet/mellanox/mlx5/counters.html > We can't distinguish "late". At best we could notice if we get a timestamp on an index thats not currently active, but we wouldn't know for sure if it was late from a previous request or due to some other programming error. >> >>> Do you have any example of a case of skipping timestamp information that >>> is not related to lack of delivery over time? I am wondering if this >>> case is more like a hardware error or not. Or is it more like something >>> along the lines of being busy/would impact line rate of timestamp >>> information must be recorded? >>> >> >> The main example for skipped is the event where all our slots are full >> at point of timestamp request. > > This is what I was guessing as the main (if not only reason). For this > specific reason, I think a general "busy" stats counter makes sense. > mlx5 does not need this counter, but I can see a lot of other hw > implementations needing this. (The skipped counter name obviously should > be left only in the ice driver. Just felt "busy" was easy to understand > for generalized counters.) Yea, I don't expect this would be required for all hardware but it seems like a common approach if you have limited slots for Tx timestamps available. > > The reason why I prefer busy is that "skip" to me makes me think someone > used SIOCSHWTSTAMP to filter which packets get timestamped which is very > different from something like lack of resource availability. > Busy is fine with me. >>>> The only major addition I think is the skipped stat, which I would >>>> prefer to have. Perhaps that could be tracked in the netdev layer by >>>> checking whether the skb flags to see whether or not the driver actually >>>> set the appropriate flag? >>> >>> I guess the problem is how would the core stack know at what layer this >>> was skipped at (I think Kory's patch series can be used to help with >>> this since it's adding a common interface in ethtool to select the >>> timestamping layer). As of today, mlx5 is the only driver I know of that >>> supports selecting between the DMA and PHY layers for timestamp >>> information. >>> >> >> Well, the way the interface worked in my understanding was that the core >> sets the SKBTX_HW_TSTAMP flag. The driver is supposed to then prepare >> the packet for timestamp and set the SKBTX_IN_PROGRESS flag. I just >> looked though, and it looks like ice doesn't actually set this flag... > > That would be a good fix. We set this in mlx5. > > /* device driver is going to provide hardware time stamp */ > SKBTX_IN_PROGRESS = 1 << 2, > Yea. I kind of wonder how necessary it is since we haven't been setting it and don't seem to have an existing bug report for this. I can dig through the kernel and see what it actually does... >> >> If we fixed this, in theory the stack should be able to check after the >> packet gets sent with SKBTX_HW_TSTAMP, if SKBTX_IN_PROGRESS isn't set >> then it would be a skipped timestamp? > > One question I have about this idea. Wouldn't SKBTX_IN_PROGRESS also not > be set in the case when timestamp information is lost/a timeout occurs? > I feel like the problem is not being able to separate these two cases > from the perspective of the core stack. > > Btw, mlx5 does keep the flag even when we fail to write timestamp > information..... I feel like it might be a good idea to add a warning in > the core stack if both SKBTX_HW_TSTAMP and SKBTX_IN_PROGRESS are set but > the skb is consumed without skb_hwtstamps(skb) being written by the > driver before consuming the skb. > I was thinking the check would happen much earlier, i.e. the moment we exit the driver xmit routines it would check whether SKBTX_IN_PROGRESS is set. This would be well before any actual Tx timestamp was acquired. Its basically a "if we set SKBTX_HW_TSTAMP and you didn't set IN_PROGRESS then we know you didn't even start a timestamp request, so you must have been busy" It might not be workable because I think the IN_PROGRESS flag is used for another purpose. I tried to read the documentation for it in Documentation, but I got confused a bit. I'm going to go through the code and see what places actually check this flag.