On Fri, 08 Mar, 2024 14:28:01 -0800 Jacob Keller <jacob.e.keller@xxxxxxxxx> wrote:

> On 3/7/2024 9:09 PM, Rahul Rameshbabu wrote:
>>
>> On Thu, 07 Mar, 2024 19:29:08 -0800 Jacob Keller <jacob.e.keller@xxxxxxxxx> wrote:
>>> On 3/7/2024 10:47 AM, Rahul Rameshbabu wrote:
>>>> Hi Jacob,
>>>>
>>>> On Mon, 26 Feb, 2024 11:54:49 -0800 Jacob Keller <jacob.e.keller@xxxxxxxxx> wrote:
>>>>> On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>>>>>>
>>>>>> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <jacob.e.keller@xxxxxxxxx> wrote:
>>>>>>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>>>>>> Do you have any example of a case of skipping timestamp information that is not related to lack of delivery over time? I am wondering whether this case is more like a hardware error, or more along the lines of being busy/would impact line rate if timestamp information must be recorded?
>>>>>>>
>>>>>>> The main example for skipped is the event where all our slots are full at the point of the timestamp request.
>>>>>>
>>>>>> This is what I was guessing as the main (if not the only) reason. For this specific reason, I think a general "busy" stats counter makes sense. mlx5 does not need this counter, but I can see a lot of other hw implementations needing it. (The skipped counter name obviously should be left only in the ice driver; I just felt "busy" was easy to understand for generalized counters.)
>>>>>
>>>>> Yeah, I don't expect this would be required for all hardware, but it seems like a common approach if you have limited slots for Tx timestamps available.
>>>>
>>>> Sorry to bump this thread once more, but I had a question regarding the Intel driver in this regard. Instead of having a busy case when all the slots are full, would it make sense to stop the netdev queues in this case? We actually do this in mlx5 (though keep in mind that we have a dedicated queue just for port/PHY timestamping that we start/stop).
>>>>
>>>> Maybe in your case you can have a mix of HW-timestamped and non-HW-timestamped traffic in the same queue, which is why you have a busy case?
>>>
>>> We don't use a dedicated queue. The issue isn't queue capacity so much as it is the number of slots in the PHY where it can save the timestamp data.
>>
>> In mlx5, we use a dedicated queue just for the purpose of HW timestamping because we actually do have a similar slot mechanism. We call it metadata, and we have a limit of 256 entries. We steer PTP traffic specifically (though we will be changing this to any HW-timestamped traffic with the work Kory is doing) to this queue by matching against the protocol and port. All other traffic goes to the normal queues that cannot consume the timestamping slots. When all the slots are occupied, we stop the timestamping queue rather than throwing some busy error.
>>
>>> In practice the most common application (ptp4l) synchronously waits for timestamps and only has one outstanding at a time, likely due to limitations of the original hardware that only supported one outstanding Tx timestamp.
>>>
>>>> Wanted to inquire about this before sending out an RFC v2.
>>>
>>> That's actually an interesting approach, changing to a dedicated queue which we could lock and start/stop when the indexes are full. How does that interact with the UDP and Ethernet stacks? Presumably, when you go to transmit, you'd need to pick a queue, and if it's stopped you'd have to drop or tell the stack?
>>
>> Let me share a pointer in mlx5 for how we do the queue selection. Like I mentioned, we steer PTP traffic specifically, but we can change this to just steer any skb that indicates hw timestamping.
>>
>> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n71
>>
>> Then, here is how we manage stopping and waking the queue (we tell the core stack about this so that we do not have to drop traffic due to some kind of busy state when our metadata slots are all consumed).
>
> Makes sense.
>
>> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n775
>> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n257
>> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n397
>>
>>> I think I remember someone experimenting with returning NETDEV_TX_BUSY when the slots were full, but in practice this caused a lot of issues. None of the other devices we have with only a single slot (one set of registers: ixgbe, i40e, igb, e1000) did that either.
>>
>> We found in our experiments that even with a single slot (we had reasons for testing this), the dedicated queue for timestamping worked out nicely. I really would suggest investigating this model since I think it might play out nicely for the Intel family.
>>
>>> If this queue model behaves in a sane way (or if we can communicate something similar by reporting back up the stack without needing a dedicated queue?), that could be better than the current situation.
>>
>> I personally really like the dedicated queue in the device drivers, but if we want to instead model this slot management in the core netdev stack, I do not think that is a bad endeavor either (when slots are full, hw timestamping traffic is held back until they become available). I do think the netif_tx_wake_queue/netif_tx_stop_queue + dedicated HW timestamping queue combination works out nicely.
>
> OK, so if I understand this right, .ndo_select_queue has the stack pick a queue, and we'd implement this to use the SKB flag. Then whenever the slots for the queue are full we issue netif_tx_stop_queue, and whenever the slots are released and we have slots open again we issue netif_tx_wake_queue.

Your summary here is correct.
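
To make that summary concrete, here is a minimal sketch of what such a .ndo_select_queue could look like. The foo_* names and the ptp_txq_index field are hypothetical placeholders, not the actual mlx5 or ice code; today mlx5 steers by matching protocol and port in hardware rather than checking the skb flag, but the flag-based check below is roughly where the work Kory is doing is headed.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical driver private data; only the dedicated queue index matters
 * for this sketch.
 */
struct foo_priv {
	u16 ptp_txq_index;	/* index of the dedicated timestamping TX queue */
};

/* Wired up as .ndo_select_queue in the driver's net_device_ops. */
static u16 foo_select_queue(struct net_device *dev, struct sk_buff *skb,
			    struct net_device *sb_dev)
{
	struct foo_priv *priv = netdev_priv(dev);

	/* Steer any skb that requested a HW Tx timestamp to the dedicated
	 * queue; everything else takes the normal selection path and can
	 * never consume the limited timestamp slots.
	 */
	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
		return priv->ptp_txq_index;

	return netdev_pick_tx(dev, skb, sb_dev);
}

Traffic that does not request a timestamp keeps going through netdev_pick_tx() as usual, so it can never occupy a timestamp slot.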

> While the queue is stopped, the stack basically just buffers requests and doesn't try to call the ndo_start_xmit routine for that queue until the queue is ready again?

Yes, that's right. Because of this, you can avoid reporting a busy state altogether and just let the stack buffer until the backpressure on the device's timestamping resources is released (i.e., timestamp information has been retrieved from the device, freeing the slots).
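
As a rough sketch of that backpressure, assuming the driver tracks its timestamp slots in a simple bitmap (again, the foo_* names are hypothetical and locking/memory-ordering details are omitted; this is not the actual mlx5 code):

#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/netdevice.h>

/* Hypothetical bookkeeping for the dedicated timestamping queue. */
struct foo_ptp_sq {
	struct netdev_queue *txq;	/* the dedicated timestamping queue */
	unsigned long *slot_map;	/* bitmap of HW timestamp slots in use */
	unsigned int num_slots;		/* e.g. 256 metadata entries in mlx5 */
};

/* Xmit path: called right after a slot has been claimed for an skb. */
static void foo_ptp_maybe_stop(struct foo_ptp_sq *sq)
{
	/* No free slot left: tell the stack to stop handing us skbs on this
	 * queue instead of returning NETDEV_TX_BUSY or dropping.
	 */
	if (bitmap_full(sq->slot_map, sq->num_slots))
		netif_tx_stop_queue(sq->txq);
}

/* Completion path: called once the timestamp for a slot has been read. */
static void foo_ptp_complete_slot(struct foo_ptp_sq *sq, unsigned int slot)
{
	clear_bit(slot, sq->slot_map);

	/* A slot is free again, so release the backpressure. */
	if (netif_tx_queue_stopped(sq->txq))
		netif_tx_wake_queue(sq->txq);
}

The stack absorbs the wait here: while the queue is stopped nothing is dropped and no busy error is surfaced, and transmission resumes as soon as a completion frees a slot.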

> Using a dedicated queue has some other advantages in that it could be programmed with a different priority both from the hardware side (prefer packets waiting in the timestamp queue) and from the software side (prioritize CPUs running the threads for processing it). That could be useful in some applications too...
>
>> Let me know your thoughts on this. If you think it's an interesting idea to explore, let's not add the busy counter now in this series. I already dropped the late counter. We can add the busy counter later on if you feel this model I have shared is not viable for Intel. I wanted to avoid preemptively introducing too many counters that might not actually be consumed widely. What you presented with slots is very similar to what we have with metadata in mlx5, so I thought that handling the management of these slots differently, with something like a dedicated queue for HW timestamping, could make the design cleaner.
>
> I think I agree with the queue model, though I'm not sure when I could get to working on implementing this. I'm fine with dropping the busy counter from this series.

No worries. If you are interested in further discussion or any kind of RFC review, I am open to that. If you feel this model is not satisfactory in the future, we can discuss it and then look at adding the busy counter.

--
Thanks,

Rahul Rameshbabu