On Fri, Apr 9, 2021 at 1:06 AM Neal Shukla <nshukla@xxxxxxxxxxxxx> wrote:
>
> Using perf, we've confirmed that the line mentioned has a 25.58% cache miss
> rate.

Do these misses hit in the LLC or go out to DRAM? In any case, your best
bet is likely to prefetch this data into your L1/L2. In my experience, the
best way to do this is not to use an explicit prefetch instruction but to
touch/fetch the cache lines you need at the beginning of your computation,
and let the fetch latency and the use of the first cache line hide the
latencies of fetching the others. In your case, touch both the metadata
and the packet at the same time. Work with the metadata and other things,
then come back to the packet data; hopefully the relevant part will reside
in the cache or in registers by then. If that does not work, touch packet
number N+1 just before starting on packet N.

These are very general recommendations, but I hope they help anyway. How
exactly to do this efficiently is very application dependent.

/Magnus

> On Thu, Apr 8, 2021 at 2:38 PM Zvi Effron <zeffron@xxxxxxxxxxxxx> wrote:
> >
> > Apologies for the spam to anyone who received my first response, but
> > it was accidentally sent as HTML and rejected by the mailing list.
> >
> > On Thu, Apr 8, 2021 at 11:20 AM Neal Shukla <nshukla@xxxxxxxxxxxxx> wrote:
> > >
> > > System Info:
> > > CPU: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> > > Network Adapter/NIC: Intel X710
> > > Driver: i40e
> > > Kernel version: 5.8.15
> > > OS: Fedora 33
> > >
> >
> > Slight correction, we're actually on the 5.10.10 kernel.
> >
> > --Zvi
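
For reference, here is a rough C sketch of the touch-early pattern from the
reply at the top of this mail: touch the metadata and the packet data
together, work on the metadata first, and touch packet N+1 before starting
on packet N. Everything named here (struct pkt, struct pkt_meta, PKT_DROP,
touch(), process_burst()) is made up for illustration and is not the XDP or
AF_XDP API. It also assumes every packet has a valid metadata pointer and at
least one byte of data. Whether this actually hides the miss latency depends
on the workload, so measure before and after.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not kernel or XDP structures. */
#define PKT_DROP 0x1u

struct pkt_meta {
    uint32_t len;    /* payload length in bytes */
    uint32_t flags;  /* PKT_DROP means skip this packet */
};

struct pkt {
    const struct pkt_meta *meta;
    const uint8_t *data;
};

/* Issue a plain load on the cache line holding p. The volatile access keeps
 * the compiler from discarding the otherwise unused read. */
static inline void touch(const void *p)
{
    (void)*(const volatile uint8_t *)p;
}

/* Toy "work": sum the first payload byte of every packet not marked
 * PKT_DROP. The point is the access order, not the computation. */
uint64_t process_burst(const struct pkt *pkts, size_t n)
{
    uint64_t acc = 0;

    if (n == 0)
        return 0;

    /* Touch metadata and packet data of the first packet together so both
     * misses are outstanding at the same time. */
    touch(pkts[0].meta);
    touch(pkts[0].data);

    for (size_t i = 0; i < n; i++) {
        /* Touch packet i+1 before working on packet i, so its misses can
         * overlap with the work below. */
        if (i + 1 < n) {
            touch(pkts[i + 1].meta);
            touch(pkts[i + 1].data);
        }

        /* Work on the metadata first ... */
        const struct pkt_meta *m = pkts[i].meta;
        if ((m->flags & PKT_DROP) || m->len == 0)
            continue;

        /* ... then come back to the packet data, which has hopefully
         * arrived in cache by now. */
        acc += pkts[i].data[0];
    }
    return acc;
}
```

If a packet needs more than its first cache line, the same idea applies per
line: touch each line you know you will need up front rather than only the
first byte.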