On 2021/2/2 16:44, Arnd Bergmann wrote: > On Tue, Feb 2, 2021 at 8:16 AM Zhen Lei <thunder.leizhen@xxxxxxxxxx> wrote: >> + >> +/* >> + * All read and write operations on L3 cache registers are protected by the >> + * spinlock, except for l3cache_init(). Each time the L3 cache operation is >> + * performed, all related information is filled into its registers. Therefore, >> + * there is no memory order problem when only _relaxed() functions are used. > > Thank you for including the text. > > I don't think the explanation with the spin_lock() explains why this > can be considered safe though, as spin_lock() only contains serialization > against other CPUs (smp_mb()) rather than the stronger DMA barriers > implied by readl and writel. As Russell previously explained, these > barriers are the L1 cache operations (e.g. v7_dma_inv_range) do > include stronger barriers, so it would be better to come up with a > justification based on those. Okay, I'll correct the description. > >> + * This can help us achieve some performance improvement: >> + * 1) The readl_relaxed() is about 20ns faster than readl(). >> + * 2) The writel_relaxed() is about 123ns faster than writel(). > > These are not really the performance numbers I asked for, as a > low-level benchmark comparing the instructions is rather meaningless. > The time spent waiting for the barrier depends on what else is going > on around the barrier. Also, most of the time would likely be > spent spinning in the loop around readl() while the cache operations > are in progress, so the latency of a single readl() is not necessarily > significant. > > To have a more useful performance number, try mentioning the > most performance sensitive non-coherent DMA master on one > of the chips that has this cache controller, and a high-level > performance number such as "1.2% more network packets per > second" if that is something you can measure easily. It's not easy. My board only have debugging NIC, only the downstream products have high-speed service NIC. Software needs to be packaged layer by layer. > > Of course, if all high-speed DMA masters on this chip are > cache coherent, there is no need for performance numbers, just > mention that we don't care about speed in that case. It's not cache coherent, otherwise, the L3 cache does not need to be operated. > > Arnd > > . >