Hi Arnd,
On 11/22/2021 9:13 PM, Sai Prakash Ranjan wrote:
On 11/22/2021 9:05 PM, Arnd Bergmann wrote:
On Mon, Nov 22, 2021 at 3:59 PM Sai Prakash Ranjan
<quic_saipraka@xxxxxxxxxxx> wrote:
And if we do move this instrumentation to asm-generic/io.h, how will
that be executed since
the arch specifc read{b,w,l,q} overrides this generic version?
As I understand it, your version also requires architecture specific
changes, so that would be the same: it only works for architectures
that get the definition of readl()/readl_relaxed()/inl()/... from
include/asm-generic/io.h and only override the __raw version. Arnd
Sorry, I didn't get this part, so I am trying this on ARM64:
arm64/include/asm/io.h has read{b,l,w,q} defined.
include/asm-generic/io.h has below:
#ifndef readl
#define readl readl
static inline u32 readl(const volatile void __iomem *addr)
and we include asm-generic/io.h in arm64/include/asm/io.h at the end
after the definitions for arm64 mmio accesors.
So arch implementation here overrides generic ones as I see it, am I
missing something? I even confirmed this
with some trace_printk to generic and arch specific definitions of
readl
and I see arch specific ones being called.
Ah, you are right that the arm64 version currently has custom
definitions
of the high-level interfaces. These predate the introduction of the
__io_{p,}{b,a}{r,w} macros and are currently only used on risc-v.
I think in this case you should start by changing arm64 to use the
generic readl() etc definitions, by removing the extra definitions and
using
#define __io_ar(v) __iormb(__v)
#define __io_bw() dma_wmb()
Sure, will do that.
I got the callback version implemented as suggested by you to compare
the overall size and performance
with the inline version and apparently the size increased more in case
of callback version when compared to
inline version. As for the performance, I ran some basic dd tests and
sysbench and didn't see any noticeable
difference between the two implementations.
**Inline version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
text data bss dec
hex filename
23884219 14284468 532568 38701255 24e88c7 vmlinux
**Callback version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
text data bss dec
hex filename
24108179 14279596 532568 38920343 251e097 vmlinux
$ ./scripts/bloat-o-meter inline-vmlinux callback-vmlinux
add/remove: 8/3 grow/shrink: 4889/89 up/down: 242244/-11564 (230680)
Total: Before=25812612, After=26043292, chg +0.89%
Note: I had arm64 move to asm-generic high level accessors in both
versions which I plan to post together but
not included in below links,
For your reference, here are the 2 versions of the patches,
Inline version -
https://github.com/saiprakash-ranjan/RTB-Patches/blob/main/0001-asm-generic-io-Add-logging-support-for-MMIO-accessor.patch
Callback version -
https://github.com/saiprakash-ranjan/RTB-Patches/blob/main/0001-asm-generic-io-Add-callback-based-MMIO-logging-suppo.patch
Couple of things noted in callback version which didn't look quite good
was that it needed some way to register the
callbacks and need to use some initcall (core_initcall used) as
implemented in above patch but that would probably
mean we would lose some register logging(if there is some) in between
early_initcall(which is when trace events get
enabled) and this trace_readwrite core_initcall. Another thing I noticed
that since we now move to callback based
implementations, the caller info is somewhat inaccurate when compared to
inline version.
Also regarding your earlier comment on inline versions being possibly
problematic on lower memory systems, enabling
Ftrace itself is making a considerable size difference and in such
systems ftrace wouldn't be enabled which implicitly means
MMIO logging which are based on trace events will be disabled anyways.
Here is the size delta with FTRACE enabled and disabled with arm64
defconfig without MMIO logging support:
$ ./scripts/bloat-o-meter ftrace-disabled-vmlinux ftrace-enabled-vmlinux
add/remove: 8/3 grow/shrink: 4889/89 up/down: 242244/-11564 (230680)
Total: Before=22865837, After=25215198, chg +10.27%
Given all this, I would prefer inline versions in asm-generic high level
accessors, let me know what you think.
Thanks,
Sai