On fre, feb 23, 2024 at 10:38, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > On Fri, 23 Feb 2024 12:44:53 +0100 > Tobias Waldekranz <tobias@xxxxxxxxxxxxxx> wrote: > >> Add a basic set of tracepoints: >> >> - switchdev_defer: Fires whenever an operation is enqueued to the >> switchdev workqueue for deferred delivery. >> >> - switchdev_call_{atomic,blocking}: Fires whenever a notification is >> sent to the corresponding switchdev notifier chain. >> >> - switchdev_call_replay: Fires whenever a notification is sent to a >> specific driver's notifier block, in response to a replay request. >> >> Signed-off-by: Tobias Waldekranz <tobias@xxxxxxxxxxxxxx> >> --- >> include/trace/events/switchdev.h | 74 ++++++++++++++++++++++++++++++++ >> net/switchdev/switchdev.c | 71 +++++++++++++++++++++++++----- >> 2 files changed, 135 insertions(+), 10 deletions(-) >> create mode 100644 include/trace/events/switchdev.h >> >> diff --git a/include/trace/events/switchdev.h b/include/trace/events/switchdev.h >> new file mode 100644 >> index 000000000000..dcaf6870d017 >> --- /dev/null >> +++ b/include/trace/events/switchdev.h >> @@ -0,0 +1,74 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +#undef TRACE_SYSTEM >> +#define TRACE_SYSTEM switchdev >> + >> +#if !defined(_TRACE_SWITCHDEV_H) || defined(TRACE_HEADER_MULTI_READ) >> +#define _TRACE_SWITCHDEV_H >> + >> +#include <linux/tracepoint.h> >> +#include <net/switchdev.h> >> + >> +#define SWITCHDEV_TRACE_MSG_MAX 128 > > 128 bytes is awfully big to waste on the ring buffer. What's the average > size of a string? I would say the typical message is around 60-80 bytes. Some common examples: PORT_OBJ_ADD PORT_VLAN(flags 0x0 orig br0) vid 1 flags 0x27 PORT_OBJ_DEL HOST_MDB(flags 0x0 orig br0) vid 100 addr 33:33:ff:ff:00:09 The worst case I can think of currently is 95 characters: VXLAN_FDB_ADD_TO_DEVICE vid 1000 addr de:ad:be:ef:00:01 added_by_user is_local locked offloaded >> + >> +DECLARE_EVENT_CLASS(switchdev_call, >> + TP_PROTO(unsigned long val, >> + const struct switchdev_notifier_info *info, >> + int err), >> + >> + TP_ARGS(val, info, err), >> + >> + TP_STRUCT__entry( >> + __field(unsigned long, val) >> + __string(dev, info->dev ? netdev_name(info->dev) : "(null)") >> + __field(const struct switchdev_notifier_info *, info) >> + __field(int, err) >> + __array(char, msg, SWITCHDEV_TRACE_MSG_MAX) >> + ), >> + >> + TP_fast_assign( >> + __entry->val = val; >> + __assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)"); >> + __entry->info = info; >> + __entry->err = err; >> + switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX); > > Is it possible to just store the information in the trace event and then > call the above function in the read stage? There's helpers to pass strings > around (namely a struct trace_seq *p). I'm a complete novice when it comes to tracepoint internals. Am I right in assuming that TP_printk may execute at a much later time than TP_fast_assign? E.g., the object referenced by __entry->info may very well have been freed by that time? That at least seems to be what my naive refactor to replace __entry->msg with __get_buf() suggests :) If so, the layout of the switchdev_notifier_* structs makes it a bit cumbersome to clone the notification in the assign phase, as the size of a specific notification (e.g., switchdev_notifier_port_obj_info) is not known by the common notification (switchdev_notifier_info). In the case of switchdev objects, the problem repeats a second time. E.g., the size of switchdev_obj_port_vlan is not known by switchdev_obj. > It would require a plugin for libtraceevent if you want to expose it to > user space tools for trace-cmd and perf though. > > Another possibility is if this event will not race with other events on he > same CPU, you could create a per-cpu buffer, write into that, and then use > __string() and __assign_str() to save it, as traces happen with preemption > disabled. But bottom halves are still enabled I suppose? Notifications can be generated both from process context (e.g., users configuring the bridge with iproute2), and from bridge packet processing (e.g., adding new neighbors to the FDB). So I don't think that would work in this case. > -- Steve Thanks for taking the time! >> + ), >> + >> + TP_printk("dev %s %s -> %d", __get_str(dev), __entry->msg, __entry->err) >> +); >> + >> +DEFINE_EVENT(switchdev_call, switchdev_defer, >> + TP_PROTO(unsigned long val, >> + const struct switchdev_notifier_info *info, >> + int err), >> + >> + TP_ARGS(val, info, err) >> +); >> + >> +DEFINE_EVENT(switchdev_call, switchdev_call_atomic, >> + TP_PROTO(unsigned long val, >> + const struct switchdev_notifier_info *info, >> + int err), >> + >> + TP_ARGS(val, info, err) >> +); >> + >> +DEFINE_EVENT(switchdev_call, switchdev_call_blocking, >> + TP_PROTO(unsigned long val, >> + const struct switchdev_notifier_info *info, >> + int err), >> + >> + TP_ARGS(val, info, err) >> +); >> + >> +DEFINE_EVENT(switchdev_call, switchdev_call_replay, >> + TP_PROTO(unsigned long val, >> + const struct switchdev_notifier_info *info, >> + int err), >> + >> + TP_ARGS(val, info, err) >> +); >> + >> +#endif /* _TRACE_SWITCHDEV_H */ >> +