> On Jan 17, 2018, at 10:01 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > On Wed, 17 Jan 2018 16:13:27 -0500 > Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > >> >> The problem is in this function (kernel/trace/trace_events.c): >> >> 2212 void trace_event_eval_update(struct trace_eval_map **map, int len) >> 2213 { >> 2214 struct trace_event_call *call, *p; >> 2215 const char *last_system = NULL; >> 2216 int last_i; >> 2217 int i; >> 2218 >> 2219 down_write(&trace_event_sem); >> 2220 list_for_each_entry_safe(call, p, &ftrace_events, list) { >> 2221 /* events are usually grouped together with systems */ >> 2222 if (!last_system || call->class->system != last_system) { >> 2223 last_i = 0; >> 2224 last_system = call->class->system; >> 2225 } >> 2226 >> 2227 for (i = last_i; i < len; i++) { >> 2228 if (call->class->system == map[i]->system) { >> 2229 /* Save the first system if need be */ >> 2230 if (!last_i) >> 2231 last_i = i; >> 2232 update_event_printk(call, map[i]); >> 2233 } >> 2234 } >> 2235 } >> 2236 up_write(&trace_event_sem); >> 2237 } >> >> Loading the nfs.ko module adds an eval map consisting of three entries: >> >> entry 0 is NFS_FILE_SYNC with a value of 2 >> entry 1 is NFS_DATA_SYNC with a value of 1 >> entry 2 is NFS_UNSTABLE with a value of 0 >> >> The nfs.ko trace point print_fmt strings are updated by >> trace_event_eval_update. >> >> For the first trace point in nfs.ko, the first iteration of the inner >> loop visits all three of these map entries, but it's print_fmt does >> not contain any of these enum symbols, so nothing changes. >> >> Now thanks to the "last_i" logic above, entry 0 (NFS_FILE_SYNC) is >> skipped over for other trace points in the module. >> >> Thus when the print_fmt strings for "nfs_initiate_write" and >> "nfs_writeback_done" are updated, they are only partially converted, >> hence: >> >>> "fileid=%02x:%02x:%llu fhandle=0x%08x offset=%lld count=%lu stable=%d (%s)", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), (unsigned long long)REC->fileid, REC->fhandle, REC->offset, REC->count, REC->stable, __print_symbolic(REC->stable, { 0, "NFS_UNSTABLE" }, { 1, "NFS_DATA_SYNC" }, { NFS_FILE_SYNC, "NFS_FILE_SYNC" }) >> >> >> The updated string contains integers for the first two items in the >> symbol table, and has an unconverted third item, which cannot be used >> by "trace-cmd report". >> >> Adding another TRACE_DEFINE_ENUM in fs/nfs/nfstrace.h works around >> this issue. I added >> >> TRACE_DEFINE_ENUM(NFS_INVALID_STABLE_HOW); >> >> _after_ the first three, and that became entry 0 in the nfs system >> eval_map. That entry is unusable due to this bug, but I didn't add >> it to any trace points. The updated print_fmt strings are now 100% >> correct. "trace-cmd report" works exactly as intended. >> >> My inclination is to remove the last_i logic from this function, but >> I don't understand what the last_system/last_i logic is attempting >> accomplish. Do you have any suggestions? > > Nice detective work. I see the bug. > > The last_i is an optimization to not have to search the entire map > array for the first matching map[i].system every time. But the bug is, > if the match is on the first hit (i==0) then we set last_i to 0, and on > the next iteration we set it (incorrectly) to 1. That's the bug. > > Can you try this patch? > > I'll have to add comments to that function, as it took me too long to > figure out WTF I was thinking when I wrote it. > > -- Steve > > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c > index ec0f9aa4e151..9b5b8a362690 100644 > --- a/kernel/trace/trace_events.c > +++ b/kernel/trace/trace_events.c > @@ -2213,6 +2213,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len) > { > struct trace_event_call *call, *p; > const char *last_system = NULL; > + bool first = false; > int last_i; > int i; > > @@ -2220,6 +2221,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len) > list_for_each_entry_safe(call, p, &ftrace_events, list) { > /* events are usually grouped together with systems */ > if (!last_system || call->class->system != last_system) { > + first = true; > last_i = 0; > last_system = call->class->system; > } > @@ -2227,8 +2229,10 @@ void trace_event_eval_update(struct trace_eval_map **map, int len) > for (i = last_i; i < len; i++) { > if (call->class->system == map[i]->system) { > /* Save the first system if need be */ > - if (!last_i) > + if (first) { > last_i = i; > + first = false; > + } > update_event_printk(call, map[i]); > } > } Thanks for the patch! I removed the workaround: TRACE_DEFINE_ENUM(NFS_INVALID_STABLE_HOW); and applied the above patch. I'd say the fix passes the test: [root@manet ~]# cat /sys/kernel/debug/tracing/events/nfs/nfs_initiate_write/format name: nfs_initiate_write ID: 1819 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:loff_t offset; offset:8; size:8; signed:1; field:unsigned long count; offset:16; size:8; signed:0; field:enum nfs3_stable_how stable; offset:24; size:4; signed:1; field:dev_t dev; offset:28; size:4; signed:0; field:u32 fhandle; offset:32; size:4; signed:0; field:u64 fileid; offset:40; size:8; signed:0; print fmt: "fileid=%02x:%02x:%llu fhandle=0x%08x offset=%lld count=%lu stable=%s", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), (unsigned long long)REC->fileid, REC->fhandle, REC->offset, REC->count, __print_symbolic(REC->stable, { 0, "UNSTABLE" }, { 1, "DATA_SYNC" }, { 2, "FILE_SYNC" }) [root@manet ~]# And "trace-cmd report" output appears as expected. Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx> -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html