Re: odd TRACE_DEFINE_ENUM behavior

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Jan 17, 2018, at 10:01 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> 
> On Wed, 17 Jan 2018 16:13:27 -0500
> Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> 
>> 
>> The problem is in this function (kernel/trace/trace_events.c):
>> 
>> 2212 void trace_event_eval_update(struct trace_eval_map **map, int len)
>> 2213 {
>> 2214         struct trace_event_call *call, *p;
>> 2215         const char *last_system = NULL;
>> 2216         int last_i;
>> 2217         int i;
>> 2218 
>> 2219         down_write(&trace_event_sem);
>> 2220         list_for_each_entry_safe(call, p, &ftrace_events, list) {
>> 2221                 /* events are usually grouped together with systems */
>> 2222                 if (!last_system || call->class->system != last_system) {
>> 2223                         last_i = 0;
>> 2224                         last_system = call->class->system;
>> 2225                 }
>> 2226 
>> 2227                 for (i = last_i; i < len; i++) {
>> 2228                         if (call->class->system == map[i]->system) {
>> 2229                                 /* Save the first system if need be */
>> 2230                                 if (!last_i)
>> 2231                                         last_i = i;
>> 2232                                 update_event_printk(call, map[i]);
>> 2233                         }
>> 2234                 }
>> 2235         }
>> 2236         up_write(&trace_event_sem);
>> 2237 }
>> 
>> Loading the nfs.ko module adds an eval map consisting of three entries:
>> 
>> entry 0 is NFS_FILE_SYNC with a value of 2
>> entry 1 is NFS_DATA_SYNC with a value of 1
>> entry 2 is NFS_UNSTABLE with a value of 0
>> 
>> The nfs.ko trace point print_fmt strings are updated by
>> trace_event_eval_update.
>> 
>> For the first trace point in nfs.ko, the first iteration of the inner
>> loop visits all three of these map entries, but it's print_fmt does
>> not contain any of these enum symbols, so nothing changes.
>> 
>> Now thanks to the "last_i" logic above, entry 0 (NFS_FILE_SYNC) is
>> skipped over for other trace points in the module.
>> 
>> Thus when the print_fmt strings for "nfs_initiate_write" and
>> "nfs_writeback_done" are updated, they are only partially converted,
>> hence:
>> 
>>> "fileid=%02x:%02x:%llu fhandle=0x%08x offset=%lld count=%lu stable=%d (%s)", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), (unsigned long long)REC->fileid, REC->fhandle, REC->offset, REC->count, REC->stable, __print_symbolic(REC->stable, { 0, "NFS_UNSTABLE" }, { 1, "NFS_DATA_SYNC" }, { NFS_FILE_SYNC, "NFS_FILE_SYNC" })  
>> 
>> 
>> The updated string contains integers for the first two items in the
>> symbol table, and has an unconverted third item, which cannot be used
>> by "trace-cmd report".
>> 
>> Adding another TRACE_DEFINE_ENUM in fs/nfs/nfstrace.h works around
>> this issue. I added
>> 
>> TRACE_DEFINE_ENUM(NFS_INVALID_STABLE_HOW);
>> 
>> _after_ the first three, and that became entry 0 in the nfs system
>> eval_map. That entry is unusable due to this bug, but I didn't add
>> it to any trace points. The updated print_fmt strings are now 100%
>> correct. "trace-cmd report" works exactly as intended.
>> 
>> My inclination is to remove the last_i logic from this function, but
>> I don't understand what the last_system/last_i logic is attempting
>> accomplish. Do you have any suggestions?
> 
> Nice detective work. I see the bug.
> 
> The last_i is an optimization to not have to search the entire map
> array for the first matching map[i].system every time. But the bug is,
> if the match is on the first hit (i==0) then we set last_i to 0, and on
> the next iteration we set it (incorrectly) to 1. That's the bug.
> 
> Can you try this patch?
> 
> I'll have to add comments to that function, as it took me too long to
> figure out WTF I was thinking when I wrote it.
> 
> -- Steve
> 
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index ec0f9aa4e151..9b5b8a362690 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2213,6 +2213,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
> {
> 	struct trace_event_call *call, *p;
> 	const char *last_system = NULL;
> +	bool first = false;
> 	int last_i;
> 	int i;
> 
> @@ -2220,6 +2221,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
> 	list_for_each_entry_safe(call, p, &ftrace_events, list) {
> 		/* events are usually grouped together with systems */
> 		if (!last_system || call->class->system != last_system) {
> +			first = true;
> 			last_i = 0;
> 			last_system = call->class->system;
> 		}
> @@ -2227,8 +2229,10 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
> 		for (i = last_i; i < len; i++) {
> 			if (call->class->system == map[i]->system) {
> 				/* Save the first system if need be */
> -				if (!last_i)
> +				if (first) {
> 					last_i = i;
> +					first = false;
> +				}
> 				update_event_printk(call, map[i]);
> 			}
> 		}

Thanks for the patch! I removed the workaround:

  TRACE_DEFINE_ENUM(NFS_INVALID_STABLE_HOW);

and applied the above patch. I'd say the fix passes the test:

[root@manet ~]# cat /sys/kernel/debug/tracing/events/nfs/nfs_initiate_write/format
name: nfs_initiate_write
ID: 1819
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:loff_t offset;	offset:8;	size:8;	signed:1;
	field:unsigned long count;	offset:16;	size:8;	signed:0;
	field:enum nfs3_stable_how stable;	offset:24;	size:4;	signed:1;
	field:dev_t dev;	offset:28;	size:4;	signed:0;
	field:u32 fhandle;	offset:32;	size:4;	signed:0;
	field:u64 fileid;	offset:40;	size:8;	signed:0;

print fmt: "fileid=%02x:%02x:%llu fhandle=0x%08x offset=%lld count=%lu stable=%s", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), (unsigned long long)REC->fileid, REC->fhandle, REC->offset, REC->count, __print_symbolic(REC->stable, { 0, "UNSTABLE" }, { 1, "DATA_SYNC" }, { 2, "FILE_SYNC" })
[root@manet ~]# 

And "trace-cmd report" output appears as expected.

Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx>


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux