Re: [PATCH 1/1] git-grep: improve the --show-function behaviour

René Scharfe <l.s.r@xxxxxx> · Thu, 14 Sep 2023 21:34:48 +0200

On 13.09.23 at 11:46, Oleg Nesterov wrote:
> I think I should trim CC to not spam the people who are not
> interested in this discussion...
>
> On 09/12, Junio C Hamano wrote:
>>
>> Documentation may not match the behaviour, but do we know what the
>> behaviour we want is? To me, the last example that shows the same
>> line twice (one as a real hit, the other because of "-p") looks a
>> bit counter-intuitive for the purpose of "help me locate where the
>> grep hits are". I dunno.
>
> I have another opinion. To me the 2nd "=..." marker does help to
> understand the hit location. But this doesn't matter.

You see it as another layer of information, as an annotation, an
additional line containing meta-information. I saw them as context
lines, i.e. lines from the original file shown in the original order
without duplication, like - lines, with the only place for meta-
information being the marker character itself.

> Let me repeat: scripts.
>
> I tried to explain this in 0/1 and in my other replies, but lets
> start from the very beginning once again.
>
> I've never used git-grep with -p/-n and most probably never will.
> But 3 days ago my text editor (vi clone) started to use "grep -pn".
>
> 	$ cat -n TEST.c
>
> 	 1	void func1(struct pid *);
> 	 2
> 	 3	void func2(struct pid *pid)
> 	 4	{
> 	 5		use1(pid);
> 	 6	}
> 	 7
> 	 8	void func3(struct pid *pid)
> 	 9	{
> 	 10		use2(pid);
> 	 11	}
>
>
> when I do
>
> 	:git-grep --untracked -pn pid TEST.c
>
> in my editor it calls the script which parses the output from git-grep
> and puts this
>
> 	<pre>
> 	<a href="TEST.c?1">TEST.c </a> 1 void func1(struct pid *);
> 	<a href="TEST.c?3">TEST.c </a> 3 void func2(struct pid *pid)
> 	<a href="TEST.c?5">TEST.c </a> 5 func2 use1(pid);
> 	<a href="TEST.c?8">TEST.c </a> 8 void func3(struct pid *pid)
> 	<a href="TEST.c?10">TEST.c </a> 10 func3 use2(pid);
> 	</pre>
>
> html to the text buffer which is nicely displayed as
>
> 	TEST.c 1 void func1(struct pid *);
> 	TEST.c 3 void func2(struct pid *pid)
> 	TEST.c 5 func2 use1(pid);
> 	TEST.c 8 void func3(struct pid *pid)
> 	TEST.c 10 func3 use2(pid);
>
> and I can use Ctrl-] to jump to the file/function/location.
>
> And this script is very simple, it parses the output line-by-line. When
> it sees the "=" marker it does some minimal post-processing, records the
> function name to display it in the 3rd column later, and goes to the next
> line.
>
> But without my patch, in this case I get
>
> 	TEST.c 1 void func1(struct pid *);
> 	TEST.c 3 void func2(struct pid *pid)
> 	TEST.c 5 use1(pid);
> 	TEST.c 8 void func3(struct pid *pid)
> 	TEST.c 10 use2(pid);
>
> because the output from git-grep
>
> 	$ git grep --untracked -pn pid TEST.c
> 	TEST.c:1:void func1(struct pid *);
> 	TEST.c:3:void func2(struct pid *pid)
> 	TEST.c:5: use1(pid);
> 	TEST.c:8:void func3(struct pid *pid)
> 	TEST.c:10: use2(pid);
>
> doesn't have the "=..." markers at all.

Sure, that's a problem. You could easily check whether a match is also
a function line according to the default heuristic (does the line start
with a letter, dollar sign or underscore?), e.g. with a glob like
"*:*:[a-zA-Z$_]*". If git grep uses more sophisticated rules then it
becomes impractical -- there are some impressive regexes in userdiff.c
and the script would have to figure out which language the file is
configured to be for Git in the first place.
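
A minimal sketch of that check in C, in case it helps. It assumes the
script sees "path:lineno:text" lines from "git grep -n" and that the
path itself contains no colon; the helper names are made up for this
example. It only mirrors the default heuristic, not the userdiff
patterns:

```c
#include <ctype.h>
#include <string.h>

/* Default heuristic only: a function line starts with a letter, '$' or '_'. */
static int looks_like_funcname(const char *text)
{
	unsigned char c = (unsigned char)text[0];

	return isalpha(c) || c == '$' || c == '_';
}

/*
 * Hypothetical helper: given one "path:lineno:text" line, return 1 if
 * the matched text also looks like a function line, 0 otherwise.
 * Assumption: the path contains no ':'.
 */
static int match_is_function_line(const char *line)
{
	const char *p = strchr(line, ':');

	if (!p)
		return 0;
	p = strchr(p + 1, ':');
	if (!p)
		return 0;
	return looks_like_funcname(p + 1);
}
```

With the TEST.c output quoted above, lines 1, 3 and 8 would be flagged
as function lines, while the indented lines 5 and 10 would not.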

> But TEST.c is just the trivial/artificial example. From 0/1,
>
> When I do
>
> 	:git-grep -pw pid kernel/sys.c
>
> in my editor without this patch, I get
>
> 	kernel/sys.c 224 sys_setpriority struct pid *pgrp;
> 	kernel/sys.c 294 sys_getpriority struct pid *pgrp;
> 	kernel/sys.c 952 * Note, despite the name, this returns the tgid not the pid. The tgid and
> 	kernel/sys.c 953 * the pid are identical unless CLONE_THREAD was specified on clone() in
> 	kernel/sys.c 963 /* Thread ID - the internal kernel "pid" */
> 	kernel/sys.c 977 sys_getppid int pid;
> 	kernel/sys.c 980 sys_getppid pid = task_tgid_vnr(rcu_dereference(current->real_parent));
> 	kernel/sys.c 983 sys_getppid return pid;
> 	kernel/sys.c 1073 SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
> 	kernel/sys.c 1077 sys_times struct pid *pgrp;
> 	kernel/sys.c 1080 sys_times if (!pid)
> 	kernel/sys.c 1081 sys_times pid = task_pid_vnr(group_leader);
> 	kernel/sys.c 1083 sys_times pgid = pid;
> 	kernel/sys.c 1094 sys_times p = find_task_by_vpid(pid);
> 	kernel/sys.c 1120 sys_times if (pgid != pid) {
> 	kernel/sys.c 1144 static int do_getpgid(pid_t pid)
> 	kernel/sys.c 1147 sys_times struct pid *grp;
> 	kernel/sys.c 1151 sys_times if (!pid)
> 	kernel/sys.c 1155 sys_times p = find_task_by_vpid(pid);
> 	kernel/sys.c 1172 SYSCALL_DEFINE1(getpgid, pid_t, pid)
> 	kernel/sys.c 1174 sys_times return do_getpgid(pid);
> 	kernel/sys.c 1186 SYSCALL_DEFINE1(getsid, pid_t, pid)
> 	kernel/sys.c 1189 sys_getpgrp struct pid *sid;
> 	kernel/sys.c 1193 sys_getpgrp if (!pid)
> 	kernel/sys.c 1197 sys_getpgrp p = find_task_by_vpid(pid);
> 	kernel/sys.c 1214 static void set_special_pids(struct pid *pid)
> 	kernel/sys.c 1218 sys_getpgrp if (task_session(curr) != pid)
> 	kernel/sys.c 1219 sys_getpgrp change_pid(curr, PIDTYPE_SID, pid);
> 	kernel/sys.c 1221 sys_getpgrp if (task_pgrp(curr) != pid)
> 	kernel/sys.c 1222 sys_getpgrp change_pid(curr, PIDTYPE_PGID, pid);
> 	kernel/sys.c 1228 ksys_setsid struct pid *sid = task_pid(group_leader);
> 	kernel/sys.c 1684 SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
> 	kernel/sys.c 1705 check_prlimit_permission tsk = pid ? find_task_by_vpid(pid) : current;
>
> And only the first 5 funcnames are correct.

Well, your script turns "SYSCALL_DEFINE3(setpriority, [...]" into
"sys_setpriority" etc., so it already knows a lot about function lines.
It could be made to take a second look at match lines, I guess.

But a general solution would require git grep to somehow report both aspects
of matching function lines. That's easy if we allow ourselves to duplicate
lines. This is strange enough to warrant making it a new output format, I
think.

Another possibility is to switch the precedence of : and =. With match
coloring it would still be possible to identify most positive matches in
function lines interactively, but not negative matches (-v) in function
lines. Probably not the best choice, since grep is primarily about finding
matching lines; function line info comes second.

Can we use two markers, i.e. both : and =? No idea what that might break.

There is a Unicode symbol named colon equals, which looks like this: ≔
We added the =, so I guess we could add that thing as well. But is the
world prepared for Unicode output? Not sure. If we need to stay in the
ASCII table the same idea could be implemented with a different character
like # or ;.

> And note that this case is very simple too (I mostly use :git-grep to scan
> the whole linux kernel tree), but even in this simple case I don't think it
> makes sense to use "git-grep -pn" directly, the output is hardly readable
> (at least to me) with or without my patch.

So with the patch below this would look like this:

kernel/sys.c=218=SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
kernel/sys.c:224: struct pid *pgrp;
kernel/sys.c=288=SYSCALL_DEFINE2(getpriority, int, which, int, who)
kernel/sys.c:294: struct pid *pgrp;
kernel/sys.c=943=SYSCALL_DEFINE1(setfsgid, gid_t, gid)
kernel/sys.c:952: * Note, despite the name, this returns the tgid not the pid. The tgid and
kernel/sys.c:953: * the pid are identical unless CLONE_THREAD was specified on clone() in
kernel/sys.c=958=SYSCALL_DEFINE0(getpid)
kernel/sys.c:963:/* Thread ID - the internal kernel "pid" */
kernel/sys.c=975=SYSCALL_DEFINE0(getppid)
kernel/sys.c:977: int pid;
kernel/sys.c:980: pid = task_tgid_vnr(rcu_dereference(current->real_parent));
kernel/sys.c:983: return pid;
kernel/sys.c#1073#SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
kernel/sys.c:1077: struct pid *pgrp;
kernel/sys.c:1080: if (!pid)
kernel/sys.c:1081: pid = task_pid_vnr(group_leader);
kernel/sys.c:1083: pgid = pid;
kernel/sys.c:1094: p = find_task_by_vpid(pid);
kernel/sys.c:1120: if (pgid != pid) {
kernel/sys.c#1144#static int do_getpgid(pid_t pid)
kernel/sys.c:1147: struct pid *grp;
kernel/sys.c:1151: if (!pid)
kernel/sys.c:1155: p = find_task_by_vpid(pid);
kernel/sys.c#1172#SYSCALL_DEFINE1(getpgid, pid_t, pid)
kernel/sys.c:1174: return do_getpgid(pid);
kernel/sys.c#1186#SYSCALL_DEFINE1(getsid, pid_t, pid)
kernel/sys.c:1189: struct pid *sid;
kernel/sys.c:1193: if (!pid)
kernel/sys.c:1197: p = find_task_by_vpid(pid);
kernel/sys.c#1214#static void set_special_pids(struct pid *pid)
kernel/sys.c:1218: if (task_session(curr) != pid)
kernel/sys.c:1219: change_pid(curr, PIDTYPE_SID, pid);
kernel/sys.c:1221: if (task_pgrp(curr) != pid)
kernel/sys.c:1222: change_pid(curr, PIDTYPE_PGID, pid);
kernel/sys.c=1225=int ksys_setsid(void)
kernel/sys.c:1228: struct pid *sid = task_pid(group_leader);
kernel/sys.c#1684#SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
kernel/sys.c:1705: tsk = pid ? find_task_by_vpid(pid) : current;

It uses # for matches that happen to be function lines, and doesn't show
a previous function line for those anymore. Usable?
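
For a script consuming this, the classification could look roughly like
the sketch below. It assumes -n is in effect, so every line has the
form "path<sign>lineno<sign>text" with the same sign on both sides of
the number, and it takes the first such pattern in the line, so paths
that themselves contain one would confuse it. The type and function
names are invented for this example and are not part of the patch:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum line_kind { LINE_UNKNOWN, LINE_MATCH, LINE_FUNC_CONTEXT, LINE_FUNC_MATCH };

struct parsed_line {
	enum line_kind kind;
	size_t path_len;	/* length of the path prefix of the input */
	unsigned long lno;
	const char *text;	/* points into the input line */
};

static enum line_kind kind_of(char sign)
{
	switch (sign) {
	case ':': return LINE_MATCH;		/* plain match */
	case '=': return LINE_FUNC_CONTEXT;	/* function line shown for -p */
	case '#': return LINE_FUNC_MATCH;	/* match that is a function line */
	default:  return LINE_UNKNOWN;
	}
}

/* Returns 0 and fills *out on success, -1 if no sign/number/sign is found. */
static int parse_grep_line(const char *line, struct parsed_line *out)
{
	const char *p;

	for (p = line; *p; p++) {
		const char *q;

		if (kind_of(*p) == LINE_UNKNOWN)
			continue;
		q = p + 1;
		while (isdigit((unsigned char)*q))
			q++;
		/* need at least one digit and the same sign after it */
		if (q == p + 1 || *q != *p)
			continue;
		out->kind = kind_of(*p);
		out->path_len = (size_t)(p - line);
		out->lno = strtoul(p + 1, NULL, 10);
		out->text = q + 1;
		return 0;
	}
	out->kind = LINE_UNKNOWN;
	return -1;
}
```

A script would remember the text of the last LINE_FUNC_CONTEXT or
LINE_FUNC_MATCH line as the current function name and attach it to the
LINE_MATCH lines that follow.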

René

diff --git a/grep.c b/grep.c
index fc2d0c837a..a08da5cdcb 100644
--- a/grep.c
+++ b/grep.c
@@ -1681,6 +1681,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 			goto next_line;
 		}
 		if (hit && (opt->max_count < 0 || count < opt->max_count)) {
+			char sign = ':';
 			count++;
 			if (opt->status_only)
 				return 1;
@@ -1697,12 +1698,14 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 				opt->output(opt, " matches\n", 9);
 				return 1;
 			}
+			if (opt->funcname && match_funcname(opt, gs, bol, eol))
+				sign = '#';
 			/* Hit at this line.  If we haven't shown the
 			 * pre-context lines, we would need to show them.
 			 */
 			if (opt->pre_context || opt->funcbody)
 				show_pre_context(opt, gs, bol, eol, lno);
-			else if (opt->funcname)
+			else if (opt->funcname && sign == ':')
 				show_funcname_line(opt, gs, bol, lno);
 			cno = opt->invert ? icol : col;
 			if (cno < 0) {
@@ -1715,7 +1718,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 				 */
 				cno = 0;
 			}
-			show_line(opt, bol, eol, gs->name, lno, cno + 1, ':');
+			show_line(opt, bol, eol, gs->name, lno, cno + 1, sign);
 			last_hit = lno;
 			if (opt->funcbody)
 				show_function = 1;