Re: [PATCH 2/2] pidfd: add pidfdfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 21, 2024 at 02:33:24PM +0200, Christian Brauner wrote:
> On Tue, May 21, 2024 at 08:13:08AM +0200, Jiri Slaby wrote:
> > On 21. 05. 24, 8:07, Jiri Slaby wrote:
> > > On 20. 05. 24, 21:15, Linus Torvalds wrote:
> > > > On Mon, 20 May 2024 at 12:01, Linus Torvalds
> > > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > So how about just a patch like this?  It doesn't do anything
> > > > > *internally* to the inodes, but it fixes up what we expose to user
> > > > > level to make it look like lsof expects.
> > > > 
> > > > Note that the historical dname for those pidfs files was
> > > > "anon_inode:[pidfd]", and that patch still kept the inode number in
> > > > there, so now it's "anon_inode:[pidfd-XYZ]", but I think lsof is still
> > > > happy with that.
> > > 
> > > Now the last column of lsof still differs from 6.8:
> > > -[pidfd:1234]
> > > +[pidfd-4321]
> > > 
> > > And lsof tests still fail, as "lsof -F pfn" is checked against:
> > >      if ! fgrep -q "p${pid} f${fd} n[pidfd:$pid]" <<<"$line"; then
> > > 
> > > Where $line is:
> > > p1015 f3 n[pidfd-1315]
> > > 
> > > Wait, even if I change that minus to a colon, the inner pid (1315)
> > > differs from the outer (1015), but it should not (according to the
> > > test).
> > 
> > This fixes the test (meaning literally "it shuts up the test", but I have no
> > idea if it is correct thing to do at all):
> > -       return dynamic_dname(buffer, buflen, "anon_inode:[pidfd-%llu]",
> > pid->ino);
> > +       return dynamic_dname(buffer, buflen, "anon_inode:[pidfd:%d]",
> > pid_nr(pid));
> > 
> > Maybe pid_vnr() would be more appropriate, I have no idea either.
> 
> So as pointed out the legacy format for pidfds is:
> 
> lrwx------ 1 root root  64 21. Mai 14:24 39 -> 'anon_inode:[pidfd]'
> 
> So it's neither showing inode number nor pid.
> 
> The problem with showing the pid unconditionally like that in
> dynamic_dname() is that it's wrong in various circumstances. For
> example, when the pidfd is sent into a pid namespace outside the it's
> pid namespace hierarchy (e.g., into a sibling pid namespace or when the
> task has already been reaped.
> 
> Imho, showing the pid is more work than it's worth especially because we
> expose that info in fdinfo/<nr> anyway. So let's just do the simple thing.

Here's the updated patch appended. Linus, feel free to commit it
directly or if you prefer I can send it to you later this week.

In any case, I really really would like to try to move away from the
current insanity maybe by the end of the year. So I really hope that
lsof changes to the same format that strace already changed to so we can
flip the switch. That should allow us to get rid of both the weird
non-type st_mode issue and the unpleasant name faking. Does that sound
like something we can try?
>From 0e027ea62813cb61d95012757908e92e8cd46894 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 21 May 2024 14:34:43 +0200
Subject: [PATCH] fs/pidfs: make 'lsof' happy with our inode changes

pidfs started using much saner inodes in commit b28ddcc32d8f ("pidfs:
convert to path_from_stashed() helper"), but that exposed the fact that
lsof had some knowledge of just how odd our old anon_inode usage was.

For example, legacy anon_inodes hadn't even initialized the inode type
in the inode mode, so everything had a type of zero.

So sane tools like 'stat' would report these files as "weird file", but
'lsof' instead used that (together with the name of the link in proc) to
notice that it's an anonymous inode, and used it to detect pidfd files.

Let's keep our internal new sane inode model, but mask the file type
bits at 'stat()' time in the getattr() function we already have, and by
making the dentry name match what lsof expects too.

This keeps our internal models sane, but should make user space see the
same old odd behavior.

Reported-by: Jiri Slaby <jirislaby@xxxxxxxxxx>
Link: https://lore.kernel.org/all/a15b1050-4b52-4740-a122-a4d055c17f11@xxxxxxxxxx/
Link: https://github.com/lsof-org/lsof/issues/317
Cc: Christian Brauner <brauner@xxxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Seth Forshee <sforshee@xxxxxxxxxx>
Cc: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
---
 fs/pidfs.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index a63d5d24aa02..dbb9d854d1c5 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -169,6 +169,24 @@ static int pidfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 	return -EOPNOTSUPP;
 }
 
+
+/*
+ * User space expects pidfs inodes to have no file type in st_mode.
+ *
+ * In particular, 'lsof' has this legacy logic:
+ *
+ *	type = s->st_mode & S_IFMT;
+ *	switch (type) {
+ *	  ...
+ *	case 0:
+ *		if (!strcmp(p, "anon_inode"))
+ *			Lf->ntype = Ntype = N_ANON_INODE;
+ *
+ * to detect our old anon_inode logic.
+ *
+ * Rather than mess with our internal sane inode data, just fix it
+ * up here in getattr() by masking off the format bits.
+ */
 static int pidfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 			 struct kstat *stat, u32 request_mask,
 			 unsigned int query_flags)
@@ -176,6 +194,7 @@ static int pidfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 	struct inode *inode = d_inode(path->dentry);
 
 	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
+	stat->mode &= ~S_IFMT;
 	return 0;
 }
 
@@ -199,12 +218,13 @@ static const struct super_operations pidfs_sops = {
 	.statfs		= simple_statfs,
 };
 
+/*
+ * 'lsof' has knowledge of out historical anon_inode use, and expects
+ * the pidfs dentry name to start with 'anon_inode'.
+ */
 static char *pidfs_dname(struct dentry *dentry, char *buffer, int buflen)
 {
-	struct inode *inode = d_inode(dentry);
-	struct pid *pid = inode->i_private;
-
-	return dynamic_dname(buffer, buflen, "pidfd:[%llu]", pid->ino);
+	return dynamic_dname(buffer, buflen, "anon_inode:[pidfd]");
 }
 
 static const struct dentry_operations pidfs_dentry_operations = {
-- 
2.43.0


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux