Re: [PATCH 2/2] pidfd: add pidfdfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry for the delayed reply. I was attending LSFMM last week. Including
travel that basically amounted to a week of limited email time. Catching
up on things now.

On Fri, May 17, 2024 at 01:07:43PM -0700, Linus Torvalds wrote:
> On Fri, 17 May 2024 at 00:54, Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:
> >
> >          inode->i_private = data;
> >          inode->i_flags |= S_PRIVATE;
> > +       inode->i_mode &= ~S_IFREG;
> 
> That is not a sensible operation. S_IFREG isn't a bit mask.
> 
> But it looks like 'anon_inode' traditionally had *no* type bytes at
> all. That's literally crazy.

Yes, it's pretty wild and I had long discussions about this before when
people told me it's impossible for st_mode to have no type. A lot of
low-level software isn't aware of this quirk and it's not something I
particularly enjoy.

> Doing a 'stat -L' on one in /proc/X/fd/Y will correctly say "weird
> file" about them.

Oh yes, it also broke assumptions in systemd and some other software
which had code related to stat() that assumed it could never see an
empty file type in st_mode.

> What a crock. That's horrible, and we apparently never noticed how
> broken anon_inodes were because nobody really cared. But then lsof
> seems to have done the *opposite* and just said (for unfathomable
> reasons) "this can't be a normal regular file".
> 
> But I can't actually find that code in lsof. I see
> 
>                  if (rest && rest[0] == '[' && rest[1] == 'p')
>                      fdinfo_mask |= FDINFO_PID;
> 
> which only checks that the name starts with '[p'. Hmm.
> 
> [ Time passes, I go looking ]
> 
> Oh Christ. It's process_proc_node:
> 
>         type = s->st_mode & S_IFMT;
>         switch (type) {
>         ...
>         case 0:
>             if (!strcmp(p, "anon_inode"))
>                 Lf->ntype = Ntype = N_ANON_INODE;
>             break;
> 
> so yes, process_proc_node() really seems to have intentionally noticed
> that our anon inodes forgot to put a file type in the st_mode, and
> together with the path from readlink matching 'anon_inode' is how lsof
> determines it's one of the special inodes.

strace does that too but strace updated it's code fairly early on to
accommodate pidfs. I had searched github to check that
anon_inode:[pidfd] wasn't something that userspace relied on and other
than strace it didn't yield any meaningful results and didn't surface
lsof unfortunately.

> 
> So yeah, we made a mistake, and then lsof decided that mistake was a feature.
> 
> But that does mean that we probably just have to live in the bed we made.

Yes, we likely do.

> 
> But that
> 
> > +       inode->i_mode &= ~S_IFREG;
> 
> is still very very wrong. It should use the proper bit mask: S_IFMT.
> 
> And we'd have to add a big comment about our historical stupidity that
> we are perpetuating.
> 
> Oh well.
> 
>                Linus




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux