Re: [fuse-devel] [fuse] Getting visibility into reads from page cache

On Fri, May 8, 2020 at 5:29 PM Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
>
> On Apr 27 2020, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > On Sat, Apr 25, 2020 at 7:07 PM Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> For debugging purposes, I would like to get information about read
> >> requests for FUSE filesystems that are answered from the page cache
> >> (i.e., that never make it to the FUSE userspace daemon).
> >>
> >> What would be the easiest way to accomplish that?
> >>
> >> For now I'd be happy with seeing regular reads and knowing when an
> >> application uses mmap (so that I know that I might be missing reads).
> >>
> >>
> >> Not having done any real kernel-level work, I would start by looking
> >> into using some tracing framework to hook into the relevant kernel
> >> function. However, I thought I'd ask here first to make sure that I'm
> >> not heading into the completely wrong direction.
> >
> > Bpftrace is a nice high level tracing tool.
> >
> > E.g.
> >
> >   sudo bpftrace -e 'kretprobe:fuse_file_read_iter { printf("fuse read: %d\n", retval); }'
>
> Thanks, this looks great! I had to do some reading about bpftrace first,
> but I think this is exactly what I'm looking for. A few more questions:
>
>
> - If I attach a probe to fuse_file_mmap, will this tell me whenever an
>   application attempts to mmap() a FUSE file?

Yes.

> - I believe that ((struct kiocb*)arg0)->ki_pos will give me the offset
>   within the file, but where can I see how much data is being read?
>
> Looking at the code in fuse_file_read_iter, it seems the length is in
> ((struct iov_iter*)arg1)->count, but I do not really understand why.

That's correct.

> The definition of this parameter is:
>
> struct iov_iter {
>         int type;
>         const struct iovec *iov;
>         unsigned long nr_segs;
>         size_t iov_offset;
>         size_t count;
> };
>
> ..so I would think that *count* is the number of `iovec` elements hiding
> behind the `iov` pointer, not some total number of bytes.

That's nr_segs.

> Furthermore, there is a function iov_length() that is documented to
> return the "total number of bytes covered by an iovec" and doesn't look
> at `count` at all.

iov_iter_count() is the accessor function that does this.

> - What is the best way to connect read requests to a specific FUSE
>   filesystem (if more than one is mounted)? I found the superblock in
>   ((struct kiocb*)arg0)->ki_filp->f_mapping->host->i_sb->s_fs_info, but I
>   do not see anything in this structure that I could map to a similar
>   value that FUSE userspace has access to...

You can match up ki_filp->f_inode->i_sb->s_dev with st_dev on any
file.  I think the kernel encodes the device value differently, but
the bits should be there.

> - I assume fuse_file_read_iter is called for every read request for FUSE
>   filesystems unless it's an mmap'ed access. Is that right?

Correct.

> - Is there any similar way to catch access to an mmap'ed file? I think
>   there is probably a way to make sure that every memory read triggers a
>   page fault and then hook into the fault handler, but I am not sure how
>   difficult this is to do and how much performance this would cost....

Not sure if that's implementable, but it would surely be grossly
inefficient.  Flushing page tables e.g. every second would probably
work, but then you'd only get the read pattern on a one second
granularity.

> - If my BPF program contains e.g. a printf statement, will execution of
>   the kernel function block until the printf has completed, or is there
>   some queuing mechanism?

AFAIK there's some queuing.

Thanks,
Miklos


