On a related note, I have always wondered if there was any interest in
having something like /proc/PID/io just for tracking NFS client
throughput? The problem is that if you copy a file from NFS to a local
filesystem, there is no way to infer whether a process did an NFS
read/write (or any NFS IO at all). It is useful to be able to track
per-PID network IO, and things like cgroups (v1) do not provide an easy
way to do that. In our case, 99.9% of all network IO a render blade does
is NFS client traffic.

To your question, I can't say off-hand what the BPF equivalent is
(though I've tacked a very rough bpftrace sketch onto the end of this
mail), but we used systemtap to track per-process and per-file IO on
each render node. However, since we are only interested in IO that
results in actual network packets, we needed to exclude reads that were
satisfied from the page cache. We did that by watching
vfs.add_to_page_cache and naively assuming every hit resulted in 4k of
NFS reads over the network: if a page is only now being added to the
page cache, it cannot have been cached already, so the read must have
come over the network. The aggregate from all clients matched the
network traffic on our NFS servers pretty well, so this approach worked
for us. We could track all client file IO and correlate it with what the
server was doing over the network.

The systemtap code was something like the following, where files were
tracked by nfs.fop.open:

probe nfs.fop.open {
    pid = pid()
    filename = sprintf("%s", d_path(&$filp->f_path))
    if (filename =~ "/net/.*/data") {
        files[pid, ino] = filename
        if (!([pid, ino] in procinfo))
            procinfo[pid, ino] = sprintf("%s", proc())
    }
}

probe vfs.add_to_page_cache {
    pid = pid()
    if ([pid, ino] in files) {
        readpage[pid, ino] += 4096
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

But I should say that this no longer works in newer kernels since the
addition of folios, and I have not yet figured out a better way to track
NFS client reads while excluding page cache hits.

For the writes I was just using vfs.write and vfs.writev - I was not too
concerned about writeback delays.

probe vfs.write {
    pid = pid()
    if ([pid, ino] in files) {
        write[pid, ino] += bytes_to_write
        files_store[pid, ino] = sprintf("%s", files[pid, ino])
    }
}

I hope that helps. Being from the same industry, we obviously have
similar requirements... ;)

Daire

On Fri, 21 Jul 2023 at 23:46, <lars@xxxxxxxxx> wrote:
>
> Hello,
>
> I'm using BPF to do NFS operation accounting for user-space processes.
> I'd like to include the number of bytes read and written to each file a
> process opens over NFS.
>
> For write operations, I'm currently using an fexit probe on the
> nfs_writeback_done function, and my program appears to be getting the
> information I'm hoping for. But I can see that under some circumstances
> the actual operations are being done by kworker threads, and so the PID
> reported by the BPF program is for that kworker instead of the
> user-space process that requested the write.
>
> Is there a more appropriate function to probe for this information if I
> only want it triggered in the context of the user-space process that
> performed the write? If not, I'm wondering if there's enough
> information in a probe triggered in the kworker context to track down
> the user-space PID that initiated the writes.
>
> I didn't find anything related in the kernel's Documentation directory,
> and I'm not yet proficient enough with the vfs, nfs, and sunrpc code to
> find an appropriate function myself.
>
> If it matters, our infrastructure is all based on NFSv3.
>
> Thanks for any leads or documentation pointers!
> Lars
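
P.S. On the BPF side, a very rough, untested bpftrace sketch of the same
"count at the entry point, in the caller's context" idea might look
something like the below. It hooks nfs_file_read()/nfs_file_write() (the
NFS ->read_iter/->write_iter handlers in fs/nfs/file.c; this is an
assumption about your kernel, as the symbols may be inlined or renamed),
which run in the context of the process doing the IO rather than a
kworker, so pid is the one you want. Caveats: unlike the
vfs.add_to_page_cache trick above, the read counter includes reads
satisfied from the page cache, and the write counter measures bytes
accepted into the page cache rather than writeback completion.

// Rough sketch only: per-process NFS read/write byte counters.
// The read_iter/write_iter handlers return the byte count on success or
// a negative errno, so cast to signed before filtering.
kretprobe:nfs_file_read  /(int64)retval > 0/ { @nfs_read_bytes[pid, comm]  += retval; }
kretprobe:nfs_file_write /(int64)retval > 0/ { @nfs_write_bytes[pid, comm] += retval; }

// Dump and reset the counters every 10 seconds.
interval:s:10 {
    print(@nfs_read_bytes);
    print(@nfs_write_bytes);
    clear(@nfs_read_bytes);
    clear(@nfs_write_bytes);
}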