On Wed, Sep 27, 2023 at 12:53 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> > This "ceph" tool requires installing 90 MB of additional Debian
> > packages, which I just tried on a test cluster, and "ceph fs top"
> > fails with "Error initializing cluster client: ObjectNotFound('RADOS
> > object not found (error calling conf_read_file)')". Okay, so I have to
> > configure something.... but .... I don't get why I would want to do
> > that, when I can get the same information from the kernel without
> > installing or configuring anything. This sounds like overcomplexifying
> > the thing for no reason.
>
> I have relayed my understanding of this feature (or rather how it was
> presented to me). I see where you are coming from, so adding more
> CephFS folks to chime in.

Let me show these folks how badly "ceph fs perf stats" performs:

# time ceph fs perf stats
{"version": 2, "global_counters": ["cap_hit", "read_latency", "write_latency"[...]

real    0m0.502s
user    0m0.393s
sys     0m0.053s

Now my debugfs-based solution:

# time cat /sys/kernel/debug/ceph/*/metrics/latency
item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)
[...]

real    0m0.002s
user    0m0.002s
sys     0m0.001s

debugfs is more than 200 times faster. It is so fast that "time" can
hardly measure it - and most of those 2 ms are the overhead of
executing /bin/cat, not of actually reading the debugfs file.

Our kernel exporter is a daemon process; it needs only a single
pread() system call per iteration, so it has even less overhead.
Integrating the "ceph" tool instead would mean forking a process each
time, starting a new Python VM, and so on...

For obtaining real-time latency statistics, the "ceph" script is the
wrong tool for the job.
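For reference, here is a minimal C sketch of that read loop. The path,
buffer size, and one-second interval are illustrative only; a real
exporter would expand the wildcard to the concrete client instance and
parse the metrics instead of printing them:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Illustrative path: the wildcard component must be replaced by
     * the actual <fsid>.<clientid> directory on a real system. */
    const char *path =
        "/sys/kernel/debug/ceph/FSID.CLIENTID/metrics/latency";
    char buf[4096];

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (;;) {
        /* One pread() per iteration: re-read the file from offset 0
         * without re-opening it. */
        ssize_t n = pread(fd, buf, sizeof(buf) - 1, 0);
        if (n < 0) {
            perror("pread");
            break;
        }
        buf[n] = '\0';
        fputs(buf, stdout);  /* a real daemon would parse/export here */
        sleep(1);
    }

    close(fd);
    return 0;
}

Max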