On Sun, Jul 21, 2024 at 04:05:57PM -0700, David Rientjes wrote:
> Thanks Dan, this is fantastic! I've been playing with it locally.
>
> This does indeed appear to meet the exact needs of what I was referring to
> above, I'm excited that this already exists.
>
> Few questions for you:

Just a brief preface to my answers: Below is maintained by just a couple
of engineers, and our primary focus is internal debugging use-cases. We
welcome contributions, as expanding Below's user base leads to benefits
for our internal use-cases. I'll try to speak to what we would and
wouldn't welcome, but before embarking on more specific work it may be
worth circling back with us to avoid misalignment.

>
> - Do you know of anybody who has deployed this in their guest when
>   running on a public cloud?

I believe so. Engineers from Aviatrix have been contributing to Below
recently, as they have their customers use Below to collect data for
off-host debugging. I've heard anecdotally that Netflix has been using
Below, but I'm not entirely confident that is still true.

>
> - Is there a motivation to add this to well known distros so it is "just
>   there" and can run out of the box? There's some configuration and
>   setup that it requires

https://github.com/facebookincubator/below?tab=readme-ov-file#installing

It's packaged for Fedora, Alpine Linux and Gentoo already. We'd welcome
any additional contributions to package Below for other distros so long
as the maintenance burden is not too high.

>
> - How receptive are the maintainers to adding new data points, things
>   like additional fields from vmstat, adding in /proc/pagetypeinfo, etc?

In general, we welcome contributions adding additional data collection,
so long as it is sufficiently performant (e.g. collecting data for each
thread in the system may require more rigour to ensure it doesn't blow
up storage costs or the CPU overhead of running Below) or at least made
optional.
Of course, we expect this to be done in a fashion that doesn't overly
burden the maintenance of the codebase as well. We scrutinize additions
to the TUI more closely (more specifically, we scrutinize where the data
gets added), because putting everyone's personal favorite metric in the
most prominent spot leads to UI clutter and devalues the tool as a
visual guide to debugging.

>
> - Any plans to support cgroup v1? :) Would that be nacked outright?
>   Some customers still run this in their guest

No plans, but we're not opposed to contributions. I don't think it would
be too challenging; we'd just need to make sure there's some (GitHub)
testing setup for it, since we are not running cgroup v1 in our internal
CI.

>
> - For the "/usr/bin/below record --retain-for-s 604800 --compress"
>   support, is there an appetite for separating this out into its own
>   non-systemd managed process? IOW, the ability to tell the customer
>   "go run 'mini-below' and send over the data" that *just* does the
>   record operation and doesn't require installing/configuring anything?

I think I follow what you're suggesting here: basically something fully
self-contained (relying on no external configuration) that runs below
record followed by below snapshot, or some way to record directly to a
snapshot, so data can be analyzed off-host. That seems perfectly
reasonable. I believe Aviatrix would benefit from making this easier for
their customers as well.
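
To make the "mini-below" idea a little more concrete, here is a rough
sketch of a wrapper that only assembles and launches the record command
quoted above. It is an illustration under assumptions, not anything that
exists in Below today: the helper names (build_record_cmd, run_recorder)
are hypothetical, the binary path is simply the one from the quoted
command, and only the flags that appear in this thread are used. It does
not attempt the snapshot step or any store configuration.

```python
import subprocess

# Path taken from the command quoted in this thread; adjust as needed.
BELOW_BIN = "/usr/bin/below"
SECONDS_PER_DAY = 24 * 60 * 60


def build_record_cmd(retain_days, compress=True):
    """Return the argv for `below record` with retention given in days.

    Hypothetical helper: it only uses the flags quoted in this thread
    (--retain-for-s and --compress).
    """
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    cmd = [
        BELOW_BIN,
        "record",
        "--retain-for-s",
        str(retain_days * SECONDS_PER_DAY),
    ]
    if compress:
        cmd.append("--compress")
    return cmd


def run_recorder(retain_days=7):
    """Run the recorder in the foreground, outside of systemd.

    Hypothetical wrapper: blocks until the process exits and returns
    its exit status.
    """
    return subprocess.call(build_record_cmd(retain_days))
```

With retain_days=7 the helper reproduces the quoted command, since
7 days is 604800 seconds.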