Re: ceph containers for a faster dev + test cycle

Mark Nelson <mnelson@xxxxxxxxxx> · Wed, 19 Jan 2022 13:27:26 -0600

On 1/19/22 10:59 AM, Ken Dreyer wrote:
> On Wed, Jan 19, 2022 at 10:27 AM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>> Probably the easiest first pass approach would simply be to compile with
>> symbols and then create a debug container and a non-debug container with
>> the symbols stripped.
>
> I am not a Ceph debugging expert, and I think that's why this issue is
> cloudy for me. If we did that, would it be simple for Ceph users to
> utilize? And Teuthology? For example, would we lose some debugging
> data if users have to swap container environments like that?

So let's say we have a core file that was generated by a ceph executable
with the symbols stripped in some kind of minimalist container.  If we
have another container that provides the symbols in some way (say via
unstripped executables or via installed debuginfo rpms or whatever) we
should still be able to read the core file, assuming the binaries and
libraries are otherwise the same.  In this case you wouldn't necessarily
start up ceph in the container with the debuginfo, you'd just load up the
container with the core file so you could use gdb or whatever to analyze
it.  In other situations you'd maybe want to run one of the processes
with the symbols available for live-debugging, but most users wouldn't
do that.
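
As a rough sketch of that workflow (not something from our build tooling), the snippet below only assembles the command line that would open a host core file with gdb inside a symbol-carrying container.  The image name, binary path, and core path are hypothetical placeholders, and it assumes the debug image ships gdb.  The `--rm` and `-v` options to `podman run` and the `--batch` and `-ex` options to gdb are standard.

```python
import shlex


def gdb_in_debug_container(image, binary, core_on_host, runtime="podman"):
    """Build a command that opens a host core file with gdb inside a debug container."""
    # Fixed mount point inside the container (arbitrary choice for this sketch).
    core_in_container = "/cores/core"
    return [
        runtime, "run", "--rm",
        # Bind-mount the core file read-only into the container.
        "-v", f"{core_on_host}:{core_in_container}:ro",
        image,
        # Print a backtrace for every thread, then exit.
        "gdb", "--batch", "-ex", "thread apply all bt",
        binary, core_in_container,
    ]


# Hypothetical image and paths, for illustration only.
cmd = gdb_in_debug_container(
    image="example.invalid/ceph-debug:hypothetical",
    binary="/usr/bin/ceph-osd",
    core_on_host="/var/tmp/core.ceph-osd",
)
print(shlex.join(cmd))
```

Nothing here starts ceph itself; the debug container is only used as a place where the matching binary and its symbols live.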

Ultimately I think the debuginfod thing is basically an extension of 
this whole concept.  The first thought was "why make users install all 
of these huge debug symbols if they never use them?". So the RHEL guys 
started separating the symbols out into separate debuginfo packages.  
That helps, but once you install them they'll sit there taking up space 
and re-download every time packages are updated.  The idea behind 
debuginfod is that you just ditch all of that and download the symbols 
on demand when you need them.  It's a neat idea, but I'm not sure how I 
feel about the complexity around it which is why I've mostly avoided it 
so far.
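
To make the on-demand part a bit more concrete, here is a minimal sketch of the request a debuginfod client ends up making.  As far as I understand it, clients read a space-separated list of servers from the `DEBUGINFOD_URLS` environment variable and fetch `/buildid/<BUILDID>/debuginfo` from each in turn.  The server URL and build-id below are made up, and real clients (gdb, `debuginfod-find`) do all of this for you; the code only shows the shape of the lookup.

```python
import os


def debuginfo_urls(build_id, servers=None):
    """Return candidate debuginfo URLs for a build-id, one per configured server."""
    if servers is None:
        # Same variable the elfutils client library consults.
        servers = os.environ.get("DEBUGINFOD_URLS", "")
    return [
        f"{base.rstrip('/')}/buildid/{build_id}/debuginfo"
        for base in servers.split()
    ]


# Hypothetical server and build-id, for illustration only.
for url in debuginfo_urls("0123abcd", "https://debuginfod.example.invalid/"):
    print(url)
```

The lookup key is the build-id embedded in the stripped binary, so the stripped container and the symbol server only have to agree on that one identifier.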

>> Longer term, I *think* this is the use case for elfutils' debuginfod
>> server tools:
>>
>> https://developers.redhat.com/blog/2019/10/14/introducing-debuginfod-the-elfutils-debuginfo-server
>>
>> I've never really looked at it seriously, but my understanding is that
>> you basically create a debuginfod server for your executables and then
>> the symbols can be automatically fetched when you need them.
>> They specifically mention debugging in containers that weren't built with
>> symbols.
>
> Thanks for this link. It looks interesting. I'm going to find the
> teams who are working on debuginfod to understand if that will work
> with our use-case. I'm also curious about the disconnected use-cases -
> what would users do in disconnected environments? Or environments
> where they're building their own forks outside of ceph.com? (a likely
> use-case where debuginfo is especially useful :)

Yeah, they'd probably need to install their own debuginfod server with
their own symbols.  That might be an argument for just building two
versions of the same container (one that has symbols and one with them
stripped), but that presumably takes up more disk space if we are
archiving containers with a bunch of redundant stuff other than the
symbols.  Separately, you could probably also load symbols via other
mechanisms into the container with the stripped executables, but that
also might be complex in various ways (up to and including a more
complex build process involving debuginfo rpms).

Mark
