Re: Segfault in libvirtd when run as a service

Matthias Bolte <matthias.bolte@xxxxxxxxxxxxxx> · Thu, 10 Jun 2010 21:48:44 +0200

2010/6/10 Emre Erenoglu <erenoglu@xxxxxxxxx>:
> On Thu, Jun 10, 2010 at 10:35 PM, Matthias Bolte
> <matthias.bolte@xxxxxxxxxxxxxx> wrote:
>>
>> 2010/6/10 Emre Erenoglu <erenoglu@xxxxxxxxx>:
>> > On Thu, Jun 10, 2010 at 9:07 PM, Daniel P. Berrange
>> > <berrange@xxxxxxxxxx>
>> > wrote:
>> >>
>> >> On Thu, Jun 10, 2010 at 08:57:15PM +0300, Emre Erenoglu wrote:
>> >> > On Thu, Jun 10, 2010 at 5:02 PM, Matthias Bolte <
>> >> > matthias.bolte@xxxxxxxxxxxxxx> wrote:
>> >> >
>> >> > > 2010/6/10 Emre Erenoglu <erenoglu@xxxxxxxxx>:
>> >> > > The initscript explicitly starts the one in /usr/sbin. If you just
>> >> > > start libvirtd manually without an absolute path then you'll start
>> >> > > the one in /usr/local/sbin. This might explain why you cannot
>> >> > > reproduce the segfault manually, but it doesn't explain why the
>> >> > > segfault happens.
>> >> > >
>> >> >
>> >> > There's no other installation of libvirt in the system. I can also
>> >> > reproduce the same thing on all Pardus machines, so I believe it's
>> >> > something in libvirt not doing well with something else in our
>> >> > service init mechanisms.
>> >>
>> >> I guess I'd put money on some environment variable causing trouble.
>> >> It could be a *missing* environment variable that we expect to always
>> >> be set, or something like that
>> >
>> > Hi Daniel, thanks for your message. Yes, I wrote a small script file as
>> > you suggested and found that libvirtd was run with this environment:
>> >
>> >
>> > DBUS_STARTER_ADDRESS=unix:path=/var/run/dbus/system_bus_socket,guid=6c515f612162b05d554b59cd4c112d43
>> > KRB5_KTNAME=/etc/libvirt/krb5.tab
>> > PWD=/
>> > DBUS_STARTER_BUS_TYPE=system
>> > SHLVL=1
>> > _=/usr/bin/env
>> >
>> > This looks very sparse compared to the standard root environment that I
>> > pasted in my earlier message.
>>
>> No PATH? I bet there is code in libvirt that assumes getenv("PATH")
>> will be != NULL.
>>
>> Could you try to add PATH to the environment. It can be empty, doesn't
>> matter. Just make sure it's there, so getenv("PATH") returns an empty
>> string instead of NULL.
>
> I had the same instinct and did exactly what you said, i.e. added the PATH
> environment variable, and that nailed it down! It works! Wow!
>

That confirms our assumption.
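
For anyone following along, the difference between an unset and an empty
PATH can be sketched like this. This is only an illustration, not code from
libvirt; env_state() is a hypothetical helper made up for this example.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libvirt): report whether an
 * environment variable is unset, empty, or set to a non-empty value.
 */
static const char *env_state(const char *name)
{
    const char *value;

    assert(name != NULL);

    value = getenv(name);
    if (value == NULL)
        return "unset";   /* getenv() returns NULL: variable is missing */
    if (value[0] == '\0')
        return "empty";   /* variable exists but holds an empty string */
    return "set";
}
```

Calling env_state("PATH") after unsetenv("PATH") takes the NULL branch,
while after setenv("PATH", "", 1) it takes the empty-string branch. Code
that passes the getenv() result straight to a string function only
survives the second case.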

>> >>
>> >> > > >> Could you provide a GDB backtrace of the segfault? The syslog
>> >> > > >> entry only says that it crashed in libc, that's not enough
>> >> > > >> information to debug the segfault.
>> >> > > >
>> >> > > > Unfortunately, I can't find a related core file in the system. In
>> >> > > > fact, no core file is generated. I'll also try to figure this out
>> >> > > > and come back to the list.
>> >> > > >
>> >> > >
>> >> > > Getting a backtrace would be simpler if you could reproduce the
>> >> > > problem manually. In that case you could just start libvirtd in
>> >> > > GDB.
>> >> > > But getting a backtrace from a coredump will work too.
>> >> > >
>> >> > I can't reproduce the segfault when I run it manually. It only happens
>> >> > when it's run from this python script. I will try to start gdb inside
>> >> > the script and connect remotely to the gdb session, but it's getting a
>> >> > bit beyond my debugging capabilities :)  For example, I don't know how
>> >> > to point gdb at the symbols and source code from the package build
>> >> > directory.
>> >>
>> >> Try creating a wrapper script, eg
>> >>
>> >>   mv /usr/sbin/libvirtd /usr/sbin/libvirtd.real
>> >>   cat > /usr/sbin/libvirtd <<'EOF'
>> >>   #!/bin/sh
>> >>   cd /tmp
>> >>   ulimit -c unlimited
>> >>   exec /usr/sbin/libvirtd.real "$@"
>> >>   EOF
>> >>   chmod +x /usr/sbin/libvirtd
>> >>
>> >> That will hopefully give you a core dump in /tmp you can get a
>> >> stack trace from.
>> >
>> > Yes, I got the core file with the script. However, when I open the core
>> > file with gdb and use the bt command to get the backtrace, the only
>> > thing it tells me is this:
>> >
>> > Core was generated by `/usr/sbin/libvirtd --daemon'.
>> > Program terminated with signal 11, Segmentation fault.
>> > #0  0xb73ed8f3 in ?? ()
>> > (gdb) bt
>> > Cannot access memory at address 0x810b9db
>> >
>> > Maybe I don't know enough about debugging; I know I should somehow see
>> > the code lines at this segfault point. Could you guide me on that?
>> >
>> > Thanks,
>> >
>> > Br,
>> > Emre
>> >
>>
>> Strange backtrace. Maybe there is heap corruption going on so that GDB
>> can't make sense out of it anymore.
>>
>> I'll do some research about the PATH usage in libvirt now.
>
> OK. I guess it's used to find the dhcp daemon, iptables etc.  Other service
> scripts seem to work happily without this PATH, but I'll ask developers to
> add it to the python service environment to make sure it works fine.
>
> Thanks again Matthias, Daniel!  I'm a happy guy now :)
>
> Emre Erenoglu
>

Yes, libvirt tries to discover those binaries via the PATH.

The utility function virFindFileInPath used the result of
getenv("PATH") without checking it for NULL. I'll post a patch for
that in a bit.
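
Roughly, the kind of guard that is needed looks like the sketch below.
This is not the actual libvirt code or the patch; find_file_in_path() is a
hypothetical stand-in that assumes a plain colon-separated PATH and a
POSIX system. The only point is the NULL check right after getenv().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a "find file in PATH" helper; not libvirt's
 * implementation. Returns a malloc'd path to an executable called `file`
 * found in one of the PATH directories, or NULL if there is none.
 * The caller frees the result.
 */
static char *find_file_in_path(const char *file)
{
    const char *path_env;
    char *path_copy, *dir, *saveptr = NULL, *result = NULL;

    assert(file != NULL);

    path_env = getenv("PATH");
    if (path_env == NULL)          /* the guard: PATH may be unset */
        return NULL;

    /* strtok_r() modifies its input, so work on a private copy. */
    path_copy = strdup(path_env);
    if (path_copy == NULL)
        return NULL;

    /* Note: strtok_r() skips empty PATH entries in this sketch. */
    for (dir = strtok_r(path_copy, ":", &saveptr); dir != NULL;
         dir = strtok_r(NULL, ":", &saveptr)) {
        char candidate[PATH_MAX];
        int n = snprintf(candidate, sizeof(candidate), "%s/%s", dir, file);

        if (n < 0 || (size_t)n >= sizeof(candidate))
            continue;              /* path too long, try the next entry */

        if (access(candidate, X_OK) == 0) {
            result = strdup(candidate);
            break;
        }
    }

    free(path_copy);
    return result;
}
```

With PATH unset or empty this returns NULL instead of handing a NULL
pointer to a string function, so the caller can report that the binary
was not found.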

Matthias

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list