Re: How to debug occasional hashmap corruption?

Vito Caputo <vcaputo@xxxxxxxxxxx> · Tue, 6 Nov 2018 10:11:32 -0800

On Tue, Nov 06, 2018 at 02:30:19PM +0200, juice wrote:
> Lennart Poettering wrote on 2018-11-06 12:27:
> > On Tue, 06.11.18 11:57, juice (juice@xxxxxxxxxxx) wrote:
> > 
> > > 
> > > Hi,
> > > 
> > > During the past half year I have seen systemd dump core three times
> > > due to what I suspect is hashmap corruption or a race.
> > > Each time it looks a bit different and is triggered by different
> > > things, but it somehow centers on hashmap operations.
> > > 
> > > What would be the preferred way to debug this? I cannot add extensive
> > > logging, as this happens only once in a blue moon and always on
> > > different compute nodes.
> > > Is there some way I could easily test it by increasing the chance of
> > > such corruption or race happening?
> > 
> > This looks very much like memory corruption of some sort, and
> > valgrind should be the tool of choice to track that down.
> > 
> > Lennart
> 
> Thanks for the prompt reply, Lennart.
> 
> I agree; we had already considered using valgrind, but I suspect it
> would add noticeable overhead to systemd's operation.
> 
> The question here was more along the lines of how to trigger the problem.
> It is quite rare: it seems to occur about once every two months on
> our QL3 test pool, which contains hundreds of VM guests...
> It would be impractical to build and deploy a release that runs systemd
> under valgrind on every node! :)
> 

In scenarios like this, where valgrind's overhead is impractical, I'd
give AddressSanitizer a try.

https://clang.llvm.org/docs/AddressSanitizer.html

Regards,
Vito Caputo
_______________________________________________
systemd-devel mailing list
systemd-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/systemd-devel