Re: How to debug occasional hashmap corruption?

John Reiser <jreiser@xxxxxxxxxxxx> · Tue, 6 Nov 2018 08:42:23 -0800

On 11/6/18 9:57 UTC, juice wrote:

During the past half year I have seen systemd dump core three times due
to what I suspect is hashmap corruption or a race.
Each time it looks a bit different and is triggered by different things
but it somehow centers on hashmap operations.

Three intermittent hardware failures in one year on 10,000 boxes is normal.
Keep good records.  If the same box appears twice, then physically destroy it.

Meanwhile, log all events to a circular buffer that just keeps rotating:
date+time (32 bits, 1 microsecond precision), caller (return address),
argument summary (fixed format: string prefixes or hash).  Analyze the dump.
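
A minimal sketch of such a ring, assuming GCC or Clang (for
__builtin_return_address) and C11 atomics.  Every name here (evlog_ring,
evlog_record, EVLOG, the slot count, the 16-byte argument prefix) is
hypothetical and not taken from systemd.  Note that a 32-bit microsecond
counter wraps after 2^32 us, about 71.6 minutes, so the timestamps only
order events within that window.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define EVLOG_SLOTS     4096u   /* hypothetical size; power of two so a mask wraps the index */
#define EVLOG_ARG_BYTES 16u     /* hypothetical length of the fixed argument prefix */

static_assert((EVLOG_SLOTS & (EVLOG_SLOTS - 1u)) == 0, "slot count must be a power of two");

struct evlog_entry {
    uint32_t    usec;                   /* low 32 bits of a microsecond clock */
    const void *caller;                 /* return address of the logged call */
    char        arg[EVLOG_ARG_BYTES];   /* NUL-padded prefix of a string argument */
};

static struct evlog_entry evlog_ring[EVLOG_SLOTS];
static atomic_uint        evlog_next;   /* total events recorded so far */

/* Microseconds from the monotonic clock, truncated to 32 bits. */
static uint32_t evlog_now_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint32_t)((uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u);
}

/* Claim the next slot atomically, then overwrite whatever was there. */
static void evlog_record(const void *caller, const char *arg) {
    unsigned slot = atomic_fetch_add(&evlog_next, 1u) & (EVLOG_SLOTS - 1u);
    struct evlog_entry *e = &evlog_ring[slot];

    e->usec = evlog_now_usec();
    e->caller = caller;
    memset(e->arg, 0, sizeof e->arg);
    if (arg)
        strncpy(e->arg, arg, sizeof e->arg - 1);   /* keep the final NUL */
}

/* Use at the top of an instrumented function: logs who called that function. */
#define EVLOG(arg) evlog_record(__builtin_return_address(0), (arg))
```

Because the ring is a static array, it lands in the core file; in gdb,
"print evlog_next" and "print evlog_ring[N]" recover the most recent events.
The slot index is claimed atomically, but two writers that lap each other
can still tear one entry, so treat a single odd record with suspicion.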

Lock each hashmap operation to ensure single-threaded operation;
prevent even multiple [supposedly] read-only accesses.
Lock each signal handler: only one instance of a given signal at a time.
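
A rough sketch of the hashmap locking, assuming POSIX threads.  The toy
table below (struct toy_map, toy_put, toy_get) is a hypothetical stand-in
and not systemd's hashmap; the point is only that every entry point,
lookups included, goes through the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 64u   /* hypothetical fixed capacity */

struct toy_map {
    const char *keys[TOY_SLOTS];   /* NULL marks an empty slot */
    void       *vals[TOY_SLOTS];
};

/* One lock for all map operations, reads as well as writes. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned toy_hash(const char *s) {
    unsigned h = 5381u;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % TOY_SLOTS;
}

/* Unlocked primitives (linear probing); call only with map_lock held. */
static int toy_put_unlocked(struct toy_map *m, const char *k, void *v) {
    unsigned i = toy_hash(k);
    for (unsigned n = 0; n < TOY_SLOTS; n++, i = (i + 1u) % TOY_SLOTS) {
        if (!m->keys[i] || strcmp(m->keys[i], k) == 0) {
            m->keys[i] = k;
            m->vals[i] = v;
            return 0;
        }
    }
    return -1;   /* table full */
}

static void *toy_get_unlocked(const struct toy_map *m, const char *k) {
    unsigned i = toy_hash(k);
    for (unsigned n = 0; n < TOY_SLOTS && m->keys[i]; n++, i = (i + 1u) % TOY_SLOTS)
        if (strcmp(m->keys[i], k) == 0)
            return m->vals[i];
    return NULL;
}

/* Public wrappers: take the lock, do the work, release the lock. */
int toy_put(struct toy_map *m, const char *k, void *v) {
    int rc = pthread_mutex_lock(&map_lock);
    assert(rc == 0);
    int r = toy_put_unlocked(m, k, v);
    rc = pthread_mutex_unlock(&map_lock);
    assert(rc == 0);
    return r;
}

void *toy_get(struct toy_map *m, const char *k) {
    int rc = pthread_mutex_lock(&map_lock);   /* yes, even for a lookup */
    assert(rc == 0);
    void *v = toy_get_unlocked(m, k);
    rc = pthread_mutex_unlock(&map_lock);
    assert(rc == 0);
    return v;
}
```

If the crashes stop once everything is serialized, that points toward a
race rather than bad hardware.  pthread_mutex_lock is not async-signal-safe,
so this lock must not be taken from inside a signal handler.  For the
handlers themselves, a handler installed with sigaction() without SA_NODEFER
already has its own signal blocked while it runs.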

_______________________________________________
systemd-devel mailing list
systemd-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/systemd-devel