Hi, I've had my eyes on a couple of bugs filed in Ubuntu, where PulseAudio crashes with SIGABRT somewhere inside pa_memblock_* or pa_memblockq_*, often after several hours of playback. I now got a core dump from the bug reporter (thanks, Eric Casteleijn!) and I started analysing it. It turns out that the same memblockq list_items is used as part of two memblockq chains, in essence for one of the blocks q->next->prev != q. How could this be? Well, they seem to be allocated from a static flist pool. I've never looked at flist before, but it claims to be, "A multiple-reader multipler-write lock-free free list implementation". I had a look at tests/flist-test.c and decided to rewrite the test case to make it a bit tougher and to see if I could verify that it never handed out the same block twice. The test seemed to pass at first, but the ~10th time I ran it, the code hung. All 20 threads accessing the flist seemed to have stalled. I attached to the process with gdb, and found that l->stored, l->empty, and l->stored->next [2] were all equal, so clearly something is wrong. Instead of trying to verify the algorithm, I went to Google to look for a reference implementation to compare against, and quickly found [1]. And indeed our flist looks like the one under the section "Naive lock-free stack which suffers from ABA problem." on that page. :-/ What's worse, there does not seem to be an easy fix. At this point, it seems likely this is the root cause for the problem. But I could use some input on how to proceed. One trivial idea just to add a mutex around the flist as an immediate workaround. It won't make the implementation lock-free, but futexes should be pretty fast, and the lock won't be held for long anyway. So this might be good enough? I'm also noticing that in most cases where we use the flist, is as an optimisation to avoid using malloc, essentially pointer recycling. If this is the case, and the objects are not usually allocated by one thread and freed by another, one could have per-thread flists instead of one flist shared by all threads. Any other ideas? -- David Henningsson, Canonical Ltd. http://launchpad.net/~diwic [1] https://en.wikipedia.org/wiki/ABA_problem [2] and l->empty->next, obviously