On Sun, 04 Mar 2012 06:05:46 +0100, David Henningsson <david.henningsson at canonical.com> wrote: > On 03/02/2012 04:45 PM, Tanu Kaskinen wrote: >> On Fri, 2012-03-02 at 13:49 +0100, David Henningsson wrote: >>> Instead of trying to verify the algorithm, I went to Google to look for >>> a reference implementation to compare against, and quickly found [1]. >>> And indeed our flist looks like the one under the section "Naive >>> lock-free stack which suffers from ABA problem." on that page. :-/ >>> What's worse, there does not seem to be an easy fix. > > Have you ever woken up in the middle of the night with an idea of how to > solve the problem almost clear in your mind? The wikipedia page said > "tag bits" was a common workaround, using the low bits of the pointer. I > initially rejected the idea as I didn't think it would be portable > enough, but now I realise that the pointers are just indices to a table. > > So I've written a patch which I post separately, please review. I've > been running my own test case for a while and seems to have succeeded so > far. > > @Eric Casteleijn, as you seem to be the only one that can reproduce this > problem somewhat reliably, would you mind trying my patch for a few days > and see if it resolves the problem? If so, I'll prepare a PPA for you to > use for testing. Hi David, I wrote probably something similar (patch is attached). It would be interesting to compare our patches, but I could not find your patch from the mailing list yet. It is good to keep in mind that at least my version of the tag approach is not guaranteed to avoid the ABA-problem, it only makes it very unlikely. In 64bit system one may argue that it is impossible for one thread to remain asleep while the one element it was trying to manipulate is poped and pushed back to linked list exactly 4294967296 times and then, while the same element is still on top of the list, the thread is scheduled again and it fall for the ABA-problem. But what about 32bit system, can we draw the same conclusion there? I would say the above scenario is still virtually impossible, even if you change the number 4294967296 to 65536. Sorry for the mess I caused. At the time of making the ABA-suffering flist-implementation I was desperately looking for a fix to a notorious Nokia N900 "diesel engine"-bug* and when I found the fix I was just happy that the flist-test passed went on to the next problem. * The sound resembling a diesel engine came when a memblock in a free list was swapped out. While PA was waiting for the memblock to be accessible again the alsa-device kept on playing the same 20ms DMA buffer content over and over again. Best regards, Jyri ps. I have not had time to test the patch on 32bit system yet. -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-flist-Avoid-ABA-problem-by-using-transaction-tag-in-.patch Type: text/x-c Size: 7656 bytes Desc: not available URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20120304/33eedca5/attachment.bin>