On 5/6/13 2:57 PM, Chris Mason wrote: > Quoting Jeff Mahoney (2013-05-06 14:50:33) >> reiserfs has open coded linked lists to handle its cnode infrastructure. >> >> While converting them to list_heads, I found that while the patch looked >> perfectly straightforward based on what the code should have been doing, >> it caused crashes nearly immediately. The issue can be reproduced >> without the conversion by clearing the hprev/hnext pointers in >> remove_journal_hash() when the cnode is removed from the hash list. >> >> It turns out that the code has been working by happy accident for over >> a decade > > Chris from 1999 is really sorry about that one. The new code looks much > less error prone. It turns out my test was buggy itself. I'd set cn->h{prev,next} to NULL too early, truncating the list prematurely. Once I fixed that, I wasn't able to reproduce it anymore. The lists work but the object lifetimes aren't clear and the jl->cn->hash_list traversals still use stale list pointers. We can't hit the can_dirty() BUG() from flush_journal_list because cn->bh has been cleared. Since ->sb, ->blocknr, and ->jlist are also cleared, the list traversals don't match anything anyway. It's unclear what is actually getting traversed (I bet it ultimately can end up traversing the free list), but hits the NULL termination eventually. Things would definitely be quite a bit more wonky if the cnodes weren't allocated up front and stashed back on the free list instead of being freed normally. I have an updated version that saw some review from Jan Kara that I'll post shortly. -Jeff -- Jeff Mahoney SUSE Labs
Attachment:
signature.asc
Description: OpenPGP digital signature