On Thursday June 6, akpm@zip.com.au wrote:
> Neil,
>
> I think this is a better fix...

Thanks. This does look better in that it is more localised and only
affects the observed problem.

Though I really liked the idea of refile_buffer calling
set_buffer_flushtime. It is the best way to make sure the invariant of
"dirty list always sorted" is maintained.

This actually raises the question: why doesn't ext3 call
mark_buffer_dirty in __journal_unfile_buffer? That would seem to be the
"right" thing to do, and would avoid this whole problem.

... but on trying it, it doesn't actually work, at least not completely:

  Jun 12 09:18:44 elfman kernel: buffer on 0 has age 530
  Jun 12 09:18:44 elfman kernel: buffer on 0 has age 546
  Jun 12 09:18:44 elfman kernel: buffer on 0 has age 561

Buffers are still on the dirty list out of order.

This patch only fixes __journal_refile_buffer and doesn't fix any calls
to __journal_unfile_buffer, which I think is the real culprit.

But to continue my story of woes.....

I left the kernel without this patch running over my extended weekend,
with "sync" running every minute. This worked OK until Tuesday afternoon
(and I got back on Wednesday...). At different times on Tuesday
afternoon, all three of my fileservers locked up. I don't have many
details, just a "ps axgl" listing. I suspect some sort of deadlock
between sync and kjournald...

I will soon be running with the first patch and no syncing, and will see
how that goes.

NeilBrown