More ext3 fileserver woes ...

On Thursday June 6, akpm@zip.com.au wrote:
> Neil,
> 
> I think this is a better fix...

Thanks.  This does look better in that it is more localised and only
affects the observed problem.

Though I really liked the idea of having refile_buffer call
set_buffer_flushtime.  It is the best way to make sure the invariant
of "dirty list always sorted" is maintained.
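To make that invariant concrete, here is a small Python model (not kernel code). It assumes the dirty list is only ever appended at the tail and that a buffer's flush time is "now plus a fixed delay"; the names Buffer, DirtyList, refile, DIRTY_DELAY and demo are hypothetical stand-ins. If every refile restamps the flush time, tail insertion keeps the list sorted. If a buffer is moved to the tail with its old flush time, the order breaks.

```python
from collections import deque

# Assumed flush delay in arbitrary ticks; a stand-in for the kernel's
# real dirty-buffer age setting.
DIRTY_DELAY = 30


class Buffer:
    """Hypothetical stand-in for a buffer_head; only tracks flushtime."""

    def __init__(self, name):
        self.name = name
        self.flushtime = None


class DirtyList:
    """Hypothetical dirty list: buffers are only ever added at the tail."""

    def __init__(self):
        self.items = deque()

    def refile(self, buf, now, stamp=True):
        # stamp=True models refile_buffer() calling set_buffer_flushtime():
        # every refile gets a fresh flushtime, so appending at the tail keeps
        # the list sorted as long as `now` never goes backwards.
        # stamp=False keeps whatever flushtime the buffer already had.
        if stamp or buf.flushtime is None:
            buf.flushtime = now + DIRTY_DELAY
        if buf in self.items:
            self.items.remove(buf)
        self.items.append(buf)

    def is_sorted(self):
        times = [b.flushtime for b in self.items]
        return all(a <= b for a, b in zip(times, times[1:]))


def demo(stamp):
    dl = DirtyList()
    a, b = Buffer("a"), Buffer("b")
    dl.refile(a, now=0)
    dl.refile(b, now=10)
    dl.refile(a, now=20, stamp=stamp)  # "a" is refiled and moves to the tail
    return [(x.name, x.flushtime) for x in dl.items], dl.is_sorted()


print(demo(stamp=True))   # ([('b', 40), ('a', 50)], True)
print(demo(stamp=False))  # ([('b', 40), ('a', 30)], False)
```

Stamping in the refile path means no individual caller has to remember to do it, which is the appeal of putting the call there.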

This actually raises the question:  why doesn't ext3 call
mark_buffer_dirty in __journal_unfile_buffer?  That would seem to be
the "right" thing to do, and would avoid this whole problem.

... but on trying it, it doesn't actually work, at least not
completely:
Jun 12 09:18:44 elfman kernel: buffer on 0 has age 530
Jun 12 09:18:44 elfman kernel: buffer on 0 has age 546
Jun 12 09:18:44 elfman kernel: buffer on 0 has age 561

Buffers are still on the dirty list out of order.  This patch only
fixes __journal_refile_buffer and doesn't fix any calls to
__journal_unfile_buffer, which I think is the real culprit.
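A rough sketch of why out-of-order buffers matter, assuming the periodic flush walks the dirty list from the head and stops at the first buffer whose flush time has not yet arrived (that early exit is what the sorted invariant buys). The function flush_due is a hypothetical Python model of such a flusher, not the kernel's code.

```python
def flush_due(flushtimes, now):
    """Hypothetical model of a periodic flusher.

    Writes buffers from the head of the dirty list and stops at the first
    one whose flushtime is still in the future.  That early exit is only
    safe if the list is sorted.  Returns (written, stranded), where
    stranded holds buffers that were already due but were skipped because
    they sat behind a not-yet-due buffer.
    """
    i = 0
    while i < len(flushtimes) and flushtimes[i] <= now:
        i += 1
    written = flushtimes[:i]
    stranded = [t for t in flushtimes[i:] if t <= now]
    return written, stranded


print(flush_due([10, 20, 30, 40], now=25))  # ([10, 20], [])
print(flush_due([10, 40, 20, 30], now=25))  # ([10], [20])
```

With the sorted list nothing that is due gets left behind; with a single inversion, the buffer due at 20 waits until the one due at 40 is written.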

But to continue my story of woes.....

I left the kernel without this patch running over my extended weekend
with "sync" running every minute.
This worked OK until Tuesday afternoon (and I got back on
Wednesday...).
At different times on Tuesday afternoon, all three of my fileservers
locked up.
I don't have many details, just a "ps axgl" listing.  I suspect some
sort of deadlock between sync and kjournald...

Soon I will run with the first patch and without the periodic sync,
and see how that goes.

NeilBrown




