On Mon, 6 Feb 2012 17:47:32 +0100 Jan Kara <jack@xxxxxxx> wrote:

> On Mon 06-02-12 21:12:36, Srivatsa S. Bhat wrote:
> > On 02/06/2012 07:25 PM, Jan Kara wrote:
> >
> > > When discovery of lots of disks happen in parallel, we call
> > > invalidate_bh_lrus() once for each disk from partitioning code resulting in a
> > > storm of IPIs and causing a softlockup detection to fire (it takes several
> > > *minutes* for a machine to execute all the invalidate_bh_lrus() calls).

Gad.  How many disks are we talking about here?

> > > Fix the issue by allowing only single invalidation to run using a mutex and let
> > > waiters for mutex figure out whether someone invalidated LRUs for them while
> > > they were waiting.
> > >
> > > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > > ---
> > >  fs/buffer.c |   23 ++++++++++++++++++++++-
> > >  1 files changed, 22 insertions(+), 1 deletions(-)
> > >
> > >   I feel this is slightly hacky approach but it works. If someone has better
> > > idea, please speak up.
> > >
> >
> >
> > Something related that you might be interested in:
> > https://lkml.org/lkml/2012/2/5/109
> >
> > (This is part of Gilad's patchset that tries to reduce cross-CPU IPI
> > interference.)
>   Thanks for the pointer. I didn't know about it. As Hannes wrote, this
> need not be enough for our use case as there might indeed be some bhs in
> the LRU. But I'd be interested how well the patchset works anyway. Maybe it
> would be enough because after all when we invalidate LRUs subsequent
> callers will see them empty and not issue IPI? Hannes, can you give a try
> to the patches?

If that doesn't work then an option to think about is to have a bool to
disable the bh LRU code.  That would add a test-n-branch to
__find_get_block(), which wouldn't kill us.  Arrange for the LRU code to
be disabled during device probing.  Or just leave the LRU disabled until
very late in boot, perhaps.

Also, I'm wondering why we call invalidate_bh_lrus() at all during
partition reading.
Presumably it's where we're shooting down the blockdev pagecache (you
didn't tell us and I'm too lazy to hunt it down).  But do we really need
to drop the pagecache at whatever-this-callsite-is?
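
To make the "bool to disable the bh LRU code" idea above concrete, here is a
rough userspace sketch, not kernel code and not a patch.  Everything in it is
made up for illustration: lru_enabled, find_block(), slow_lookup() and
invalidate_lru() only stand in for the real paths, and the per-CPU nature of
the bh LRUs (and hence the IPIs) is ignored entirely.  It just shows where the
extra test-n-branch would sit and why invalidation becomes a no-op while the
flag is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LRU_SIZE  8    /* hypothetical size of the small front cache */
#define NR_BLOCKS 64   /* hypothetical size of the backing store */

struct buf {
	unsigned long block;
};

/* Hypothetical switch: left false while devices are being probed. */
bool lru_enabled = false;

/* Counts trips through the slow path, so the effect is observable. */
unsigned long slow_lookups;

static struct buf *lru[LRU_SIZE];      /* most recently used entry first */
static struct buf backing[NR_BLOCKS];  /* stand-in for the pagecache lookup */

static struct buf *slow_lookup(unsigned long block)
{
	assert(block < NR_BLOCKS);
	slow_lookups++;
	backing[block].block = block;
	return &backing[block];
}

static struct buf *lru_lookup(unsigned long block)
{
	for (size_t i = 0; i < LRU_SIZE; i++) {
		if (lru[i] && lru[i]->block == block) {
			struct buf *hit = lru[i];

			/* Move the hit to the front, shifting the rest down. */
			for (; i > 0; i--)
				lru[i] = lru[i - 1];
			lru[0] = hit;
			return hit;
		}
	}
	return NULL;
}

static void lru_install(struct buf *b)
{
	/* Drop the oldest entry and put the new one at the front. */
	for (size_t i = LRU_SIZE - 1; i > 0; i--)
		lru[i] = lru[i - 1];
	lru[0] = b;
}

struct buf *find_block(unsigned long block)
{
	struct buf *b = NULL;

	if (lru_enabled)                /* the extra test-n-branch */
		b = lru_lookup(block);
	if (!b) {
		b = slow_lookup(block);
		if (lru_enabled)
			lru_install(b);
	}
	return b;
}

/* Returns true only if there was an LRU to clear. */
bool invalidate_lru(void)
{
	if (!lru_enabled)
		return false;           /* nothing was ever cached */
	for (size_t i = 0; i < LRU_SIZE; i++)
		lru[i] = NULL;
	return true;
}
```

With the flag off, every find_block() call goes to the slow path and
invalidate_lru() returns immediately; once the flag is set, repeated lookups
of the same block are served from the front cache until the next
invalidation.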