On Tue, Feb 7, 2012 at 12:25 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Mon 06-02-12 13:17:17, Andrew Morton wrote:
>> On Mon, 6 Feb 2012 17:47:32 +0100
>> Jan Kara <jack@xxxxxxx> wrote:
>>
>> > On Mon 06-02-12 21:12:36, Srivatsa S. Bhat wrote:
>> > > On 02/06/2012 07:25 PM, Jan Kara wrote:
>> > >
>> > > > When discovery of lots of disks happen in parallel, we call
>> > > > invalidate_bh_lrus() once for each disk from partitioning code resulting in a
>> > > > storm of IPIs and causing a softlockup detection to fire (it takes several
>> > > > *minutes* for a machine to execute all the invalidate_bh_lrus() calls).
>>
>> Gad. How many disks are we talking about here?
> I think something around hundred scsi disks in this case (number of
> physical drives is actually lower but multipathing blows it up). I actually
> saw machines with close to thousand scsi disks (yes, they had names like
> sdabc ;).

LOL. Is that a huge SCSI disk array in your server or are you just happy
to see me...? :-)

> ...
>> > >
>> > > Something related that you might be interested in:
>> > > https://lkml.org/lkml/2012/2/5/109
>> > >
>> > > (This is part of Gilad's patchset that tries to reduce cross-CPU IPI
>> > > interference.)
>> > Thanks for the pointer. I didn't know about it. As Hannes wrote, this
>> > need not be enough for our use case as there might indeed be some bhs in
>> > the LRU. But I'd be interested how well the patchset works anyway. Maybe it
>> > would be enough because after all when we invalidate LRUs subsequent
>> > callers will see them empty and not issue IPI? Hannes, can you give a try
>> > to the patches?

I think it's worth a shot, since the mutex only delays the IPIs instead of
canceling them altogether.
A somewhat similar issue in the direct reclaim path of the buddy allocator,
which tries to reclaim per-CPU pages, was causing a massive storm of IPIs
during OOM with concurrent workloads. The IPI noise patches avoid 85% of
those IPIs simply by checking whether the CPU you are about to IPI actually
has any per-CPU pages, so maybe the same kind of logic applies here as well.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@xxxxxxxxxxxxx
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru