Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Tue, 28 Feb 2012 22:41:39 +0100 (CET)

On Tue, 28 Feb 2012, Peter Zijlstra wrote:

> On Mon, 2012-02-27 at 12:38 -0800, Dan Williams wrote:
> > An experimental hack to tease out whether we are continuing to
> > run the softirq handler past the point of needing scheduling.
> > 
> > It allows only one trip through __do_softirq() as long as need_resched()
> > is set which hopefully creates the back pressure needed to get ksoftirqd
> > scheduled.
> > 
> > Targeted to address reports like the following that are produced
> > with i/o tests to a sas domain with a large number of disks (48+), and
> > lots of debugging enabled (slub_debug, lockdep) that makes the
> > block+scsi softirq path more cpu-expensive than normal.
> > 
> > With this patch applied the softlockup detector seems appeased, but it
> > seems odd to need changes to kernel/softirq.c so maybe I have overlooked
> > something that needs changing at the block/scsi level?
> > 
> > BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78] 
> 
> So you're stuck in softirq for 22s+ and max_restart is 10, which means
> that on average you spend 2.2s+ per softirq invocation. That is
> completely, absolutely bonkers. Softirq handlers should never consume a
> significant amount of cpu-time.
> 
> Thomas, I think it's about time we put something like the below in?

Absolutely. Anything which consumes more than a few microseconds in
the softirq handler needs to be sorted out, no matter what.
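
For readers following along, the restart logic under discussion can be
modelled in user space roughly as below. This is only a sketch under
stated assumptions: struct sim_cpu, run_pending() and sim_do_softirq()
are hypothetical names invented for illustration, not kernel code. The
only things taken from the thread are the restart budget of 10 and the
idea of stopping after one trip when a reschedule is needed.

```c
#include <stdbool.h>

/* Restart budget quoted in the thread (max_restart is 10). */
#define SIM_MAX_RESTART 10

/* Hypothetical stand-in for per-CPU state; not a kernel structure. */
struct sim_cpu {
	bool pending;       /* softirq work is waiting */
	bool need_resched;  /* the scheduler wants this CPU back */
	int refills;        /* times the handlers will re-raise work */
};

/* One pass over the pending work. Handlers may raise more work. */
static void run_pending(struct sim_cpu *cpu)
{
	cpu->pending = false;
	if (cpu->refills > 0) {
		cpu->refills--;
		cpu->pending = true;
	}
}

/*
 * Model of the loop: keep restarting while work is pending, the
 * budget is not exhausted, and (the experimental part) nobody is
 * waiting to be scheduled. Returns the number of passes made and
 * sets *deferred when leftover work would be handed to ksoftirqd.
 */
static int sim_do_softirq(struct sim_cpu *cpu, bool *deferred)
{
	int max_restart = SIM_MAX_RESTART;
	int passes = 0;

	do {
		run_pending(cpu);
		passes++;
	} while (cpu->pending && !cpu->need_resched && --max_restart);

	*deferred = cpu->pending;
	return passes;
}
```

With work that keeps getting re-raised, the model makes 10 passes before
deferring, which is where the 22s / 10 = 2.2s per pass estimate comes
from. With need_resched set it makes one pass and defers the rest.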