RE: md data-check causes soft lockup

But if the applications are locked out, they can't demand anything.  I have
seen the same thing on my Linux server, but only with the 2.6 kernel; the
same hardware with a 2.4 kernel was fine.  I have not seen this myself for
at least a year, so I assumed it had been fixed.

When I was locked out, my PuTTY session would not respond.  I don't think
it timed out; it recovered when the rebuild/resync was done.
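
For what it's worth, the per-array sysfs knobs can be used to throttle a
running check/resync so that normal I/O gets through.  A rough sketch of
what I'd try, assuming the array is md0 (the device name and the speed
figures are only examples -- adjust for your setup):

    # watch progress and the current resync rate
    cat /proc/mdstat
    cat /sys/block/md0/md/sync_speed

    # throttle the check while applications need the array (values in KiB/s)
    echo 5000 > /sys/block/md0/md/sync_speed_max
    echo 1000 > /sys/block/md0/md/sync_speed_min

    # when the check has finished, write "system" to fall back to the
    # system-wide defaults (IIRC the dev.raid.speed_limit_* sysctls)
    echo system > /sys/block/md0/md/sync_speed_max
    echo system > /sys/block/md0/md/sync_speed_min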

} -----Original Message-----
} From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
} owner@xxxxxxxxxxxxxxx] On Behalf Of Majed B.
} Sent: Tuesday, September 22, 2009 8:17 PM
} To: Gabriele Trombetti
} Cc: linux-raid
} Subject: Re: md data-check causes soft lockup
} 
} Why would you lower the max value? You should keep the min value as
} low as possible and md would drop to that automatically if there are
} applications demanding access to the array.
} 
} On Tue, Sep 22, 2009 at 10:35 PM, Gabriele Trombetti
} <gabriele.trombetti@xxxxxxxxxx> wrote:
} > Robin Hill wrote:
} >>
} >> On Tue Sep 22, 2009 at 07:59:45AM -0700, Lee Howard wrote:
} >>
} >>
} >>>
} >>> Majed B. wrote:
} >>>
} >>>>
} >>>> I must have missed that part. It may not work for your case, but
} >>>> worth trying.
} >>>>
} >>>> Perhaps Neil Brown, or someone involved, could shed some light on
} >>>> this.
} >>>>
} >>>> If I remember correctly, those soft lockups were harmless anyway.
} >>>>
} >>>
} >>> Not harmless for production use.  Yes, data is not harmed, and yes,
} >>> the problem state does recover when the data-check finishes, but
} >>> during the data-check the system is virtually unresponsive and all
} >>> other use of the system is stalled.
} >>>
} >>>
} >>
} >> Are you sure this is caused by these soft lockups, and that you're not
} >> just running with too high a /sys/block/mdX/md/sync_speed_max setting?
} >> I've had issues with this on some servers, where the I/O demand from
} >> the sync/check caused the system to become totally unresponsive.
} >>
} >
} > That's correct for me, in the sense that lowering sync_speed_max solves
} > the problem (see my post). However, I'd call it a bug if too high a
} > value of sync_speed_max starves the system forever. The resync is
} > supposed to have lower priority than normal disk I/O, but that's not
} > what happens. Also note that lowering stripe_cache_size also solves the
} > problem: how does that fit into your reasoning?
} >
} > (BTW, I have not checked the mentioned patch yet; I'm not sure I can do
} > that soon because our servers are in production now.)
} >
} 
} 
} 
} --
}        Majed B.

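Regarding the stripe_cache_size point in the quote above: that knob only
exists for RAID 4/5/6 arrays, and it trades memory for resync/rebuild
throughput.  A rough sketch, again assuming md0 and with illustrative
values:

    # current size, roughly in 4 KiB pages per device (default is 256)
    cat /sys/block/md0/md/stripe_cache_size

    # a larger cache speeds up the check but uses roughly
    #   stripe_cache_size * 4096 * nr_disks  bytes of RAM
    echo 1024 > /sys/block/md0/md/stripe_cache_size

    # if a large cache makes the box crawl during a data-check (as
    # described above), dropping it back down is the workaround
    echo 256 > /sys/block/md0/md/stripe_cache_size
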
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
