Re: md data-check causes soft lockup

Gabriele Trombetti <gabriele.trombetti@xxxxxxxxxx> · Tue, 22 Sep 2009 21:35:09 +0200

Robin Hill wrote:
On Tue Sep 22, 2009 at 07:59:45AM -0700, Lee Howard wrote:

Majed B. wrote:

I must have missed that part. It may not work for your case, but worth trying.

Perhaps Neil Brown, or someone involved could shed some light on this.

If I remember correctly, those soft lockups were harmless anyway.

Not harmless for production use.  Yes, data is not harmed, and yes, the 
problem state does recover when the data-check finishes, but during the 
data-check the system is virtually unresponsive and all other use of the 
system is stalled.

Are you sure this is caused by these soft lockups, and that you're not
just running with too high a /sys/block/mdX/md/sync_speed_max setting?
I've had issues with this on some servers, where the I/O demand for the
sync/check is causing the system to become totally unresponsive.

That's correct for me in the sense that lowering sync_speed_max solves
the problem (see my post). However, I'd call it a bug if too high a
value of sync_speed_max starves the system forever. The resync is
supposed to have lower priority than normal disk I/O, but that is not
what happens. Also note that lowering the value of stripe_cache_size
solves the problem too: how would this fit into your reasoning?
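
For anyone who wants to try the same two workarounds, here is a minimal
sketch in Python of lowering both tunables through sysfs. It assumes the
usual attribute files under /sys/block/mdX/md/ (sync_speed_max in KiB/s,
stripe_cache_size on raid5/6 arrays only). The helper names and the
example values are my own illustration, not anything from the md tools,
and it only works around the stall; it does not fix the lockup itself.

```python
from pathlib import Path


def read_md_tunable(md_dir, name):
    """Return the numeric value of an md sysfs attribute."""
    text = Path(md_dir, name).read_text().strip()
    # sync_speed_max reads back as e.g. "200000 (system)"; keep the number.
    return int(text.split()[0])


def write_md_tunable(md_dir, name, value):
    """Write a new value to an md sysfs attribute (needs root on real sysfs)."""
    Path(md_dir, name).write_text(f"{value}\n")


def throttle_check(md_dir, max_kib_per_sec, stripe_cache_size=None):
    """Lower sync_speed_max and, optionally, stripe_cache_size.

    Returns the previous values so they can be restored afterwards.
    """
    previous = {"sync_speed_max": read_md_tunable(md_dir, "sync_speed_max")}
    write_md_tunable(md_dir, "sync_speed_max", max_kib_per_sec)
    if stripe_cache_size is not None:
        previous["stripe_cache_size"] = read_md_tunable(md_dir, "stripe_cache_size")
        write_md_tunable(md_dir, "stripe_cache_size", stripe_cache_size)
    return previous
```

Example call (md0 and the numbers are placeholders, pick what suits
your array): throttle_check("/sys/block/md0/md", 50000, 256). Keep the
returned dictionary if you want to put the old values back once the
data-check has finished.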

(BTW I have not checked the mentioned patch yet; I'm not sure I can do
that soon, because our servers are in production now.)
