On 2011-04-11 13:55, NeilBrown wrote: > On Mon, 11 Apr 2011 20:59:28 +1000 NeilBrown <neilb@xxxxxxx> wrote: > >> On Mon, 11 Apr 2011 11:19:58 +0200 Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote: >> >>> On 2011-04-11 06:50, NeilBrown wrote: >> >>>> The only explanation I can come up with is that very occasionally schedule on >>>> 2 separate cpus calls blk_flush_plug for the same task. I don't understand >>>> the scheduler nearly well enough to know if or how that can happen. >>>> However with this patch in place I can write to a RAID1 constantly for half >>>> an hour, and without it, the write rarely lasts for 3 minutes. >>> >>> Or perhaps if the request_fn blocks, that would be problematic. So the >>> patch is likely a good idea even for that case. >>> >>> I'll merge it, changing it to list_splice_init() as I think that would >>> be more clear. >> >> OK - though I'm not 100% the patch fixes the problem - just that it hides the >> symptom for me. >> I might try instrumenting the code a bit more and see if I can find exactly >> where it is re-entering flush_plug_list - as that seems to be what is >> happening. > > OK, I found how it re-enters. > > The request_fn doesn't exactly block, but when scsi_request_fn calls > spin_unlock_irq, this calls preempt_enable which can call schedule, which is > a recursive call. > > The patch I provided will stop that from recursing again as the blk_plug.list > will be empty. > > So it is almost what you suggested, however the request_fn doesn't block, it > just enabled preempt. > > > So the comment I would put at the top of that patch would be something like: Ah, so it was pretty close. That does explain it. I've already queued up the patch, I'll ammend the commit message. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html