Re: raid5 async_xor: sleep in atomic

On Mon, Dec 28 2015, Stanislav Samsonov wrote:

> On 24 December 2015 at 00:46, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>>
>> On Wed, Dec 23, 2015 at 2:39 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>> > On Thu, Dec 24 2015, Dan Williams wrote:
>> >>> Changing the GFP_NOIO to GFP_ATOMIC in all the calls to
>> >>> dmaengine_get_unmap_data() in crypto/async_tx/ would probably fix the
>> >>> issue... or make it crash even worse :-)
>> >>>
>> >>> Dan: do you have any wisdom here?  The xor is using the percpu data in
>> >>> raid5, so it cannot sleep, but GFP_NOIO allows sleeping.
>> >>> Does the code handle failure to get_unmap_data() safely?  It looks like
>> >>> it probably does.
>> >>
>> >> Those GFP_NOIO should move to GFP_NOWAIT.  We don't want GFP_ATOMIC
>> >> allocations to consume emergency reserves for a performance
>> >> optimization.  Longer term async_tx needs to be merged into md
>> >> directly as we can allocate this unmap data statically per-stripe
>> >> rather than per request. This async_tx re-write has been on the todo
>> >> list for years, but never seems to make it to the top.
>> >
>> > So the following maybe?
>> > If I could get an acked-by from you Dan, and a Tested-by: from you
>> > Slava, I'll submit upstream.
>> >
>> > Thanks,
>> > NeilBrown
>> >
>> > From: NeilBrown <neilb@xxxxxxxx>
>> > Date: Thu, 24 Dec 2015 09:35:18 +1100
>> > Subject: [PATCH] async_tx: use GFP_NOWAIT rather than GFP_IO
>> >
>> > These async_XX functions are called from md/raid5 in an atomic
>> > section, between get_cpu() and put_cpu(), so they must not sleep.
>> > So use GFP_NOWAIT rather than GFP_IO.
>> >
>> > Dan Williams writes: Longer term async_tx needs to be merged into md
>> > directly as we can allocate this unmap data statically per-stripe
>> > rather than per request.
>> >
>> > Reported-by: Stanislav Samsonov <slava@xxxxxxxxxxxxxxxxx>
>> > Signed-off-by: NeilBrown <neilb@xxxxxxxx>
>>
>> Acked-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>
> Tested-by: Slava Samsonov <slava@xxxxxxxxxxxxxxxxx>

Thanks.

I guess this problem was introduced by
Commit: 7476bd79fc01 ("async_pq: convert to dmaengine_unmap_data")
in 3.13.  Do we think it deserves to go to -stable?

(I just realised that this is really Dan's code more than mine,
 so why am I submitting it ??? But we are here now so it may as well go
 in through the md tree.)

NeilBrown
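
For readers following the thread, here is a minimal userspace sketch of
why the allocation mode matters. It is not kernel code: model_alloc(),
model_xor(), the MODE_* values and the global flags are all hypothetical
stand-ins for the allocator, the async_XX helpers, the GFP flags and the
get_cpu()/put_cpu() section. It assumes the caller falls back to a
synchronous path when the allocation fails, which the discussion above
suggests is probably the case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_NOIO (may sleep) and GFP_NOWAIT. */
enum alloc_mode { MODE_MAY_SLEEP, MODE_NOWAIT };
enum xor_path { XOR_OFFLOADED, XOR_SYNCHRONOUS };

static bool in_atomic_section;   /* models get_cpu() .. put_cpu() */
static bool memory_tight;        /* models an allocator under pressure */
static int sleep_in_atomic_bugs; /* counts would-be "sleep in atomic" events */

/* Under pressure, a may-sleep request blocks until memory is reclaimed,
 * while a no-wait request fails immediately. */
static void *model_alloc(size_t size, enum alloc_mode mode)
{
    if (memory_tight) {
        if (mode == MODE_NOWAIT)
            return NULL;
        if (in_atomic_section)
            sleep_in_atomic_bugs++; /* the real kernel would have slept here */
    }
    return malloc(size);
}

/* Models an async_XX helper called from an atomic section. The XOR itself is
 * always done on the CPU here; the return value only records which path the
 * real code would have taken. */
static enum xor_path model_xor(unsigned char *dest, const unsigned char *src,
                               size_t len, enum alloc_mode mode)
{
    enum xor_path path = XOR_SYNCHRONOUS;
    void *unmap;
    size_t i;

    in_atomic_section = true;
    unmap = model_alloc(64, mode); /* stands in for the unmap data */
    if (unmap)
        path = XOR_OFFLOADED;
    for (i = 0; i < len; i++)
        dest[i] ^= src[i];
    free(unmap);                   /* free(NULL) is a no-op */
    in_atomic_section = false;
    return path;
}
```

With memory_tight set, the MODE_NOWAIT call takes the synchronous path
and records no violation, while the MODE_MAY_SLEEP call records one,
which is the situation reported in this thread.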
