On 12/13/2016 06:39 PM, Gabriel Krisman Bertazi wrote:
On 12/06/2016 09:31 AM, Gabriel Krisman Bertazi wrote:
In blk_mq_map_swqueue, there is a memory optimization that frees the
tags of a queue that has gone unmapped. Later, if that hctx is remapped
after another topology change, the tags need to be reallocated.
If this allocation fails, a simple WARN_ON triggers, but the block layer
ends up with an active hctx without any corresponding set of tags.
Then, any incoming IO to that hctx can trigger an Oops.
I can reproduce it consistently by running IO, flipping CPUs on and off
and eventually injecting a memory allocation failure in that path.
In the fix below, if the system experiences a failed allocation of any
hctx's tags, we remap all the ctxs of that queue to hctx_0, which
should always keep its tags. There is a minor performance hit, since
our mapping just got worse after the error path, but this is
the simplest solution to handle this error path. The performance hit
will disappear after another successful remap.
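To illustrate the fallback described above, here is a minimal user-space
sketch in C. It is not the patch itself: the toy_* names, the fixed
NR_CPUS/NR_HW_QUEUES sizes, the cpu % NR_HW_QUEUES mapping and the
injectable allocation failure are all hypothetical stand-ins for the real
blk-mq structures. It only shows the idea: free the tags of unmapped
hctxs, reallocate on remap, and fall back to hctx 0 if that fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS      4	/* hypothetical sizes, for illustration only */
#define NR_HW_QUEUES 2

struct toy_tags { int depth; };

struct toy_hctx {
	struct toy_tags *tags;	/* NULL while the hctx is unmapped */
	int nr_ctx;		/* software queues currently mapped here */
};

struct toy_queue {
	struct toy_hctx hctx[NR_HW_QUEUES];
	int cpu_to_hctx[NR_CPUS];	/* -1 for an offline CPU */
};

/* Hypothetical fault injection: make the next tag allocation fail. */
static bool toy_fail_next_alloc;

static struct toy_tags *toy_alloc_tags(int depth)
{
	struct toy_tags *t;

	if (toy_fail_next_alloc) {
		toy_fail_next_alloc = false;
		return NULL;
	}
	t = malloc(sizeof(*t));
	if (t)
		t->depth = depth;
	return t;
}

/* hctx 0 gets its tags at init time and never gives them up. */
static bool toy_queue_init(struct toy_queue *q)
{
	memset(q, 0, sizeof(*q));
	q->hctx[0].tags = toy_alloc_tags(32);
	return q->hctx[0].tags != NULL;
}

/* Rebuild the CPU -> hctx mapping after a (simulated) topology change. */
static void toy_map_swqueue(struct toy_queue *q, const bool *online)
{
	bool fallback = false;
	int cpu, i;

	for (i = 0; i < NR_HW_QUEUES; i++)
		q->hctx[i].nr_ctx = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		struct toy_hctx *h = &q->hctx[cpu % NR_HW_QUEUES];

		q->cpu_to_hctx[cpu] = -1;
		if (!online[cpu])
			continue;
		if (!h->tags) {
			/* hctx is being remapped: its tags must come back */
			h->tags = toy_alloc_tags(32);
			if (!h->tags) {
				fallback = true;
				break;
			}
		}
		q->cpu_to_hctx[cpu] = cpu % NR_HW_QUEUES;
		h->nr_ctx++;
	}

	if (fallback) {
		/* Allocation failed: point every online CPU at hctx 0. */
		assert(q->hctx[0].tags != NULL);
		for (i = 0; i < NR_HW_QUEUES; i++)
			q->hctx[i].nr_ctx = 0;
		for (cpu = 0; cpu < NR_CPUS; cpu++) {
			q->cpu_to_hctx[cpu] = online[cpu] ? 0 : -1;
			if (online[cpu])
				q->hctx[0].nr_ctx++;
		}
	}

	/* Memory optimization: drop tags of unmapped hctxs, never hctx 0. */
	for (i = 1; i < NR_HW_QUEUES; i++) {
		if (q->hctx[i].nr_ctx == 0 && q->hctx[i].tags) {
			free(q->hctx[i].tags);
			q->hctx[i].tags = NULL;
		}
	}
}
```

In this sketch, a remap that hits the injected failure leaves every online
CPU on hctx 0, so no CPU ever points at an hctx without tags. The next
remap that allocates successfully restores the spread mapping, which
mirrors the temporary performance hit mentioned above.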
I considered dropping the memory optimization altogether, but it
seemed a bad trade-off to handle this very specific error case.
This should apply cleanly on top of Jens' for-next branch.
Hi,
I saw this patchset missed the first PR for 4.10, though I'd really like
to see it merged in this cycle, if possible. If you see anything
particularly concerning about this that I missed, please let me know,
but I fear this has been around for a while without much feedback.
Thanks,
I want to repeat what Gabriel says. We are now seeing evidence of this
condition often. This really needs to get fixed soon.
Thanks,