On 7/1/21 1:59 AM, Ming Lei wrote:
> On Wed, Jun 30, 2021 at 09:46:35PM +0200, Hannes Reinecke wrote:
>>> On 6/30/21 8:59 PM, Sagi Grimberg wrote:
>>>>>> Shouldn't we rather modify the tagset to refer to the currently
>>>>>> online CPUs _only_, thereby never submitting a connect request
>>>>>> for an hctx with only offline CPUs?
>>>>> Then you may set up very few I/O queues, and performance may
>>>>> suffer even though lots of CPUs come online later.
>>>> Only if we stay with the reduced number of I/O queues, which is
>>>> not what I'm proposing; I'd rather connect and disconnect queues
>>>> from the CPU hotplug handler. For starters we could even trigger
>>>> a reset once the first CPU within an hctx is onlined.
>>> Yeah, that would need one big/complicated patchset, and I don't
>>> see any advantages over this simple approach.
>> I tend to agree with Ming here.
>> Actually, Daniel and I came up with a slightly different idea: use
>> a CPU hotplug notifier.
>> Thing is, blk-mq already has a CPU hotplug notifier, which should
>> ensure that no I/O is pending during CPU hotplug.
> Why should we ensure that for non-managed irqs?

While not strictly necessary, it does align the hctx layout with the
current CPU topology. That way we won't have any 'stale' CPUs or
queues in the hctx layout, and we avoid the issues we have been
seeing due to inactive CPUs in the cpumask.
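
For illustration, registering such a notifier would use the
multi-instance cpuhp API that blk-mq's own notifier is built on. A
minimal sketch, with a made-up state name, made-up callback names and
a hypothetical 'cpuhp_node' member in struct nvme_ctrl:

#include <linux/cpuhotplug.h>

static int nvme_cpuhp_offline(unsigned int cpu, struct hlist_node *node);

static enum cpuhp_state nvme_cpuhp_state;

/* module init: allocate a dynamic hotplug state for nvme */
static int nvme_cpuhp_init(void)
{
        int ret;

        ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
                                      "nvme:queue_online",
                                      NULL, /* an online callback doing the
                                             * reset-on-CPU-up would hook in
                                             * here */
                                      nvme_cpuhp_offline); /* sketched below */
        if (ret < 0)
                return ret;
        nvme_cpuhp_state = ret;
        return 0;
}

/*
 * and per controller, e.g. from nvme_init_ctrl():
 *      cpuhp_state_add_instance_nocalls(nvme_cpuhp_state, &ctrl->cpuhp_node);
 */
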
>> If we now add an nvme CPU hotplug notifier which essentially kicks
>> off a reset once all CPUs in an hctx are offline, the reset logic
>> will rearrange the queues to match the current CPU layout.
>> And when the CPUs are getting onlined we'll do another reset.
>> Daniel is currently preparing a patch; let's see how it goes.
> What is the advantage of that big change over this simple way?

With the simple way we might (and, as the results show, do) run
nvme_reset_ctrl() in parallel to CPU hotplug.
This introduces quite some complexity and, as we have seen, triggers
quite a few race conditions.
Hence I do think we need to synchronize nvme_reset_ctrl() with CPU
hotplug, to ensure that the reset handler has completed before the
CPU is completely torn down.
But synchronizing with nvme_reset_ctrl() means issuing a flush_work();
and if we do that we might as well go full circle and call
nvme_reset_ctrl_sync(). And as that will rearrange the hctxs to match
the current CPU topology, we don't need to fiddle with them in the
hotplug handler and can leave everything to nvme_reset_ctrl().
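
A rough sketch of that synchronized teardown path, to make the idea
concrete (the callback and the 'cpuhp_node' linkage are hypothetical,
matching the registration sketch above; nvme_reset_ctrl_sync() is the
existing helper that queues the reset and flush_work()s it):

/* teardown callback for the hypothetical nvme cpuhp state above */
static int nvme_cpuhp_offline(unsigned int cpu, struct hlist_node *node)
{
        struct nvme_ctrl *ctrl =
                hlist_entry_safe(node, struct nvme_ctrl, cpuhp_node);

        /*
         * A real version would first check that 'cpu' is the last
         * online CPU in one of the controller's hctxs before
         * resetting.
         *
         * The synchronous reset rebuilds the queue mappings for the
         * new topology and only returns once the reset handler has
         * finished, so the CPU cannot be torn down underneath a
         * running reset.
         */
        return nvme_reset_ctrl_sync(ctrl);
}

Resetting on every hotplug event is of course heavy-handed, which is
exactly the trade-off Ming is pointing at; gating the reset on hctx
occupancy keeps it to the cases where the mapping actually changes.
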
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer