RE: [PATCH] scsi: storvsc: Fix a panic in the hibernation procedure

Dexuan Cui <decui@xxxxxxxxxxxxx> · Wed, 22 Apr 2020 06:24:43 +0000

> From: Bart Van Assche <bvanassche@xxxxxxx>
> Sent: Tuesday, April 21, 2020 10:02 PM
> On 4/21/20 5:17 PM, Dexuan Cui wrote:
> > During hibernation, the sdevs are suspended automatically in
> > drivers/scsi/scsi_pm.c before storvsc_suspend(), so after
> > storvsc_suspend(), there is no disk I/O from the file systems, but there
> > can still be disk I/O from the kernel space, e.g. disk_check_events() ->
> > sr_block_check_events() -> cdrom_check_events() can still submit I/O
> > to the storvsc driver, which causes a panic of NULL pointer dereference,
> > since storvsc has closed the vmbus channel in storvsc_suspend(): refer
> > to the below links for more info:
> > ...
> > Fix the panic by blocking/unblocking all the I/O queues properly.
> >
> > Note: this patch depends on another patch "scsi: core: Allow the state
> > change from SDEV_QUIESCE to SDEV_BLOCK" (refer to the second link
> above).
> 
> I don't like this patch. It makes the behavior of the storsvc driver
> different from every other SCSI LLD. Other SCSI LLDs don't need this
> change because these don't destroy I/O queues upon suspend.
> 
> Bart.

Upon suspend, I suppose the other LLDs can not accept I/O either, then
what do they do when some kernel thread still tries to submit I/O? Do 
they "block" the I/O until resume, or just return an error immediately?

I had a look at drivers/scsi/xen-scsifront.c. It looks this LLD implements
a mechanism of marking the device busy upon suspend, so any I/O will
immediately get an error SCSI_MLQUEUE_HOST_BUSY. IMO the
disadvantage is: the mechanism of marking the device busy looks
complex, and the hot path .queuecommand() has to take the
spinlock shost->host_lock, which should affect the performance.

It looks drivers/scsi/nsp32.c: nsp32_suspend() and 
drivers/scsi/3w-9xxx.c: twa_suspend() do nothing to handle new I/O 
after suspend. I doubt this is correct.

It looks drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: hisi_sas_v3_suspend()
tries to use scsi_block_requests() as a means of blocking new I/O, but
I doubt it can work: scsi_block_requests() only sets a variable --
how could this prevent another concurrent kernel thread to submit new
I/O without a race condition issue?

So it looks to me there is no simple mechanism to handle the scenario
here, and I guess that's why the scsi_host_block/unblock APIs are
introduced, and actually there is already an user of the APIs:
3d3ca53b1639 ("scsi: aacraid: use scsi_host_(block,unblock) to block I/O").

The aacraid patch says: "This has the advantage that the block layer will 
stop sending I/O to the adapter instead of having the SCSI midlayer
requeueing I/O internally". It looks this may imply that using the new 
APIs is encouraged?

Sorry for my lack of knowledge in Linux SCSI system, but IMHO the new
APIs are simple and effective, and they really look like a good fit for the
hibernation scenario. BTW, storvsc_suspend() is not a hot path here,
so IMO I don't have any performance concern.

PS, here storvsc has to destroy and re-construct the I/O queues: the
I/O queues are shared memory ringbufers between the guest and the
host; in the resume path of the hibernation procedure, the memory 
pages allocated by the 'new' kernel is different from that allocated by
the 'old' kernel, so before jumping to the 'old' kernel, the 'new' kernel
must destroy the mapping of the pages, and later after we jump to 
the 'old' kernel, we'll re-create the mapping using the pages allocated 
by the 'old' kernel. Here "create the mapping" means the guest tells
the host about the physical addresses of the pages.

Thanks,
-- Dexuan