On 3/24/21 8:37 AM, David Gibson wrote:
On Tue, Mar 23, 2021 at 09:47:38AM -0400, Shivaprasad G Bhat wrote:
The patch adds support for the SCM flush hcall for the nvdimm devices,
to be made available to the guest through the next patch.
The hcall semantics require the flush to return H_BUSY, along with a
continue_token, when the operation is expected to take longer. The guest
then calls the hcall again with that continue_token to get the status.
So, all fresh requests are put on a 'pending' list and a flush worker is
submitted to the thread pool. The thread pool completion callbacks move
the requests to a 'completed' list, and those entries are cleaned up once
the status has been reported to the guest in subsequent hcalls.
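
To make the flow concrete, here is a minimal standalone sketch of the
continue_token handling (constants, names and list handling are
illustrative only, not the actual patch code):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define H_SUCCESS 0
    #define H_BUSY    1   /* illustrative value, not the real PAPR constant */

    struct flush_req {
        uint64_t token;          /* continue_token handed back to the guest */
        int      status;         /* result reported once the flush is done  */
        struct flush_req *next;
    };

    static struct flush_req *pending;    /* requests the flush worker still owns   */
    static struct flush_req *completed;  /* finished, waiting to be reported       */
    static uint64_t next_token = 1;

    /* token == 0: start a new flush; put it on 'pending' and return H_BUSY
     * with a fresh continue_token the guest will retry with. */
    static long h_scm_flush(uint64_t token, uint64_t *token_out, int *status)
    {
        if (token == 0) {
            struct flush_req *req = calloc(1, sizeof(*req));
            req->token = next_token++;
            req->next = pending;
            pending = req;
            *token_out = req->token;
            return H_BUSY;
        }

        /* Retry: report the status and clean up the entry if it completed. */
        for (struct flush_req **p = &completed; *p; p = &(*p)->next) {
            if ((*p)->token == token) {
                struct flush_req *req = *p;
                *p = req->next;
                *status = req->status;
                free(req);
                return H_SUCCESS;
            }
        }
        return H_BUSY;           /* still on 'pending', guest retries later */
    }

    /* Stand-in for the thread-pool completion callback: move one request
     * from 'pending' to 'completed'. */
    static void flush_complete(struct flush_req *req, int status)
    {
        for (struct flush_req **p = &pending; *p; p = &(*p)->next) {
            if (*p == req) {
                *p = req->next;
                break;
            }
        }
        req->status = status;
        req->next = completed;
        completed = req;
    }

    int main(void)
    {
        uint64_t token = 0;
        int status = -1;

        long rc = h_scm_flush(0, &token, &status);   /* first call: H_BUSY  */
        flush_complete(pending, 0);                  /* worker finishes     */
        rc = h_scm_flush(token, &token, &status);    /* retry: H_SUCCESS    */
        printf("rc=%ld status=%d\n", rc, status);
        return 0;
    }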
These semantics make it necessary to preserve the continue_tokens and
their return status even across migration. So, the pre_save handler for
the device waits for the flush workers to complete and collects all the
hcall states from the 'completed' list. The necessary nvdimm-flush-specific
vmstate structures are added to the spapr machine vmstate.
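
The per-request state that actually needs to survive migration is
essentially just the token and its result. A QEMU-style vmstate
description for that could look roughly like the following (struct and
field names are my assumptions, not necessarily what the patch uses):

    #include "migration/vmstate.h"

    /* Hypothetical per-request state preserved across migration. */
    typedef struct SpaprNVDIMMFlushState {
        uint64_t continue_token;   /* token the guest will retry with  */
        int64_t  hcall_ret;        /* saved return status of the flush */
    } SpaprNVDIMMFlushState;

    /* pre_save conceptually: wait for outstanding flush workers, move
     * everything to 'completed', then migrate only these (token, status)
     * pairs as part of the spapr machine vmstate. */
    static const VMStateDescription vmstate_spapr_nvdimm_flush_state = {
        .name = "spapr_nvdimm_flush_state",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT64(continue_token, SpaprNVDIMMFlushState),
            VMSTATE_INT64(hcall_ret, SpaprNVDIMMFlushState),
            VMSTATE_END_OF_LIST()
        },
    };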
Signed-off-by: Shivaprasad G Bhat <sbhat@xxxxxxxxxxxxx>
An overall question: surely the same issue must arise on x86 with
file-backed NVDIMMs. How do they handle this case?
On x86 we have different ways an nvdimm can be discovered: ACPI NFIT,
the e820 map and virtio_pmem. Among these, virtio_pmem always operates
with synchronous DAX disabled, while neither ACPI nor e820 has the
ability to differentiate support for synchronous DAX.
With that, I would expect users to use virtio_pmem when using
file-backed NVDIMMs.
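
For example (a purely illustrative command line, path and size made up),
a file-backed pmem device would typically be exposed to an x86 guest as:

    qemu-system-x86_64 ... \
        -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/pmem0.img,size=4G \
        -device virtio-pmem-pci,memdev=mem1,id=nv1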
-aneesh