On Thu, 21 Feb 2019 17:17:47 +0100
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:

> On Thu, 21 Feb 2019 15:46:08 +0000
> "Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> wrote:
> 
> > * Cornelia Huck (cohuck@xxxxxxxxxx) wrote:
> > > On Thu, 21 Feb 2019 15:11:50 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> wrote:
> > > 
> > > > * Cornelia Huck (cohuck@xxxxxxxxxx) wrote:
> > > > > On Thu, 21 Feb 2019 13:39:46 +0000
> > > > > "Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> wrote:
> > > > > 
> > > > > > * Cornelia Huck (cohuck@xxxxxxxxxx) wrote:
> > > > > > > On Mon, 18 Feb 2019 15:41:07 +0000
> > > > > > > "Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> wrote:
> > > > > > > 
> > > > > > > > * Cornelia Huck (cohuck@xxxxxxxxxx) wrote:
> > > > > > > > 
> > > > > > > > > > >> We can of course switch the order of mappings
> > > > > > > > > > >>
> > > > > > > > > > >> [0x000000000000000 ]
> > > > > > > > > > >> ... Memory region containing RAM
> > > > > > > > > > >> [ram_size ]
> > > > > > > > > > >> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > > > > > > > > > >> [maxram_size - ram_size ]
> > > > > > > > > > >> ... Memory region for e.g. special PCI/CCW devices
> > > > > > > > > > >> [ TBD]
> > > > > > > > > > >>
> > > > > > > > > > >> We can size TBD in a way that we e.g. max out the current page table
> > > > > > > > > > >> size before having to switch to more levels.
> > > > > > > > > > > 
> > > > > > > > > > > Yes, that's fine to set some upper limit; you've just got to make sure
> > > > > > > > > > > that the hypervisor knows where it can put stuff and if the guest
> > > > > > > > > > > does PCI that it knows where it's allowed to put stuff and as long
> > > > > > > > > > > as the two don't overlap everyone is happy.
> > > > > > > > > 
> > > > > > > > > Hm... is that an issue for pci? Do we need to care, as s390 uses
> > > > > > > > > special instructions anyway? Or do we want to avoid going through them,
> > > > > > > > > so that the guest can use normal read/write?
> > > > > > > > 
> > > > > > > > That depends.
> > > > > > > > The stuff we use for virtio-fs we need the shared region to be
> > > > > > > > accessible by the guest via normal instructions because we're using it
> > > > > > > > for DAX. For PCI you might be able to avoid it for most other PCI cases.
> > > > > > > 
> > > > > > > So,
> > > > > > > - virtio-fs regions need to be accessible like normal memory, so they
> > > > > > >   need to show up in the region labeled 'TBD' above
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > (it would be fine to
> > > > > > > communicate the 'where' through pci structures)
> > > > > > 
> > > > > > Hmm, mixing PCI structures into something you're not treating as PCI
> > > > > > seems weird to me.
> > > > > 
> > > > > I was thinking about the addresses in the TBD area... they need to go
> > > > > through _some_ pci structure, I assume?
> > > > 
> > > > Well I think it depends how you make it work with CCW;
> > > > 
> > > > if the addresses being assigned are assigned by the host then I believe
> > > > you should use the CCW mechanism you suggested to discover the addresses
> > > > in the guest.
> > > > 
> > > > If the host is going to allocate a block of PCI space and the guest is
> > > > going to allocate the use within that space and access it with normal
> > > > instructions then I think it should go via PCI.
> > > 
> > > This is getting a bit confusing... let me try to summarize:
> > > 
> > > - We introduce a special area where shared memory areas are supposed to
> > >   live.
> > > - If a virtio device accessed via ccw defines shared regions, the
> > >   driver can discover them via a new ccw that indicates an address in
> > >   that special area.
> > 
> > Right.
> > 
> > > - If a virtio device accessed via pci defines shared regions, the
> > >   driver will want to discover them via the same mechanism as on other
> > >   platforms. If I read
> > >   https://lists.oasis-open.org/archives/virtio-comment/201901/msg00003.html
> > >   correctly, this will mean an offset into a BAR. This will be a normal
> > >   pci memory region.
> > 
> > Right; note that like any other BAR it's up to the guest to write the BAR
> > to select which area of GPA it wants the BAR to map to.
> > 
> > > Now, this sounds to me that we'll have regions in different memory
> > > regions, depending on whether they are accessed via ccw or via pci. Not
> > > sure if that's a problem.
> > 
> > As long as they stay out of each others way it shouldn't be.
> 
> I'm not sure. What I'm concerned about is:
> * if it is a virtio-ccw device it would need to access the shared memory via
> normal memory access instructions.

Just for reference, my current draft for the ccw spec looks as follows:

>From 3461723ce536f12ee73cbf23c1ac26406f21acf2 Mon Sep 17 00:00:00 2001
From: Cornelia Huck <cohuck@xxxxxxxxxx>
Date: Mon, 11 Feb 2019 22:52:54 +0100
Subject: [PATCH] ccw: support shared memory regions

Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx>
---
 content.tex | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/content.tex b/content.tex
index 836ee5236939..ffe34455e6c3 100644
--- a/content.tex
+++ b/content.tex
@@ -2078,6 +2078,7 @@ virtio:
 #define CCW_CMD_READ_VQ_CONF 0x32
 #define CCW_CMD_SET_VIRTIO_REV 0x83
 #define CCW_CMD_READ_STATUS 0x72
+#define CCW_CMD_GET_REGIONS 0x14
 \end{lstlisting}
 
 \subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
@@ -2170,7 +2171,9 @@ The following values are supported:
 \hline
 2 & 0 & <empty> & CCW_CMD_READ_STATUS support \\
 \hline
-3-n & & & reserved for later revisions \\
+3 & 0 & <empty> & CCW_CMD_GET_REGIONS support \\
+\hline
+4-n & & & reserved for later revisions \\
 \hline
 \end{tabular}
 
@@ -2449,6 +2452,46 @@ command. Some legacy devices will support two-stage queue indicators,
 though, and a driver will be able to successfully use
 CCW_CMD_SET_IND_ADAPTER to set them up.
 
+\subsubsection{Handling Shared Memory Regions}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
+
+The CCW_CMD_GET_REGIONS command allows the driver to discover shared memory
+regions provided by the device, if any. Shared memory regions are situated
+in a device memory region that is distinct from normal system memory.
+
+The driver provides a pointer to a 4096 byte buffer that is filled out by
+the device:
+
+\begin{lstlisting}
+struct shared_regions_info {
+        be64 num_regions;
+        struct shared_region_desc regions[];
+};
+\end{lstlisting}
+
+The buffer contains 0 or more shared region descriptors, as specified
+by \field{num_regions}. If the device does not provide shared regions,
+\field{num_regions} is 0. Otherwise, the shared region descriptors have
+the following format:
+
+\begin{lstlisting}
+struct shared_region_desc {
+        be64 addr;
+        be64 len;
+        u8 id;
+        u8 pad[3];
+};
+\end{lstlisting}
+
+\field{addr} is the address of the region within the host device memory
+region with a length of \field{len}, identified by \field{id}. The contents
+of \field{pad} are unpredictable, although it is recommended that the
+device fills in zeroes.
+
+\devicenormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
+
+The device MUST reject the CCW_CMD_GET_REGIONS command unless at least
+revision 3 has been negotiated.
+
 \subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio
 over channel I/O / Device Operation}
 
 \subsubsection{Host->Guest Notification}\label{sec:Virtio Transport
 Options / Virtio over channel I/O / Device Operation / Host->Guest
 Notification}
-- 
2.17.2

What I'm still unsure about is that "device memory region" I'm talking
about in the spec. (That's the 'TBD' region from previous mails.)
- Where do we specify what this region is and how it is accessed?
- Do we need anything on the s390 architecture side, or can we completely
  handle this in a hypervisor-specific way?

> * if it is a virtio-pci device it would need to access the shared memory as
> pci memory, that is via special PCI memory access instructions.

My understanding is that we are free to map this to the "device memory
region", but I may have misunderstood...

> 
> After a look in arch/s390/include/asm/io.h
> 
> #define pci_iomap pci_iomap
> #define pci_iounmap pci_iounmap
> #define pci_iomap_wc pci_iomap
> #define pci_iomap_wc_range pci_iomap_range
> 
> #define memcpy_fromio(dst, src, count) zpci_memcpy_fromio(dst, src, count)
> #define memcpy_toio(dst, src, count) zpci_memcpy_toio(dst, src, count)
> #define memset_io(dst, val, count) zpci_memset_io(dst, val, count)
> 
> #define __raw_readb zpci_read_u8
> #define __raw_readw zpci_read_u16
> #define __raw_readl zpci_read_u32
> #define __raw_readq zpci_read_u64
> #define __raw_writeb zpci_write_u8
> #define __raw_writew zpci_write_u16
> #define __raw_writel zpci_write_u32
> #define __raw_writeq zpci_write_u64
> 
> I tend to believe that this is how the 'access to pci memory OS abstraction'
> Connie was talking about maps to s390 quirks.
> 
> Bottom line is we may need to access shared memory differently if the
> transport is virtio-ccw than when the transport is virtio-pci.

IIUC, we don't want to access the shared regions through pci instructions,
only discover them via pci.

> 
> I would really like to hear Sebastian's opinion on this.

Me too :)