On Wed, 2008-11-05 at 08:06 -0600, Anthony Liguori wrote:
> Rusty Russell wrote:
> > On Wednesday 05 November 2008 09:14:20 Hollis Blanchard wrote:
> > > Hi Rusty, I'm using a patch that changes the Linux base page size
> > > to 64K. (This is actually pretty common in the ppc64 world, but I
> > > happen to be trying it on ppc32.)
> > >
> > > I'm seeing a problem with virtio. I think at least part of it can
> > > be explained by qemu's TARGET_PAGE_BITS==12 and the guest's
> > > PAGE_SHIFT==16. The guest allocates the queue, then passes the pfn
> > > (pa >> PAGE_SHIFT) to the virtio backend (vp_find_vq()). The
> > > backend then calculates the pa as pfn << TARGET_PAGE_BITS.
> > >
> > > I have to run right now, but quickly changing qemu's
> > > TARGET_PAGE_BITS to 16 got me a little further, though it still
> > > didn't work. Any thoughts?
> >
> > I see Anthony hardwired the page size into the queue activation ABI
> > for virtio_pci.
>
> So did you, FWIW: virtio-balloon passes PFNs which are computed based
> on PAGE_SHIFT.
>
> > I think that this should be an actual 4096 (or 12) rather than
> > depending on guest page size:

I agree: it's simply a question of both sides of the interface agreeing
on the units used in that interface. As the interface is defined today,
both qemu and the guest virtio code should use a fixed constant which
is neither (Linux) PAGE_SHIFT nor (qemu) TARGET_PAGE_BITS.

> So is the issue that PPC can support 4k or 16k pages, and the guest
> happens to always use 16k pages? Does the guest set any global flag
> indicating it is using 16k pages? Is there any way we could detect
> this in QEMU?

To elaborate a little: I'm using a patch to PowerPC 440 Linux that lets
you configure the base page size at build time; the choices are 4K,
16K, and 64K. (The hardware supports more sizes, and with other patches
or other operating systems qemu would need to worry about 256K, 1M,
16M, and 256M pages.) The page size is set per MMU mapping, and of
course it's ridiculous to walk the TLB to check whether all the page
sizes are the same just to "detect" the condition. In fact, regardless
of the base page size, the (Linux) kernel is always mapped with 256M
pages, and if you consider hugetlbfs the situation is even more fluid.

> I don't much like the idea of globally hard coding it to 4k. I'd
> rather make it architecture specific.

Making the units architecture-specific doesn't solve the problem at all
AFAICS. It doesn't even solve my original problem on PowerPC 440, since
the guest page size can vary there.

AFAIK the only reason to use a PFN in this interface in the first place
is to allow for physical addresses wider than 32 bits. A hardcoded
shift of 12 gives you 44 bits of physical address space (16 TB). That
actually isn't very big today, so an architecture-specific hardcoded 4K
size will become an issue anyway, *even on x86*.

Brainstorming backwards-compatible interface expansion possibilities:

1. Rename the current interface to "4K_PFN", and add another, let's say
   "64K_PFN". Of course, if a guest with smaller pages uses the new
   interface, it must align its queue allocation accordingly.

2. Rename the current interface to "4K_PFN". Use 64-bit writes to set
   VIRTIO_PCI_QUEUE_PFN. 32-bit architectures couldn't use this, which
   might be OK since, practically speaking, I think 32-bit
   architectures can address at most 36 bits of physical space. I also
   don't know what the semantics of 64-bit PCI writes are (or whether
   they're even allowed on physical hardware) -- it looks like Linux
   doesn't have an iowrite64, for example.

3. Rename the current interface to "4K_PFN". Use multiple writes
   (high/low) to set VIRTIO_PCI_QUEUE_PFN. Not atomic. To simplify the
   backend implementation, you could require that PFN_HIGH writes come
   before PFN_LOW.

4. Use multiple writes (set page size, then set PFN). SET_PAGE_SIZE
   must precede SET_PFN. Not atomic. (See the sketch after this list.)

5. Create a variable-sized interface (still a 32-bit write), where the
   shift value is encoded in the value itself (I guess this is the FP
   mantissa+exponent approach). For example, the low 8 bits are the
   shift beyond 12, so a write of 0x10000004 would mean physical
   address 1<<(12+4).
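To make #4 a bit more concrete, here's a rough sketch of what the two
writes might look like. This is illustrative only: the
VIRTIO_PCI_QUEUE_PAGE_SHIFT register name and its offset are invented
for the example (only VIRTIO_PCI_QUEUE_PFN exists in the ABI today),
and I'm writing the shift rather than the byte count simply because
it's cheaper to apply on the other end. Guest (Linux) side:

#include <asm/io.h>             /* iowrite32(), virt_to_phys() */
#include <linux/virtio_pci.h>   /* VIRTIO_PCI_QUEUE_PFN */

/* Hypothetical register; the offset is invented for illustration. */
#define VIRTIO_PCI_QUEUE_PAGE_SHIFT 24

static void vp_activate_queue(void __iomem *ioaddr, void *queue)
{
	unsigned long pa = virt_to_phys(queue);

	/* Tell the host what units the following QUEUE_PFN write is
	 * in. SET_PAGE_SIZE must precede SET_PFN; the host just
	 * latches the value. */
	iowrite32(PAGE_SHIFT, ioaddr + VIRTIO_PCI_QUEUE_PAGE_SHIFT);
	iowrite32(pa >> PAGE_SHIFT, ioaddr + VIRTIO_PCI_QUEUE_PFN);
}

And the matching host (qemu) side, equally sketchy -- the point is just
that a latched per-device shift replaces the hardcoded
TARGET_PAGE_BITS in the ioport write handler:

case VIRTIO_PCI_QUEUE_PAGE_SHIFT:
	/* Reset to 12 for old guests that never write this. */
	vdev->queue_shift = val;
	break;
case VIRTIO_PCI_QUEUE_PFN:
	pa = (uint64_t)val << vdev->queue_shift;
	/* ... map the vring at pa, as QUEUE_PFN handling does now ... */
	break;

The non-atomicity doesn't worry me much here, since queue activation
already happens single-threaded at guest probe time.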
These solutions would solve both problems: a) making the "guest page
size" explicit, and b) addressing more than 16TB of physical memory in
the future. I think I like #3 or #4 the best.

Hardcoding the current interface to mean "4K pages" (and updating qemu
and Linux to match) would solve my immediate problem, and the 16TB
limit could be addressed in the future as needed.

--
Hollis Blanchard
IBM Linux Technology Center