On 15.05.20 20:36, Dr. David Alan Gilbert wrote: > * David Hildenbrand (david@xxxxxxxxxx) wrote: >> On 15.05.20 19:51, Dr. David Alan Gilbert wrote: >>> * David Hildenbrand (david@xxxxxxxxxx) wrote: >>>> On 15.05.20 14:45, Dr. David Alan Gilbert wrote: >>>>> * David Hildenbrand (david@xxxxxxxxxx) wrote: >>>>>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want >>>>>> to mark RAM block discards to be broken - however, to keep it simple >>>>>> use ram_block_discard_is_required() instead of inhibiting. >>>>> >>>>> Should this be dependent on whether rdma->pin_all is set? >>>>> Even with !pin_all some will be pinned at any given time >>>>> (when it's registered with the rdma stack). >>>> >>>> Do you know how much memory this is? Is such memory only temporarily pinned? >>> >>> With pin_all not set, only a subset of memory, I think multiple 1MB >>> chunks, are pinned at any one time. >>> >>>> At least with special-cases of vfio, it's acceptable if some memory is >>>> temporarily pinned - we assume it's only the working set of the driver, >>>> which guests will not inflate as long as they don't want to shoot >>>> themselves in the foot. >>>> >>>> This here sounds like the guest does not know the pinned memory is >>>> special, right? >>> >>> Right - for RDMA it's all of memory that's being transferred, and the >>> guest doesn't see when each part is transferred. >> >> >> Okay, so all memory will eventually be pinned, just not at the same >> time, correct? >> >> I think this implies that any memory that was previously discarded will >> be backed my new pages, meaning we will consume more memory than intended. >> >> If so, always disabling discarding of RAM seems to be the right thing to do. > > Yeh that's probably true, although there's a check for 'buffer_is_zero' > in the !rdma->pin_all case, if the entire area is zero (or probably if > unmapped) then it sends a notification rather than registering; see > qemu_rdma_write_one and search for 'This chunk has not yet been > registered, so first check to see' Right, if the whole chunk is zero, it will send a "compressed" zero chunk to the target. That will result in a memset() in case the destination is not already zero. So, both the source and the destination will be at least be read. But this only works if a complete chunk (1MB) is zero IIUC. If only one page within a chunk is not zero (e.g., not inflated), the whole chunk will be pinned. Also, "disabled chunk registration" seems to be another case. https://wiki.qemu.org/Features/RDMALiveMigration "Finally, zero pages are only checked if a page has not yet been registered using chunk registration (or not checked at all and unconditionally written if chunk registration is disabled. This is accomplished using the "Compress" command listed above. If the page *has* been registered then we check the entire chunk for zero. Only if the entire chunk is zero, then we send a compress command to zap the page on the other side." -- Thanks, David / dhildenb