Hi On Mon, Sep 17, 2018 at 3:07 PM, Michal Privoznik <mprivozn@xxxxxxxxxx> wrote: > On 09/17/2018 11:30 AM, Marc-André Lureau wrote: >> Hi >> >> On Fri, Sep 14, 2018 at 11:44 AM, Michal Prívozník <mprivozn@xxxxxxxxxx> wrote: >>> On 09/13/2018 11:51 PM, John Ferlan wrote: >>>> >>>> >>>> On 09/13/2018 10:09 AM, John Ferlan wrote: >>>>> >>>>> >>>>> On 09/13/2018 03:39 AM, Marc-André Lureau wrote: >>>>>> Hi >>>>>> >>>>>> On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@xxxxxxxxxx> wrote: >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>>>> >>>>>>>>> So all that's "left": >>>>>>>>> >>>>>>>>> 1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not >>>>>>>>> changing from memory-backend-ram to memory-backend-memfd. We already >>>>>>>>> check that "(src->mem.source != dst->mem.source)" - so we know we're >>>>>>>>> already anonymous or not. >>>>>>>>> >>>>>>>>> Any suggestions? If source is anonymous, then what? I think we can use >>>>>>>>> the qemuDomainObjPrivatePtr in some way to determine that we were >>>>>>>>> started with -memfd (or not started that way). >>>>>>>> >>>>>>>> No idea how we could save that information across various restarts / >>>>>>>> version changes. >>>>>>> >>>>>>> I think it'd be ugly... I think migration cookies would have to be >>>>>>> used... I considered other mechanisms, but each wouldn't quite work. >>>>>>> Without writing the code, if we cared to do this, then we'd have: >>>>>>> >>>>>>> 1. Add a field to qemuDomainObjPrivatePtr that indicates what got >>>>>>> started (none, memfd, file, or ram). Add a typedef enum that has >>>>>>> unknown, none, memfd, file, and ram. Add the Parse/Format code to handle >>>>>>> the field. >>>>>>> >>>>>>> 2. Modify the qemu_command code to set the field in priv based on what >>>>>>> got started, if something got started. The value would be > 0... >>>>>>> >>>>>>> 3. Mess with the migration cookie logic to add checks for what the >>>>>>> source started. On the destination side of that cookie if we had the >>>>>>> "right capabilities", then check the source cookie to see what it has. >>>>>>> If it didn't have that field, then I think one could assume the source >>>>>>> with anonymous memory backing would be using -ram. We'd already fail the >>>>>>> src/dst mem.source check if one used -file. I'm not all the versed in >>>>>>> the cookies, but I think that'd work "logically thinking" at least. The >>>>>>> devil would be in the details. >>>>>>> >>>>>>> Assuming your 3.1 patches do something to handle the condition, I guess >>>>>>> it comes does to how much of a problem it's believed this could be in >>>>>>> 2.12 and 3.0 if someone is running -ram and migrates to a host that >>>>>>> would default to -memfd. >>>>>> >>>>>> I am afraid we will need to do it to handle transparent -memfd usage. >>>>>> I'll look at it with your help. >>>>>> >>>>> >>>>> Let's see what I can cobble together. I'll repost the series a bit later >>>>> today hopefully. >>>>> >>>> >>>> After spending a few hours on this, the cookies just don't help enough >>>> or I don't know/understand enough about their usage. >>>> >>>> I keep coming back to the problem of how do we disallow a migration from >>>> a host that has/knows about and uses anonymous memfd to one that doesn't >>>> know about it. Similarly, if a domain source w/ "file" or "ram" (whether >>>> at startup time or via hotplug) is migrated to a target host that would >>>> generate memfd - we have no mechanism to stop the migration because we >>>> have no way to tell what it was running, especially since what gets >>>> started isn't just based off the source type - hugepages have a >>>> tangential role. Lots of logic stuffed into qemu_command that probably >>>> should have been in some qemuDomainPrepareMemtune API. >>>> >>>> So unfortunately, I think the only safe way is to create a new source >>>> type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly >>>> as the other entries are described (ironically the document default of >>>> "anonymous" could be "file" or it could be "ram" based 3 other factors >>>> not described in the docs). At least with a new type name/value we can >>>> guarantee that someone selects it by name rather than the multipurpose >>>> "anonymous" type. I think it would mean moving the caps checks to a bit >>>> later in the code, search for "otherwise check the required capability". >>>> >>>> Unless someone still brave enough to keep reading this stream has an >>>> idea to try. I'm tapped out! >>> >>> We can have an element/attribute in status XML/migration XML saying >>> which backend we've used. This is slightly tricky because we have more >>> places then one where users can tune confuguration such that we use >>> different backends. My personal favorite is: >>> >>> <memoryBacking> >>> <hugepages> >>> <page size='2048' unit='KiB' nodeset='1'/> >>> </hugepages> >>> </memoryBacking> >>> >>> <cpu> >>> <numa> >>> <cell id='0' cpus='0' memory='1048576' unit='KiB'/> >>> <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared'/> >>> <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private'/> >>> <cell id='3' cpus='3' memory='1048576' unit='KiB'/> >>> </numa> >>> </cpu> >>> >>> <devices> >>> <memory model='dimm'> >>> <target> >>> <size unit='KiB'>524288</size> >>> <node>1</node> >>> </target> >>> <address type='dimm' slot='0' base='0x100000000'/> >>> </memory> >>> </devices> >>> >>> >>> So what we can have is: >>> >>> <hugepages> >>> <page size=.... backend='memory-backend-file'/> >>> </hugepages> >>> >>> <cell id='0' cpus='0' memory='1048576' unit='KiB' backend='memory-backend-ram'/> >>> <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared' backend='memory-backend-file'/> >>> <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private' backend='memory-backend-file/> >>> <cell id='3' cpus='3' memory='1048576' unit='KiB' backend='memory-backend-ram'/> >>> >>> <devices> >>> <memory model='dimm' backend='memory-backend-ram'/> >> >> That's a bit overkill to me, since we don't have (yet) the capacity >> for a user to select the memory backend, and the value is a >> qemu-specific detail. > > So status XML is not something we parse from user. It's produced by > libvirt and it's a superset of user provided XML and some runtime > information. For instance, look around the lines where > VIR_DOMAIN_DEF_PARSE_STATUS flag occurs. > >> >>> .. >>> </devices> >>> >>> >>> This way we know what backend was used on the source (in saved state) >>> and the only thing we need to know on dst (on restore) is to check if >>> given backend is available. >>> >>> I don't think putting anything in migration cookies is going to help. >>> It might help migration if anything but it will definitely keep >>> save/restore broken as there are no migration cookies. >> >> Ah, too bad. I am not familar enough with migration and save/restore >> in libvirt. But I started to imagine how the migration cookie could >> have been used. > > From qemu POV, there's no difference between migration and save/restore. > All of them is a migration except save/restore is migration to/from a > file (FD actually). > >> >> Is there only in the domain XML we can save information? > > Yes, status XML. That's where libvirt keeps its runtime information (and > which backend was used falls exactly into this category) so that it is > preserved on the daemon restart. > >> >> If yes, then either we go with your proposal (although I wonder if it >> should be qemu: namespaced) or can we introduce libvirt capabilites? >> (something as simple as >> <capabilities><qemu-memorybackend-memfd</capabilities>) ? > > No need. Once again, this is not something that users will ever see, nor > libvirt would parse it when parsing input from user. > > The same applies for migration XML. These two are different in some > aspects, but that is not critical for this feature. It's sufficient to > say for now that status XML preserves runtime data between daemon > restarts (we want freshly restarted libvirt to remember what backend was > used) and migration XML preserves runtime data on migration (we want the > destination to know what backend is used). Ok Wouldn't it be easier to have <source type="memfd"/> Daniel didn't have a strong objection against it, it was more of a suggestion for "anonymous" type improvement: https://www.redhat.com/archives/libvir-list/2018-August/msg01841.html Eventually, "anonymous" could be smartly changed to "memfd" by libvirt when possible (from a non-resume start) thanks -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list