On Wed, 9 Aug 2023 08:44:44 -0700
Brett Creeley <bcreeley@xxxxxxx> wrote:

> On 8/8/2023 3:27 PM, Alex Williamson wrote:
> > On Mon, 7 Aug 2023 13:57:53 -0700
> > Brett Creeley <brett.creeley@xxxxxxx> wrote:
> > ...
> >> +static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,
> >> +				 struct rb_root_cached *ranges, u32 nnodes,
> >> +				 u64 *page_size)
> >> +{
> >> +	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
> >> +	struct device *pdsc_dev = &pci_physfn(pdev)->dev;
> >> +	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
> >> +	u64 region_start, region_size, region_page_size;
> >> +	struct pds_lm_dirty_region_info *region_info;
> >> +	struct interval_tree_node *node = NULL;
> >> +	u8 max_regions = 0, num_regions;
> >> +	dma_addr_t regions_dma = 0;
> >> +	u32 num_ranges = nnodes;
> >> +	u32 page_count;
> >> +	u16 len;
> >> +	int err;
> >> +
> >> +	dev_dbg(&pdev->dev, "vf%u: Start dirty page tracking\n",
> >> +		pds_vfio->vf_id);
> >> +
> >> +	if (pds_vfio_dirty_is_enabled(pds_vfio))
> >> +		return -EINVAL;
> >> +
> >> +	/* find if dirty tracking is disabled, i.e. num_regions == 0 */
> >> +	err = pds_vfio_dirty_status_cmd(pds_vfio, 0, &max_regions,
> >> +					&num_regions);
> >> +	if (err < 0) {
> >> +		dev_err(&pdev->dev, "Failed to get dirty status, err %pe\n",
> >> +			ERR_PTR(err));
> >> +		return err;
> >> +	} else if (num_regions) {
> >> +		dev_err(&pdev->dev,
> >> +			"Dirty tracking already enabled for %d regions\n",
> >> +			num_regions);
> >> +		return -EEXIST;
> >> +	} else if (!max_regions) {
> >> +		dev_err(&pdev->dev,
> >> +			"Device doesn't support dirty tracking, max_regions %d\n",
> >> +			max_regions);
> >> +		return -EOPNOTSUPP;
> >> +	}
> >> +
> >> +	/*
> >> +	 * Only support 1 region for now. If there are any large gaps in the
> >> +	 * VM's address regions, then this would be a waste of memory as we are
> >> +	 * generating 2 bitmaps (ack/seq) from the min address to the max
> >> +	 * address of the VM's address regions. In the future, if we support
> >> +	 * more than one region in the device/driver we can split the bitmaps
> >> +	 * on the largest address region gaps. We can do this split up to the
> >> +	 * max_regions times returned from the dirty_status command.
> >> +	 */
> > 
> > Isn't this a pretty unfortunate limitation given QEMU makes a 1TB
> > hole on AMD hosts?  Or maybe I misunderstand.
> > 
> > https://gitlab.com/qemu-project/qemu/-/commit/8504f129450b909c88e199ca44facd35d38ba4de
> > 
> > Thanks,
> > Alex
> > 
> 
> Yes, this is currently an unfortunate limitation. However, our device is
> flexible enough to support >1 regions. There has been some work in this
> area, but we aren't quite there yet. The goal was to get this initial
> support accepted and then submit follow-on work to support >1 regions.

Ok, good that this is temporary.

Shameer, Kevin, Jason, Yishai, I'm hoping one or more of you can approve
this series as well.  Thanks,

Alex
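
[Editor's note: below is a rough standalone sketch of the region-splitting
idea described in the quoted comment above: given the VM's address ranges
sorted by start address, mark split points at the largest inter-range gaps
so that up to max_regions regions are tracked instead of one min-to-max
span. This is not the pds_vfio code; the names (struct range,
pick_region_splits) and the example layout with a ~1 TB hole are made up
purely for illustration.]

/*
 * Illustrative sketch only: choose region split points at the largest
 * gaps between the VM's address ranges, up to max_regions regions.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Mark the (max_regions - 1) largest gaps between consecutive ranges as
 * split points; each resulting group of ranges becomes one tracked region.
 */
static void pick_region_splits(const struct range *r, int nranges,
			       int max_regions, bool *split_after)
{
	int splits = max_regions - 1;

	for (int i = 0; i < nranges - 1; i++)
		split_after[i] = false;

	while (splits-- > 0) {
		uint64_t best_gap = 0;
		int best = -1;

		for (int i = 0; i < nranges - 1; i++) {
			uint64_t gap = r[i + 1].start - r[i].end;

			if (!split_after[i] && gap > best_gap) {
				best_gap = gap;
				best = i;
			}
		}
		if (best < 0 || best_gap == 0)
			break;		/* no useful gap left */
		split_after[best] = true;
	}
}

int main(void)
{
	/* Made-up layout: low RAM, RAM below and above a ~1 TB hole. */
	struct range r[] = {
		{ 0x00000000ULL,    0x80000000ULL },	/* 2 GB low RAM   */
		{ 0x100000000ULL,   0x4000000000ULL },	/* RAM below hole */
		{ 0x10100000000ULL, 0x10500000000ULL },	/* RAM above hole */
	};
	int nranges = 3;
	bool split_after[2];

	pick_region_splits(r, nranges, 2 /* max_regions */, split_after);

	/* Print the resulting regions; the largest gap (the hole) is skipped. */
	uint64_t start = r[0].start;
	for (int i = 0; i < nranges; i++) {
		if (i == nranges - 1 || split_after[i]) {
			printf("region: 0x%llx - 0x%llx\n",
			       (unsigned long long)start,
			       (unsigned long long)r[i].end);
			if (i < nranges - 1)
				start = r[i + 1].start;
		}
	}
	return 0;
}

With two regions allowed, the sketch drops the large gap between the RAM
below and above the hole, so the ack/seq bitmaps no longer have to cover
that unused span.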