[...] >> Sounds like the device is unused :D >> >> "Device info for reporting unused pages" ? >> >> I am in general wondering, should we rename "unused" to "free". I.e., >> "free page reporting" instead of "unused page reporting"? Or what was >> the motivation behind using "unused" ? > > I honestly don't remember why I chose "unused" at this point. I can > switch over to "free" if that is what is preferred. > > Looking over the code a bit more I suspect the reason for avoiding it > is because free page hinting also mentioned reporting in a few spots. Maybe we should fix these cases. FWIW, I'd prefer "free page reporting". (e.g., pairs nicely with "free page hinting"). >>> + >>> + virtqueue_kick(vq); >>> + >>> + /* When host has read buffer, this completes via balloon_ack */ >>> + wait_event(vb->acked, virtqueue_get_buf(vq, &unused)); >> >> Is it safe to rely on the same ack-ing mechanism as the inflate/deflate >> queue? What if both mechanisms are used concurrently and race/both wait >> for the hypervisor? >> >> Maybe we need a separate vb->acked + callback function. > > So if I understand correctly what is actually happening is that the > wait event is simply a trigger that will wake us up, and at that point > we check to see if the buffer we submitted is done. If not we go back > to sleep. As such all we are really waiting on is the notification > that the buffers we submitted have been processed. So it is using the > same function but on a different virtual queue. Very right, this is just a waitqueue (was only looking at this patch, not the full code). This should indeed be fine. >>> vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE]; >>> vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE]; >>> if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { >>> @@ -932,12 +976,30 @@ static int virtballoon_probe(struct virtio_device *vdev) >>> if (err) >>> goto out_del_balloon_wq; >>> } >>> + >>> + vb->pr_dev_info.report = virtballoon_unused_page_report; >>> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) { >>> + unsigned int capacity; >>> + >>> + capacity = min_t(unsigned int, >>> + virtqueue_get_vring_size(vb->reporting_vq), >>> + VIRTIO_BALLOON_VRING_HINTS_MAX); >>> + vb->pr_dev_info.capacity = capacity; >>> + >>> + err = page_reporting_register(&vb->pr_dev_info); >>> + if (err) >>> + goto out_unregister_shrinker; >>> + } >> >> It can happen here that we start reporting before marking the device >> ready. Can that be problematic? >> >> Maybe we have to ignore any reports in virtballoon_unused_page_report() >> until ready... > > I don't think there is an issue with us putting buffers on the ring > before it is ready. I think it will just cause our function to sleep. > > I'm guessing that is the case since init_vqs will add a buffer to the > stats vq and that happens even earlier in virtballoon_probe. > Interesting: "Note: vqs are enabled automatically after probe returns.". Learned something new. The virtballoon_changed(vdev) *after* virtio_device_ready(vdev) made me wonder, because that could also fill the queues. Maybe Michael can clarify. >>> + >>> virtio_device_ready(vdev); >>> >>> if (towards_target(vb)) >>> virtballoon_changed(vdev); >>> return 0; >>> >>> +out_unregister_shrinker: >>> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) >>> + virtio_balloon_unregister_shrinker(vb); >> >> A sync is done implicitly, right? So after this call, we won't get any >> new callbacks/are stuck in a callback. > > From what I can tell a read/write semaphore is used in > unregister_shrinker when we delete it from the list so it shouldn't be > an issue. Yes, makes sense. > >>> out_del_balloon_wq: >>> if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) >>> destroy_workqueue(vb->balloon_wq); >>> @@ -966,6 +1028,8 @@ static void virtballoon_remove(struct virtio_device *vdev) >>> { >>> struct virtio_balloon *vb = vdev->priv; >>> >>> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) >>> + page_reporting_unregister(&vb->pr_dev_info); >> >> Dito, same question regarding syncs. > > Yes, although for that one I was using pointer deletion, a barrier, > and a cancel_work_sync since I didn't support a list. Okay, perfect. [...] >> >> Small and powerful patch :) > > Agreed. Although we will have to see if we can keep it that way. > Ideally I want to leave this with the ability so specify what size > scatterlist we receive. However if we have to flip it around then it > will force us to add logic for chopping up the scatterlist for > processing in chunks. I hope we can keep it like that. Otherwise each and every driver has to implement this chopping-up (e.g., a hypervisor that can only send one hint at a time - e.g., via a simple hypercall - would have to implement that). -- Thanks, David / dhildenb