Re: [PATCH] virtio: vdpa: new SolidNET DPU driver.


 




On 2022/11/9 17:44, Alvaro Karsz wrote:
Hi Jason,

If I'm not wrong, this value doesn't seem to change. If so, I wonder if we
can move the kick_offset logic to snet_alloc_dev()?

There is no real reason to have this logic in snet_set_vq_address,
so it could be moved to snet_build_vqs (which is called from
snet_alloc_dev).

-EOPNOTSUPP?

Returning an error from the set_vq_state callback leads to probe failure.
This code is taken from drivers/virtio/virtio_vdpa.c,
virtio_vdpa_setup_vq function:

struct vdpa_vq_state state = {0};

......

err = ops->set_vq_state(vdpa, index, &state);
if (err)
        goto err_vq;

I could check the struct vdpa_vq_state value, return 0 if it is all
zeros, and -EOPNOTSUPP otherwise.
What do you think?


That would be fine. (It could be something similar to vp_vdpa_set_vq_state_split()).
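A minimal sketch of that check, modeled in plain C so it can be compiled standalone; `vq_state_split` is a hypothetical stand-in for the kernel's split-ring `struct vdpa_vq_state_split`, and the real callback would take a `struct vdpa_device *` and queue index as in the vDPA ops:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct vdpa_vq_state_split;
 * the real driver would use the type from include/linux/vdpa.h. */
struct vq_state_split {
	uint16_t avail_index;
};

/* Sketch of the proposed logic: the device cannot restore an arbitrary
 * state, so accept only the all-zero state that the core sets at setup
 * time and reject anything else with -EOPNOTSUPP. */
static int snet_set_vq_state(const struct vq_state_split *state)
{
	if (state->avail_index == 0)
		return 0;
	return -EOPNOTSUPP;
}
```

This mirrors the shape of vp_vdpa_set_vq_state_split(), which likewise succeeds only for a state the hardware can actually honor.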



I fail to understand how this can work. E.g. after reset there should be
no interaction from the device, like DMA; otherwise there could be
security implications.

You're right, I'll add a proper reset callback.

Since a read is ordered with respect to a write, an easier way is to
perform an ioread here.
Interesting, which barrier is this paired with?

Usually reads are slow, but we don't care about speed when sending
messages (so far we only send a "destroy" message, meaning that the PCI
remove callback was called), so the memory barrier can be replaced with
a read operation.


Yes. Actually, the memory barrier can't guarantee that the write has been processed by the device.
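The read-back idea can be sketched as below; this is a userspace model with a plain `volatile` variable standing in for an MMIO register, since the real driver would use iowrite32()/ioread32() on a pcim_iomap()'d BAR:

```c
#include <stdint.h>

/* Simulated MMIO message register; in the driver this would be a
 * pointer into a mapped PCI BAR. */
static volatile uint32_t msg_reg;

/* Sketch of the point above: a wmb() only orders CPU stores, it does
 * not tell you the device has consumed the write.  Reading the register
 * back forces the posted write out, because a PCI read cannot pass an
 * earlier write to the same device. */
static uint32_t send_message(uint32_t msg)
{
	msg_reg = msg;   /* iowrite32(msg, reg) on real hardware */
	return msg_reg;  /* ioread32(reg): flushes the posted write */
}
```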



Do we need to do endian conversion here (cpu_to_le64())?

Yes, I'll add it.

Need to take endianness into account.

I'm not sure about that.
The endianness appears to be handled by the caller.
Function from include/linux/virtio_config.h

static inline void virtio_cwrite16(struct virtio_device *vdev,
                                   unsigned int offset, u16 val)
{
        __virtio16 v;

        might_sleep();
        v = cpu_to_virtio16(vdev, val);
        vdev->config->set(vdev, offset, &v, sizeof(v));
}


You are right.



static inline void virtio_cwrite32(struct virtio_device *vdev,
                                   unsigned int offset, u32 val)
{
        __virtio32 v;

        might_sleep();
        v = cpu_to_virtio32(vdev, val);
        vdev->config->set(vdev, offset, &v, sizeof(v));
}


static inline void virtio_cwrite64(struct virtio_device *vdev,
                                   unsigned int offset, u64 val)
{
        __virtio64 v;

        might_sleep();
        v = cpu_to_virtio64(vdev, val);
        vdev->config->set(vdev, offset, &v, sizeof(v));
}

I'm not sure how the endianness can be handled by the vDPA driver.
This function may be called for an 8-bit, 16-bit, 32-bit or 64-bit variable.
It may theoretically be called to change multiple variables at once.
It may be called to change part of a variable.
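The argument can be illustrated with a minimal set_config sketch in plain C (`snet_set_config` and `config_space` are hypothetical stand-ins; the real callback also takes the `struct vdpa_device *`): the parent only sees an opaque byte range, which may cover part of a field or several fields, so byte-swapping has to happen in the caller, as the virtio_cwrite*() helpers above do.

```c
#include <string.h>
#include <stdint.h>

/* Hypothetical device config space backing store. */
static uint8_t config_space[16];

/* Sketch of why the parent cannot convert endianness here: it copies
 * raw bytes at an arbitrary offset/length, with no knowledge of field
 * boundaries or widths. */
static void snet_set_config(unsigned int offset, const void *buf,
			    unsigned int len)
{
	memcpy(config_space + offset, buf, len); /* raw bytes, no swap */
}
```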


If I'm not wrong, the device depends on the platform IOMMU to work. So
unless the device has a stricter IOVA range than what the platform IOMMU
can provide, we can simply not advertise this and use the one provided
by the IOMMU:


static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
{
        struct vdpa_iova_range *range = &v->range;
        struct vdpa_device *vdpa = v->vdpa;
        const struct vdpa_config_ops *ops = vdpa->config;

        if (ops->get_iova_range) {
                *range = ops->get_iova_range(vdpa);
        } else if (v->domain && v->domain->geometry.force_aperture) {
                range->first = v->domain->geometry.aperture_start;
                range->last = v->domain->geometry.aperture_end;
        } else {
                range->first = 0;
                range->last = ULLONG_MAX;
        }
}

I'll delete the snet_get_iova_range function.

Any chance to use the device-managed region helper here? It would
simplify the code (e.g. the cleanup paths).

Ok, I'll do it.

Is it better to say "config is not ready"? Btw, I wonder if it makes
more sense to wait until the config is ready, with a timeout?

Good idea, I'll implement the wait & timeout.
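The wait-with-timeout can be sketched as a bounded retry loop, here modeled in plain C with a fake readiness flag; the kernel version would more likely read a status bit from the BAR and use readx_poll_timeout() or msleep() between polls:

```c
#include <stdbool.h>

/* Hypothetical readiness state: the flag becomes true after this many
 * polls, simulating the DPU finishing its configuration. */
static int polls_until_ready;

static bool config_is_ready(void)
{
	return --polls_until_ready <= 0;
}

/* Sketch of the proposed wait: poll until the config is ready or the
 * budget is exhausted.  The driver would return -ETIMEDOUT instead
 * of -1 and sleep between iterations. */
static int wait_for_config(int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if (config_is_ready())
			return 0;	/* config ready */
	}
	return -1;			/* timed out */
}
```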

I wonder if it's worth the bother, considering we're using devres to manage IRQs.

You're right, this isn't required, I'll remove it.

It looks to me like all the devices created here use the same dma_dev
(the PCI device); I wonder how DMA is isolated among the vDPA devices
created here.

All vDPA devices indeed use the same DMA device; there is no isolation
between the devices.
I'm not sure why there should be isolation in this case.


Each vDPA device needs to be able to be assigned to a userspace (VM) independently when bound to vhost-vDPA. If they share the same IOMMU domain, there will be security issues.



Btw, vDPA has been switched to use the vdpa tool to create devices; it
is suggested to implement the mgmt device as the other parents do.
Then snet_alloc_dev() could be used for dev_add().

We want to leave control to the DPU at the moment: the number/type of
devices is determined by the DPU's config and can't be controlled
from userspace.


That's fine, it might be better to state this in the changelog or as a comment somewhere in the code.



There looks to be a race: the vDPA device could be registered to the bus
and used by userspace before bus master/drvdata is set.

You're right, the vdpa registration should be done after the
master/drvdata is set.
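The fixed ordering can be sketched as below, modeled in plain C with flags standing in for the PCI state (`snet_probe` and the helpers are hypothetical stand-ins for the driver's probe path): everything userspace may touch must be initialized before the registration call makes the device visible.

```c
#include <stdbool.h>

static bool bus_master_set, drvdata_set;

/* Stand-in for _vdpa_register_device(): from this point userspace can
 * start using the device, so everything it depends on must be ready. */
static int register_vdpa_device(void)
{
	if (!bus_master_set || !drvdata_set)
		return -1;	/* would be a real race in the driver */
	return 0;
}

/* Sketch of the corrected probe ordering: enable bus mastering and set
 * drvdata first, register the vDPA device last. */
static int snet_probe(void)
{
	bus_master_set = true;	/* pci_set_master(pdev) */
	drvdata_set = true;	/* pci_set_drvdata(pdev, snet) */
	return register_vdpa_device();
}
```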

I think devres should take care of this since we're using
pcim_enable_device()?

You're right, this isn't required, I'll remove it.

According to the code, this seems to be the driver features and

These are the negotiated features.
We're not keeping a record of the driver features; when
set_driver_features is called, we just logical-AND the driver features
with the supported features received from the DPU.
I'll rename it to 'negotiated_features', this seems more accurate.


Ok.



static int snet_set_drv_features(struct vdpa_device *vdev, u64 features)
{
        struct snet *snet = vdpa_to_snet(vdev);

        snet->used_features = snet->cfg->features & features;
        return 0;
}


This seems to be unused.

You're right, I'll remove it.


Thank you for your comments.
I'll send a new version once I finish working on the comments.

Alvaro


Thanks




_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



