On 7/6/22 06:03, Jason Wang wrote:
> On Mon, Jul 4, 2022 at 5:45 PM Arnaud POULIQUEN
> <arnaud.pouliquen@xxxxxxxxxxx> wrote:
>>
>> Hello Jason,
>>
>> On 7/4/22 06:35, Jason Wang wrote:
>>> On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>>>>
>>>> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
>>>>> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
>>>>>>> + virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
>>>>>>> + jasowang@xxxxxxxxxx
>>>>>>> + mst@xxxxxxxxxx
>>>>>>>
>>>>>>> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
>>>>>>> <arnaud.pouliquen@xxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 6/29/22 19:43, Mathieu Poirier wrote:
>>>>>>>>> Hi Anup,
>>>>>>>>>
>>>>>>>>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
>>>>>>>>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
>>>>>>>>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
>>>>>>>>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
>>>>>>>>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>>>>>>>>>>
>>>>>>>>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
>>>>>>>>>> Signed-off-by: Anup Patel <apatel@xxxxxxxxxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
>>>>>>>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>> index 905ac7910c98..71a64d2c7644 100644
>>>>>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>>>>  	/* and half is dedicated for TX */
>>>>>>>>>>  	vrp->sbufs = bufs_va + total_buf_space / 2;
>>>>>>>>>>
>>>>>>>>>> +	/* From this point on, we can notify and get callbacks. */
>>>>>>>>>> +	virtio_device_ready(vdev);
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
>>>>>>>>> potentially be called (by way of rpmsg_recv_done()), which will race with
>>>>>>>>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
>>>>>>>>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
>>>>>>>>> machines that don't expect their initial name service to fail when the "device"
>>>>>>>>> has been marked as ready.
>>>>>>>>>
>>>>>>>>> What does make me curious though is that nobody on the remoteproc mailing list
>>>>>>>>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
>>>>>>>>> i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
>>>>>>>>> their rig?
>>>>>>>>
>>>>>>>> I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4).
>>>>>>>> I confirm the issue!
>>>>>>>>
>>>>>>>> Concerning the solution, I share Mathieu's concern. This could break legacy.
>>>>>>>> I made a short test and I would suggest using __virtio_unbreak_device instead,
>>>>>>>> to unbreak the virtqueues without changing the init sequence.
>>>>>>>>
>>>>>>>> In this case the patch would be:
>>>>>>>>
>>>>>>>> +	/*
>>>>>>>> +	 * Unbreak the virtqueues to allow to add buffers before setting the vdev status
>>>>>>>> +	 * to ready
>>>>>>>> +	 */
>>>>>>>> +	__virtio_unbreak_device(vdev);
>>>>>>>> +
>>>>>>>>
>>>>>>>> 	/* set up the receive buffers */
>>>>>>>> 	for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>>>> 		struct scatterlist sg;
>>>>>>>> 		void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
>>>>>>>
>>>>>>> This will indeed fix the problem. On the flip side the kernel
>>>>>>> documentation for __virtio_unbreak_device() puzzles me...
>>>>>>> It clearly states that it should be used for probing and restoring but
>>>>>>> _not_ directly by the driver. Function rpmsg_probe() is part of
>>>>>>> probing but also the entry point to a driver.
>>>>>>>
>>>>>>> Michael and virtualisation folks, is this the right way to move forward?
>>>>>>
>>>>>> I don't think it is, __virtio_unbreak_device is intended for core use.
>>>>>
>>>>> Can we fill the rx after virtio_device_ready() in this case?
>>>>>
>>>>> Btw, the driver sets DRIVER_OK after registering, so we probably get a svq
>>>>> kick before DRIVER_OK?
>>
>> By "registering" you mean calling rpmsg_virtio_add_ctrl_dev and
>> rpmsg_ns_register_device?
>
> Yes.
>
>>
>> The rpmsg_ns_register_device has to be called before, because it has to be
>> probed to handle the first message coming from the remote side to create
>> the associated rpmsg local device.
>
> I couldn't find the code to do this, maybe you can give me some hint on this.

The rpmsg_ns driver is available here:
https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/rpmsg_ns.c

It is probed on the rpmsg_ns_register_device call:
https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/virtio_rpmsg_bus.c#L974

>
>> It doesn't send any message.
>
> I see the function registers the device to the bus, I wonder if this
> means the device could be probed and used by the driver before
> virtio_device_ready().
>
>>
>> The risk could be for the rpmsg_ctrl device. Registering it
>> after the virtio_device_ready(vdev) call could make sense...
>
> I see.
>
>>
>>>>>
>>>>> Thanks
>>>>
>>>> Is this an ack for the original patch?
>>>
>>> Nope, I meant, instead of moving virtio_device_ready() a little bit
>>> earlier, can we only move the rvq filling after virtio_device_ready().
>>>
>>> Thanks
>>
>> Please find some concerns about this inversion here:
>> https://lore.kernel.org/lkml/20220701053813-mutt-send-email-mst@xxxxxxxxxx/
>>
>> Regarding __virtio_unbreak_device: the existing virtio_break_device is
>> used by some virtio drivers.
>> Could we consider that it makes sense to also have a
>> virtio_unbreak_device interface?
>
> We don't want to allow the driver to unbreak a device since it's
> easier to have bugs.
>
>>
>>
>> I do not fully understand the reason for the commit:
>> 8b4ec69d7e09 ("virtio: harden vring IRQ", 2022-05-27)
>
> It tries to forbid the virtqueue callbacks to be called before
> virtio_device_ready(). This helps to prevent the malicious device from
> attacking the driver.
>
> But unfortunately, it breaks several drivers because:
>
> 1) some drivers have races in probe/remove
> 2) it tries to reuse vq->broken which may break drivers that call
>    virtqueue_add() before virtio_device_ready(), which is allowed by the
>    spec
>
> There's a discussion to have a better behavior that doesn't break the
> existing drivers. And the IRQ hardening feature is marked as broken
> now, so rpmsg should be fine without any extra effort.

Thanks for the explanations.
If the discussions are in a mail thread, could you give me the reference?

Thanks,
Arnaud

>
>> So the following alternative is probably pretty naive:
>> Could virtqueue_disable_cb be an alternative to the vq->broken usage,
>> allowing buffers to be registered while preventing virtqueue IRQs?
>
> Probably not, there's no guarantee that the device will not send
> notification after virtqueue_disable_cb().
>
> Thanks
>
>>
>> Thanks,
>> Arnaud
>>
>>>
>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Arnaud
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Mathieu
>>>>>>>>>
>>>>>>>>>>  	/* set up the receive buffers */
>>>>>>>>>>  	for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>>>>>>  		struct scatterlist sg;
>>>>>>>>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>>>>  	 */
>>>>>>>>>>  	notify = virtqueue_kick_prepare(vrp->rvq);
>>>>>>>>>>
>>>>>>>>>> -	/* From this point on, we can notify and get callbacks. */
>>>>>>>>>> -	virtio_device_ready(vdev);
>>>>>>>>>> -
>>>>>>>>>>  	/* tell the remote processor it can start sending messages */
>>>>>>>>>>  	/*
>>>>>>>>>>  	 * this might be concurrent with callbacks, but we are only
>>>>>>>>>> --
>>>>>>>>>> 2.34.1
>>>>>>>>>>
>>>>>>
>>>>
>>>
>>
>
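
For readers following the thread, the ordering problem can be modelled in a few lines of userspace C. This is only a sketch, not kernel code: every toy_* name is hypothetical, and the model simply assumes what the thread describes, namely that queues start out broken after the hardening commit, that adding a buffer to a broken queue fails (this model returns -EIO), and that the "ready" step unbreaks the queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a virtqueue; hypothetical, NOT the kernel's struct virtqueue. */
struct toy_vq {
	bool broken;       /* models vq->broken */
	size_t num_free;   /* free slots left in the ring */
	size_t num_added;  /* buffers successfully posted */
};

/* Models queue creation after the hardening commit: the queue starts out broken. */
static void toy_vq_init(struct toy_vq *vq, size_t ring_size)
{
	assert(vq != NULL);
	vq->broken = true;
	vq->num_free = ring_size;
	vq->num_added = 0;
}

/* Models adding one receive buffer: refused while the queue is broken. */
static int toy_add_inbuf(struct toy_vq *vq)
{
	if (vq->broken)
		return -EIO;
	if (vq->num_free == 0)
		return -ENOSPC;
	vq->num_free--;
	vq->num_added++;
	return 0;
}

/* Models the "device ready" step, which unbreaks the queue. */
static void toy_device_ready(struct toy_vq *vq)
{
	vq->broken = false;
}

/* Ordering A (the probe order reported as failing): fill RX, then mark ready. */
static int toy_probe_fill_then_ready(struct toy_vq *rvq, size_t nbufs)
{
	for (size_t i = 0; i < nbufs; i++) {
		int err = toy_add_inbuf(rvq);
		if (err)
			return err; /* fails on the first buffer while broken */
	}
	toy_device_ready(rvq);
	return 0;
}

/* Ordering B (the alternative raised in the thread): mark ready, then fill RX. */
static int toy_probe_ready_then_fill(struct toy_vq *rvq, size_t nbufs)
{
	toy_device_ready(rvq);
	for (size_t i = 0; i < nbufs; i++) {
		int err = toy_add_inbuf(rvq);
		if (err)
			return err;
	}
	return 0;
}
```

On a freshly initialised queue, ordering A returns an error before any buffer is posted, while ordering B posts every requested buffer. The model has no callbacks, so it deliberately says nothing about the race with rpmsg_recv_done() that Mathieu raises for ordering B.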