On Thu, Feb 20, 2025 at 04:45:54PM +0800, Junxian Huang wrote:
>
>
> On 2025/2/20 15:32, Leon Romanovsky wrote:
> > On Thu, Feb 20, 2025 at 11:48:49AM +0800, Junxian Huang wrote:
> >>
> >>
> >> On 2025/2/19 22:35, Leon Romanovsky wrote:
> >>> On Wed, Feb 19, 2025 at 09:07:36PM +0800, Junxian Huang wrote:
> >>>>
> >>>>
> >>>> On 2025/2/19 20:14, Leon Romanovsky wrote:
> >>>>> On Mon, Feb 17, 2025 at 03:01:19PM +0800, Junxian Huang wrote:
> >>>>>> When mailboxes for resource(QP/CQ/SRQ) destruction fail, it's unable
> >>>>>> to notify HW about the destruction. In this case, driver will still
> >>>>>> free the resources, while HW may still access them, thus leading to
> >>>>>> a UAF.
> >>>>>
> >>>>>> This series introduces delay-destruction mechanism to fix such HW UAF,
> >>>>>> including thw HW CTX and doorbells.
> >>>>>
> >>>>> And why can't you fix FW instead?
> >>>>>
> >>>>
> >>>> The key is the failure of mailbox, and there are some cases that would
> >>>> lead to it, which we don't really consider as FW bugs.
> >>>>
> >>>> For example, when some random fatal error like RAS error occurs in FW,
> >>>> our FW will be reset. Driver's mailbox will fail during the FW reset.
> >>>
> >>> I don't understand this scenario. You said at the beginning that HW can
> >>> access host memory and this triggers UAF. However now, you are presenting
> >>> case where driver tries to access mailbox.
> >>>
> >>
> >> No, I'm saying that mailbox errors are the reason of HW UAF. Let me
> >> explain this scenario in more detail.
> >>
> >> Driver notifies HW about the memory release with mailbox. The procedure
> >> of a mailbox is:
> >> a) driver posts the mailbox to FW
> >> b) FW writes the mailbox data into HW
> >>
> >> In this scenario, step a) will fail due to the FW reset, HW won't get
> >> notified and thus may lead to UAF.
> >
> > Exactly, FW performed reset and didn't prevent from HW to access it.
> >
>
> Yes, but the problem is that our HW doesn't provide a method to prevent
> the access. There's nothing FW can do in this scenario, so we can only
> prevent UAF by adding these codes in driver.

Somehow HW doesn't access mailbox if destroy was successful, so why
can't FW use same "method" to inform HW before reset?

Thanks
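
For readers following the thread, here is a rough model of the delay-destruction idea being debated. It is only a sketch in Python, not the driver code from the series: the names `Device`, `Resource`, `post_destroy_mailbox`, and `release_delayed` are hypothetical, and the point at which deferred resources may finally be freed is an assumption, since the thread does not say when that happens.

```python
from dataclasses import dataclass, field


class MailboxError(Exception):
    """Raised when the destroy command cannot be posted to FW."""


@dataclass
class Resource:
    name: str            # e.g. "QP", "CQ" or "SRQ"
    freed: bool = False  # True once the driver has released the memory


@dataclass
class Device:
    fw_resetting: bool = False
    # Resources whose destroy mailbox failed; HW may still access them.
    delayed: list = field(default_factory=list)

    def post_destroy_mailbox(self, res):
        # Step a) from the thread: driver posts the mailbox to FW.
        # Modeled here as failing whenever FW is in reset.
        if self.fw_resetting:
            raise MailboxError(f"destroy {res.name}: FW is resetting")
        # Step b) FW writes the mailbox data into HW (nothing to model).

    def destroy(self, res):
        try:
            self.post_destroy_mailbox(res)
        except MailboxError:
            # HW was never notified, so freeing now could cause a UAF.
            # Defer the free instead.
            self.delayed.append(res)
            return False
        res.freed = True
        return True

    def release_delayed(self):
        # Assumption: only called once HW can no longer touch host memory.
        count = len(self.delayed)
        for res in self.delayed:
            res.freed = True
        self.delayed.clear()
        return count
```

In this model a destroy that fails during FW reset leaves the resource on `delayed` rather than freeing it, which is the behaviour the series describes. Leon's question amounts to asking whether FW could notify HW before the reset so that this deferral path is never needed.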