On 2025/2/20 15:32, Leon Romanovsky wrote:
> On Thu, Feb 20, 2025 at 11:48:49AM +0800, Junxian Huang wrote:
>>
>>
>> On 2025/2/19 22:35, Leon Romanovsky wrote:
>>> On Wed, Feb 19, 2025 at 09:07:36PM +0800, Junxian Huang wrote:
>>>>
>>>>
>>>> On 2025/2/19 20:14, Leon Romanovsky wrote:
>>>>> On Mon, Feb 17, 2025 at 03:01:19PM +0800, Junxian Huang wrote:
>>>>>> When mailboxes for resource(QP/CQ/SRQ) destruction fail, it's unable
>>>>>> to notify HW about the destruction. In this case, driver will still
>>>>>> free the resources, while HW may still access them, thus leading to
>>>>>> a UAF.
>>>>>
>>>>>> This series introduces delay-destruction mechanism to fix such HW UAF,
>>>>>> including the HW CTX and doorbells.
>>>>>
>>>>> And why can't you fix FW instead?
>>>>>
>>>>
>>>> The key is the failure of mailbox, and there are some cases that would
>>>> lead to it, which we don't really consider as FW bugs.
>>>>
>>>> For example, when some random fatal error like RAS error occurs in FW,
>>>> our FW will be reset. Driver's mailbox will fail during the FW reset.
>>>
>>> I don't understand this scenario. You said at the beginning that HW can
>>> access host memory and this triggers UAF. However now, you are presenting
>>> case where driver tries to access mailbox.
>>>
>>
>> No, I'm saying that mailbox errors are the reason of HW UAF. Let me
>> explain this scenario in more detail.
>>
>> Driver notifies HW about the memory release with mailbox. The procedure
>> of a mailbox is:
>> a) driver posts the mailbox to FW
>> b) FW writes the mailbox data into HW
>>
>> In this scenario, step a) will fail due to the FW reset, HW won't get
>> notified and thus may lead to UAF.
>
> Exactly, FW performed reset and didn't prevent from HW to access it.
>

Yes, but the problem is that our HW doesn't provide any method to
prevent that access. There's nothing FW can do in this scenario, so
the only way to prevent the UAF is to handle it in the driver.
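To illustrate the delay-destruction idea, here is a minimal userspace sketch. All names (`struct hw_res`, `post_destroy_mailbox()`, `fw_available`, `destroy_res()`, `free_delayed()`) are hypothetical and are not the hns driver's actual code: if the destroy mailbox fails, HW was never told to stop using the resource, so the memory is parked on a list instead of being freed, and is released later once HW is known to be quiesced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical resource (e.g. QP/CQ/SRQ context) that HW may still access. */
struct hw_res {
	int id;
	void *ctx;              /* memory HW may read/write via DMA */
	struct hw_res *next;    /* link on the delayed-destruction list */
};

/* Resources whose destroy mailbox failed; must not be freed yet. */
static struct hw_res *delayed_list;

/* Hypothetical switch simulating FW availability (false = FW reset/timeout). */
static bool fw_available = true;

/* Hypothetical stand-in for posting a destroy mailbox to FW.
 * Returns 0 on success, -1 if the mailbox could not be delivered. */
int post_destroy_mailbox(const struct hw_res *res)
{
	(void)res;
	return fw_available ? 0 : -1;
}

struct hw_res *alloc_res(int id)
{
	struct hw_res *res = calloc(1, sizeof(*res));

	if (!res)
		return NULL;
	res->id = id;
	res->ctx = malloc(64);
	return res;
}

/* Free the resource only if HW was successfully told to stop using it. */
void destroy_res(struct hw_res *res)
{
	assert(res != NULL);
	if (post_destroy_mailbox(res) == 0) {
		free(res->ctx);
		free(res);
		return;
	}
	/* Mailbox failed: HW was not notified, so freeing now could cause a
	 * HW UAF. Defer the free by parking the resource on the list. */
	res->next = delayed_list;
	delayed_list = res;
}

/* Free all deferred resources; call only once HW is known to be quiesced
 * (assumed here to be e.g. device teardown). Returns how many were freed. */
int free_delayed(void)
{
	int n = 0;

	while (delayed_list) {
		struct hw_res *res = delayed_list;

		delayed_list = res->next;
		free(res->ctx);
		free(res);
		n++;
	}
	return n;
}
```

With `fw_available` set to false, `destroy_res()` leaves the resource on `delayed_list` rather than freeing it, and a later `free_delayed()` releases everything that was deferred. In a real driver the list would more likely use the kernel's `list_head` helpers, and the point at which deferred memory is finally freed depends on when HW is guaranteed to stop DMA, which this sketch does not model.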
Thanks,
Junxian

> Thanks
>
>>
>> Junxian
>>
>>>>
>>>> Another case is the mailbox timeout when FW is under heavy load, as it is
>>>> shared by multi-functions.
>>>
>>> It is not different from any other mailbox errors. FW needs to handle
>>> these cases.
>>>
>>> Thanks
>>>
>>>>
>>>> Thanks,
>>>> Junxian
>>>>
>>>>> Thanks