On 2019/4/23 23:23, Leon Romanovsky wrote:
> On Mon, Apr 22, 2019 at 09:38:37PM +0800, oulijun wrote:
>> On 2019/4/22 20:22, Leon Romanovsky wrote:
>>> On Fri, Apr 19, 2019 at 03:46:32PM +0800, oulijun wrote:
>>>> On 2019/4/16 20:16, Leon Romanovsky wrote:
>>>>> On Sat, Apr 13, 2019 at 07:29:01PM +0800, Lijun Ou wrote:
>>>>>> To avoid resource unreleased while ULP aborted abnormally,
>>>>>> the hardware adds the capability of restoring the resource
>>>>>> while removing module, this patch enables this capability.
>>>>> Can anyone help me to understand what does it mean?
>>>>> How can ULP "abort" without releasing resources?
>>>>>
>>>>> Thanks
>>>> Maybe the commit description is not correct enough.
>>>>
>>>> The entire PATCH is to solve the following scenarios. When a function is abnormal, the hardware
>>>> need to release the relatived hardware reource and the entire release process is the same as the flr process.
>>>> It uses the firmware to reslove. The hw design adds a firmware cmd to clear the hardware state and judge
>>>> the resource of hardware have freed.
>>>>
>>>> As a result, the driver need to implement this cmd.
>>> You explained what you are doing, but not why are you doing.
>> Hi, Leon
>> if carried out unload operation When rdma app running, the hardware is too late to release and remain in hardware.
>
> It is responsibility of disassociate flow to clean such mess and various
> unwind flows.
>
>> Under these circumstances, it maybe happen error if loaded hns driver and run app again. In order to reslove it,
>> the hardware adds a function clear function to stop this function and clear the residual hardware resources in the function.
>
> First, your initialization flow should do it always, second you need to
> find the root cause of resource leakage in case of application was aborted.
>
Hi Leon,

Thanks very much for your valuable questions, which made us think about this functionality more deeply.
Sorry that our description and responses led the discussion to focus on the assumption that the application may abort abnormally and can no longer notify the hardware to release previously requested resources, so that these resources remain in the hardware. Actually, the current OFED driver framework already provides a mechanism to destroy these objects during rmmod of the ko, whether the application is still running or has aborted abnormally. In other words, our assumption was wrong. There is no need to worry about leakage of resources allocated by the application.

However, I have talked with our chip team about the function clear functionality. We think it is necessary to inform the chip when the ko is removed, so that it can complete outstanding tasks, perform some cleanup work and restore hardware resources in time. Otherwise, it is dangerous to reuse the hardware, because the hardware cannot guarantee that this work is done properly without the notification from our driver. Therefore, the function clear functionality in this patch makes sure our hardware works properly.

Thanks.