On 9/2/24 3:21 PM, Leon Romanovsky wrote:
> On Fri, Aug 30, 2024 at 10:34:42AM +0800, Cheng Xu wrote:
>>
>>
>> On 8/29/24 6:09 PM, Leon Romanovsky wrote:
>>> On Wed, Aug 28, 2024 at 02:09:41PM +0800, Cheng Xu wrote:
>>>> Driver may probe again while hardware is destroying the internal
>>>> resources allocated for previous probing
>>>
>>> How is it possible?
>>>
>>
>> The resources I mentioned is totally unseen to driver, it's something related
>> to our device management part in hypervisor, so it won't cause host resources
>> leak, and the cleanup/reset process may take a long time. For these reason,
>> we don't wait the completion of the cleanup/reset in the remove routing.
>> Instead, the driver will wait the device status become ready in probe routing
>> (In most cases, the hardware will have enough time to finish the cleanup/reset
>> before the second probe), so that we can boost the remove process.
>
> And why don't hypervisor wait for the device to be ready before giving it to VM?

The hypervisor actually does what you described during the first bootup.
However, one scenario is that the erdma driver is unloaded and reloaded
quickly while the device stays present in the VM. In this case, the
hypervisor has no opportunity to perform that action.

> Why do you need to complicate the probe routine to overcome the hypervisor behavior?
>

The hardware currently requires that the previous reset (issued in the remove
routine) be completed before device init (issued in the probe routine).
Waiting for the reset to complete in either the remove routine or the probe
routine meets this requirement. This patch chose to wait in the probe routine
because that speeds up the remove process.

Actually, this is a good question, and it makes me think that the requirement
in the hardware/backend could perhaps be eliminated, which would simplify the
driver. I'd like to drop this patch in v3 and leave it for internal
discussion.
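For illustration only, the "wait in probe rather than in remove" ordering
discussed above could be sketched roughly as below. This is a user-space
model, not erdma code: `struct fake_dev`, `fake_dev_reset_done()`,
`wait_reset_done()`, `fake_remove()` and `fake_probe()` are all hypothetical
names, and the poll counter merely stands in for the time the backend needs
to finish its cleanup.

```c
#include <stdbool.h>

/* Hypothetical device model: a pending reset completes after N status polls. */
struct fake_dev {
	int polls_until_ready;	/* status reads left before the reset is done */
};

/* Hypothetical status read; a real driver would read a device register. */
static bool fake_dev_reset_done(struct fake_dev *dev)
{
	if (dev->polls_until_ready > 0) {
		dev->polls_until_ready--;
		return false;
	}
	return true;
}

/* Poll at most max_polls times: 0 once the reset is done, -1 on timeout. */
static int wait_reset_done(struct fake_dev *dev, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (fake_dev_reset_done(dev))
			return 0;
		/* A real driver would sleep between polls here. */
	}
	return -1;
}

/* Remove: issue the reset and return at once, without waiting for it. */
static void fake_remove(struct fake_dev *dev, int reset_cost)
{
	dev->polls_until_ready = reset_cost;
}

/* Probe: the previous reset must have completed before device init. */
static int fake_probe(struct fake_dev *dev, int max_polls)
{
	if (wait_reset_done(dev, max_polls))
		return -1;	/* previous reset still in progress */
	/* Device init would be issued here. */
	return 0;
}
```

In this model the remove path never blocks, and a quick unload/reload pays
the waiting cost in probe only if the reset has not finished yet.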
Thanks very much,
Cheng Xu

> Thanks
>
>>
>> Thanks,
>> Cheng Xu
>>