Re: [PATCH v5] PCI: dwc: Wait for link up only if link is started

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 28, 2024 at 10:59:51PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Feb 20, 2024 at 11:04:09PM +0530, Ajay Agarwal wrote:
> > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote:
> > > + Joao
> > > 
> > > On Mon, Feb 05, 2024 at 04:30:20PM +0530, Ajay Agarwal wrote:
> > > > On Wed, Jan 31, 2024 at 12:06:26AM +0530, Manivannan Sadhasivam wrote:
> > > > > On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote:
> > > > > > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote:
> > > > > > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote:
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training?
> > > > > > > > > > > > > 
> > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger.
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during
> > > > > > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link
> > > > > > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance
> > > > > > > > > > > get reused in other platform (non-android), the it won't be a seamless user
> > > > > > > > > > > experience.
> > > > > > > > > > > 
> > > > > > > > > > > If there are any other usecases, please state them.
> > > > > > > > > > > 
> > > > > > > > > > > - Mani
> > > > > > > > > > >
> > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints
> > > > > > > > > > being enumerated need to go through an appropriate and device specific
> > > > > > > > > > power sequence which gets triggered only when the userspace is up. The
> > > > > > > > > > PCIe probe cannot assume that the EPs have been powered up already and
> > > > > > > > > > hence the link-up is not attempted.
> > > > > > > > > 
> > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you
> > > > > > > > I am not adding any logic/condition around calling the start_link()
> > > > > > > > itself. I am only avoiding the wait for the link to be up if the
> > > > > > > > controller driver has not defined start_link().
> > > > > > > > 
> > > > > > > 
> > > > > > > I'm saying that not defining the start_link() callback itself is wrong.
> > > > > > > 
> > > > > > Whether the start_link() should be defined or not, is a different
> > > > > > design discussion. We currently have 2 drivers in upstream (intel-gw and
> > > > > > dw-plat) which do not have start_link() defined. Waiting for the link to
> > > > > > come up for the platforms using those drivers is not a good idea. And
> > > > > > that is what we are trying to avoid.
> > > > > > 
> > > > > 
> > > > > NO. The sole intention of this patch is to fix the delay observed with _your_
> > > > > out-of-tree controller driver as you explicitly said before. Impact for the
> > > > > existing 2 drivers are just a side effect.
> > > > >
> > > > Hi Mani,
> > > > What is the expectation from the pcie-designware-host driver? If the
> > > > .start_link() has to be defined by the vendor driver, then shouldn't the
> > > > probe be failed if the vendor has not defined it? Thereby failing the
> > > > probe for intel-gw and pcie-dw-plat drivers?
> > > > 
> > > 
> > > intel-gw maintainer agreed to fix the driver [1], but I cannot really comment on
> > > the other one. It is not starting the link at all, so don't know how it works.
> > > I've CCed the driver author to get some idea.
> > > 
> > > [1] https://lore.kernel.org/linux-pci/BY3PR19MB50764E90F107B3256189804CBD432@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > > 
> > > > Additionally, if the link fails to come up even after 1 sec of wait
> > > > time, shouldn't the probe be failed in that case too?
> > > > 
> > > 
> > > Why? The device can be attached at any point of time. What I'm stressing is, the
> > > driver should check for the link to be up during probe and if there is no
> > > device, then it should just continue and hope for the device to show up later.
> > My change is still checking whether the link is up during probe.
> > If yes, print the link status (negotiated speed and width).
> > If no, and the .start_link() exists, then call the same and wait for 1
> > second for the link to be up.
> > 
> 
> There is a reason why we are wating for 1s for the device to show up during
> probe. Please look at my earlier reply to Bjorn.
>
Yes, I looked at that. I am not sure if that is the real reason behind
this delay. The explanation that you quoted from the spec talks about
allowing 1s delay for the EP to return a completion. Whereas the delay
here is to wait for the link training itself to be completed.

We could surely wait for Lorenzo to explain the reason behind this
delay, but he himself approved the earlier patch [1] (which caused the
regression and had to be reverted):
 
 [1] https://lore.kernel.org/all/168509076553.135117.7288121992217982937.b4-ty@xxxxxxxxxx/

> > > This way, the driver can detect the powered up devices during boot and also
> > > detect the hotplug devices.
> > >
> > If the .start_link() is not defined, how will the link come up? Are you
> > assuming that the bootloader might have enabled link training?
> > 
> 
> Yes, something like that. But that assumption is moot in the first place.
> 
Isnt it weird that a PCIe driver, which will most likely initialize all
the resources like power, resets, clocks etc., relies on the bootloader
to have enabled the link training?

I think it is safe to assume that a driver should have the start_link()
defined if it wishes to bring the link up during probe.

> > > > My understanding of these drivers is that the .start_link() is an
> > > > OPTIONAL callback and that the dw_pcie_host_init should help setup the
> > > > SW structures regardless of whether the .start_link() has been defined
> > > > or not, and whether the link is up or not. The vendor should be allowed
> > > > to train the link at a later point of time as well.
> > > > 
> > > 
> > > What do you mean by later point of time? Bringing the link through debugfs? NO.
> > > We cannot let each driver behave differently, unless there is a really valid
> > > reason.
> > > 
> > pci_rescan_bus() is an exported API. I am assuming that this means that
> > it is supposed to be used by modules when they know that the link is up.
> > If a module wishes to bring the link up using debugfs or some other HW
> > trigger, why is that a bad design? In my opinion, this is providing more
> > options to the HW/product designer and the module author, in addition to
> > the already existing .start_link() callback.
> > 
> 
> If the driver _knows_ that the device is up, then rescanning the bus makes
> sense. But here we are checking for the existence of the device.
> 
Yeah, so the driver could _know_ that the device is up later in the
game, right? And then re-scan the bus? Why wait for 1 sec here?

> > > > Please let me know your thoughts.
> > > > > > > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become
> > > > > > > > > negligible. The reason why I'm against not calling start_link() is due to below
> > > > > > > > > reasons:
> > > > > > > > > 
> > > > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones
> > > > > > > > > that powers up the endpoints during boot, then it creates a dependency with
> > > > > > > > > userspace to always start the link even though the devices were available.
> > > > > > > > > That's why we should never fix the behavior of the controller drivers based on a
> > > > > > > > > single platform.
> > > > > > > > I wonder how the behavior is changing with this patch. Do you have an
> > > > > > > > example of a platform which does not have start_link() defined but would
> > > > > > > > like to still wait for a second for the link to come up?
> > > > > > > > 
> > > > > > > 
> > > > > > > Did you went through my reply completely? I mentioned that the 1s delay would be
> > > > > > > gone if you add the async flag to your driver and you are ignoring that.
> > > > > > > 
> > > > The async probe might not help in all the cases. Consider a situation
> > > > where the PCIe is probed after the boot is already completed. The user
> > > > will face the delay then. Do you agree?
> > > > 
> > > 
> > > You mean loading the driver module post boot? If the link is still not up, yes
> > > the users will see the 1sec delay.
> > >
> > > But again, the point I'm trying to make here is, all the drivers have to follow
> > > one flow. We cannot let each driver do its way of starting the link. There could
> > > be exceptions if we get a valid reason for a driver to not do so, but so far I
> > > haven't come across such reason. (the existing drivers, intel-gw and
> > > designware-plat are not exceptions, but they need fixing).
> > > 
> > > For your driver, you said that the userspace brings up the link, post boot when
> > > the devices are powered on. So starting the link during probe incurs 1s delay,
> > > as there would be no devices. But I suggested you to enable async probe to
> > > nullify the 1s delay during probe and you confirmed that it fixes the issue for
> > > you.
> > > 
> > There are emulation/simulation platforms in which 1 second of runtime
> > can correspond to ~1 hour of real-world time. Even if PCIe is probed
> > asyncronously, it will still block the next set of processes.
> >
> 
> Which simulation/emulation platform? The one during pre-production stage? If
Yes, the pre-production stage emulation platform.

> there is no endpoint connected, why would you enable the controller first place?
The endpoint is connected. But my usecase requires me to not bring it up
by default. Rather, I start the link training later using debugfs. So I
want to test my driver probe without attempting link training and
thereby, I face the 1 sec delay. I want to avoid it.

> And how can the endpoint show up later during simulation? Why can't it be up
> earlier?
> 
As explained above, I use debugfs to bring the link up later. The logic
for that is present in my platform driver. Once the link is up, I call
pci_rescan_bus() to enumerate the EP.

> > > Then you are again debating about not defining the start_link() callback :(
> > > 
> > I am not sure why you think I am debating against defining the
> > .start_link() callback. It is clearly an optional pointer and I am
> > choosing to not use it because of the usecase. And if it is optional and
> > I am not using it, why waste 1 sec of runtime waiting for the link? Do
> > we have an example in upstream of platforms where this 1 sec could prove
> > useful where link training is not being started by the platform driver
> > but EPs have to be detected because they are present and powered-up?
> > 
> 
> Please tell me how the link is started in your case without start_link()
> callback.
> 
As explained above, one method is to use debugfs to start the link
training. The actual usecase is for the userspace to start the link
training based on certain GPIO events from the endpoint.

> - Mani
> 
> -- 
> மணிவண்ணன் சதாசிவம்




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux