On 10/14/24 23:24, Dan Williams wrote:
Alejandro Lucero Palau wrote:
On 10/12/24 22:57, Dan Williams wrote:
Alejandro Lucero Palau wrote:
[..]
I am skeptical that PROBE_FORCE_SYNCRONOUS is a fix for any
device-readiness bug. Some other assumption is violated if that is
required.
But that problem is not about device readiness but just how the device
model works. In this case the memdev creation is adding devices, no real
ones but those abstractions we use from the device model, and that
device creation is done asynchronously.
Device creation is not done asynchronously, the PCI driver is attaching
asynchrounously. When the PCI driver attaches it creates memdevs and
those are attached to cxl_mem synchronously.
memdev, a Type2 driver in my case, is going to work with such a device
abstraction just after the memdev creation, it is not there yet.
Oh, is the concern that you always want to have the memdev attached to
cxl_mem immediately after it is registered?
I think that is another case where "MODULE_SOFTDEP("pre: cxl_mem")" is
needed. However, to fix this situation once and for all I think I would
rather just drop all this modularity and move both cxl_port and cxl_mem
to be drivers internal to cxl_core.ko similar to the cxl_region driver.
Oh, so the problem is the code is not ready because the functionality is
in a module not loaded yet.
Right.
Then it makes sense that change. I'll do it if not already taken. I'll
send v4 without the PROBE_FORCE_SYNCHRONOUS flag and without the
previous loop with delays implemented in v3.
So I think EPROBE_DEFER can stay out of the CXL core because it is up to
the accelerator driver to decide whether CXL availability is fatal to
init or not.
It needs support from the cxl core though. If the cxl root is not there
yet, the driver needs to know, and that is what you did in your original
patch and I'm keeping as well.
Additionally, I am less and less convinced that Type-2 drivers should be
forced to depend on the cxl_mem driver to attach vs just arranging for
those Type-2 drivers to call devm_cxl_enumerate_ports() and
devm_cxl_add_endpoint() directly. In other words I am starting to worry
that the generic cxl_mem driver design pattern is a midlayer mistake.
You know better than me but in my view, a Type2 should follow what a
Type3 does with some small changes for dealing with the differences,
mainly the way it is going to be used and those optional capabilities
for Type2. This makes more sense to me for Type1.
So, if it makes it easier for sfc, I would be open to exploring a direct
scheme for endpoint attachment, and not requiring the generic cxl_mem
driver as an intermediary.
V4 is ready (just having problems when testing with 6.12-rcX) so I would
like to keep it and explore this once we have something working and
accepted. Type2 and Type1 with CXL cache will bring new challenges and I
bet we will need refactoring in the code and potentially new design for
generic code (Type3 and Type2, Type2 and Type1).