Bowman, Terry wrote: > > > On 2/14/2025 3:24 PM, Dan Williams wrote: > > Bowman, Terry wrote: > >> > >> On 2/11/2025 7:23 PM, Dan Williams wrote: > >>> Terry Bowman wrote: > >>>> The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS > >>>> registers for the endpoint's Root Port. The same needs to be done for > >>>> each of the CXL Downstream Switch Ports and CXL Root Ports found between > >>>> the endpoint and CXL Host Bridge. > >>>> > >>>> Introduce cxl_init_ep_ports_aer() to be called for each CXL Port in the > >>>> sub-topology between the endpoint and the CXL Host Bridge. This function > >>>> will determine if there are CXL Downstream Switch Ports or CXL Root Ports > >>>> associated with this Port. The same check will be added in the future for > >>>> upstream switch ports. > >>>> > >>>> Move the RAS register map logic from cxl_dport_map_ras() into > >>>> cxl_dport_init_ras_reporting(). This eliminates the need for the helper > >>>> function, cxl_dport_map_ras(). > >>> Not sure about the motivation here... > >>> > >>>> cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map > >>>> the RAS registers for CXL Downstream Switch Ports and CXL Root Ports. > >>> Ok, makes sense... > >>> > >>>> cxl_dport_init_ras_reporting() must check for previously mapped registers > >>>> before mapping. This is required because multiple Endpoints under a CXL > >>>> switch may share an upstream CXL Root Port, CXL Downstream Switch Port, > >>>> or CXL Downstream Switch Port. Ensure the RAS registers are only mapped > >>>> once. > >>> Sounds broken. Every device upstream-port only has one downstream port. > >>> > >>> A CXL switch config looks like this: > >>> > >>> │ > >>> ┌──────────┼────────────┐ > >>> │SWITCH ┌┴─┐ │ > >>> │ │UP│ │ > >>> │ └─┬┘ │ > >>> │ ┌──────┼─────┐ │ > >>> │ │ │ │ │ > >>> │ ┌┴─┐ ┌─┴┐ ┌─┴┐ │ > >>> │ │DP│ │DP│ │DP│ │ > >>> │ └┬─┘ └─┬┘ └─┬┘ │ > >>> └────┼──────┼─────┼─────┘ > >>> ┌┴─┐ ┌─┴┐ ┌─┴┐ > >>> │EP│ │EP│ │EP│ > >>> └──┘ └──┘ └──┘ > >>> > >>> ...so how can an endpoint ever find that its immediate parent downstream > >>> port has already been mapped? > >> > >> ┌─┴─┐ > >> │RP1│ > >> └─┬─┘ > >> ┌───────────┼───────────┐ > >> │SWITCH ┌─┴─┐ │ > >> │ │UP1│ │ RP1 - 0c:00.0 > >> │ └─┬─┘ │ UP1 - 0d:00.0 > >> │ ┌──────┼─────┐ │ DP1 - 0e:00.0 > >> │ │ │ │ │ > >> │ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ │ > >> │ │DP1│ │DP2│ │DP3│ │ > >> │ └─┬─┘ └─┬─┘ └─┬─┘ │ > >> └────┼──────┼─────┼─────┘ > >> ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ > >> │EP1│ │EP2│ │EP3│ > >> └───┘ └───┘ └───┘ > >> > >> > >> It cant but the root RP and USP have duplicate calls for each EP in the example diagram. > >> The function's purpose is to map RAS registers and cache the address. This reuses the > >> same function for RP and DSP. The DSP will never be previously mapped as you indicated. > > Are you talking about in the current code, which should have already > > reported problems due to multiple overlapping mappings, or with the > > proposed changes? Can you clarify the sequenece of calls that triggers > > the multiple mappings of RP1? > Yes, in this thread I was discussing the current implementation. The > multiple calls to map RPs and USPs occur with the below calls. It iterates from > endpoint to RP. From patches 7 and 8 (v7): > > devm_cxl_add_endpoint() cxl_init_ep_ports_aer(ep) - Calls for each port between EP and RP.cxl_dport_init_ras_reporting() - Maps DP/RP RAS Ah, thanks, I missed that. I misread the patch and thought that cxl_init_ep_ports_aer() was only being called for the immediate dport parent. > > > Also, if EP1 and EP2 race to establish the RP1 mapping, then wouldn't > > EP1 and EP2 also race to tear it down? What prevents EP2 from unmapping > > RP1 if EP1 still needs it mapped? > > > > I would prefer that rather than EP1 being responsible for mapping RP1 > > RAS, and a lock to prevent EP2 and EP3 from also repeating that, it > > should be UP1 in cxl_switch_port_probe() taking responsibility for > > mapping RP1 RAS. > > > > One of the known problems with cxl_switch_port_probe() is that it > > enumerates all dports regardless of attachment. That would be where I > > would expect problems of dports already going through initialization > > prematurely in advance of an endpoint showing up. However, that's a > > different fix. > Yes, there is a problem with the unmapping. Your recommendation is a good > idea. > > Shouldn't cxl_switch_port_probe() map UP1 RAS as well? Yes, that would naturally fit there I think, especially because it naturally handles the case of the port and mapping staying alive until the last endpooint in that topology is removed. > Ok, understood. I have already moved over the port iteration that was > in cxl_init_ep_ports_aer() to cxl_endpoint_port_probe(). I now need to > change the logic that iterates EP to RP to be more localized (just to > the endpoint's immediate DSP/RP). And from your comments above I > understand I need to update the cxl_switch_port_probe() to map > upstream RP (DSP for multi-level SW).Terry Sounds good, thanks Terry!