On 2/14/2025 3:24 PM, Dan Williams wrote: > Bowman, Terry wrote: >> >> On 2/11/2025 7:23 PM, Dan Williams wrote: >>> Terry Bowman wrote: >>>> The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS >>>> registers for the endpoint's Root Port. The same needs to be done for >>>> each of the CXL Downstream Switch Ports and CXL Root Ports found between >>>> the endpoint and CXL Host Bridge. >>>> >>>> Introduce cxl_init_ep_ports_aer() to be called for each CXL Port in the >>>> sub-topology between the endpoint and the CXL Host Bridge. This function >>>> will determine if there are CXL Downstream Switch Ports or CXL Root Ports >>>> associated with this Port. The same check will be added in the future for >>>> upstream switch ports. >>>> >>>> Move the RAS register map logic from cxl_dport_map_ras() into >>>> cxl_dport_init_ras_reporting(). This eliminates the need for the helper >>>> function, cxl_dport_map_ras(). >>> Not sure about the motivation here... >>> >>>> cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map >>>> the RAS registers for CXL Downstream Switch Ports and CXL Root Ports. >>> Ok, makes sense... >>> >>>> cxl_dport_init_ras_reporting() must check for previously mapped registers >>>> before mapping. This is required because multiple Endpoints under a CXL >>>> switch may share an upstream CXL Root Port, CXL Downstream Switch Port, >>>> or CXL Downstream Switch Port. Ensure the RAS registers are only mapped >>>> once. >>> Sounds broken. Every device upstream-port only has one downstream port. >>> >>> A CXL switch config looks like this: >>> >>> │ >>> ┌──────────┼────────────┐ >>> │SWITCH ┌┴─┐ │ >>> │ │UP│ │ >>> │ └─┬┘ │ >>> │ ┌──────┼─────┐ │ >>> │ │ │ │ │ >>> │ ┌┴─┐ ┌─┴┐ ┌─┴┐ │ >>> │ │DP│ │DP│ │DP│ │ >>> │ └┬─┘ └─┬┘ └─┬┘ │ >>> └────┼──────┼─────┼─────┘ >>> ┌┴─┐ ┌─┴┐ ┌─┴┐ >>> │EP│ │EP│ │EP│ >>> └──┘ └──┘ └──┘ >>> >>> ...so how can an endpoint ever find that its immediate parent downstream >>> port has already been mapped? >> >> ┌─┴─┐ >> │RP1│ >> └─┬─┘ >> ┌───────────┼───────────┐ >> │SWITCH ┌─┴─┐ │ >> │ │UP1│ │ RP1 - 0c:00.0 >> │ └─┬─┘ │ UP1 - 0d:00.0 >> │ ┌──────┼─────┐ │ DP1 - 0e:00.0 >> │ │ │ │ │ >> │ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ │ >> │ │DP1│ │DP2│ │DP3│ │ >> │ └─┬─┘ └─┬─┘ └─┬─┘ │ >> └────┼──────┼─────┼─────┘ >> ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ >> │EP1│ │EP2│ │EP3│ >> └───┘ └───┘ └───┘ >> >> >> It cant but the root RP and USP have duplicate calls for each EP in the example diagram. >> The function's purpose is to map RAS registers and cache the address. This reuses the >> same function for RP and DSP. The DSP will never be previously mapped as you indicated. > Are you talking about in the current code, which should have already > reported problems due to multiple overlapping mappings, or with the > proposed changes? Can you clarify the sequenece of calls that triggers > the multiple mappings of RP1? Yes, in this thread I was discussing the current implementation. The multiple calls to map RPs and USPs occur with the below calls. It iterates from endpoint to RP. From patches 7 and 8 (v7): devm_cxl_add_endpoint() cxl_init_ep_ports_aer(ep) - Calls for each port between EP and RP.cxl_dport_init_ras_reporting() - Maps DP/RP RAS > Also, if EP1 and EP2 race to establish the RP1 mapping, then wouldn't > EP1 and EP2 also race to tear it down? What prevents EP2 from unmapping > RP1 if EP1 still needs it mapped? > > I would prefer that rather than EP1 being responsible for mapping RP1 > RAS, and a lock to prevent EP2 and EP3 from also repeating that, it > should be UP1 in cxl_switch_port_probe() taking responsibility for > mapping RP1 RAS. > > One of the known problems with cxl_switch_port_probe() is that it > enumerates all dports regardless of attachment. That would be where I > would expect problems of dports already going through initialization > prematurely in advance of an endpoint showing up. However, that's a > different fix. Yes, there is a problem with the unmapping. Your recommendation is a good idea. Shouldn't cxl_switch_port_probe() map UP1 RAS as well? > [..] >>>> + >>>> + /* dport may have more than 1 downstream EP. Check if already mapped. */ >>>> + mutex_lock(&ras_init_mutex); >>> I suspect this lock and check got added to workaround "Failed to request >>> region" messages coming out of devm_cxl_iomap_block() in testing? Per >>> above, that's not "more than 1 downstream EPi", that's "failure to clean >>> up the last mapping for the next cxl_mem_probe() event of the same >>> endpoint". >> Synchronization was added to handle the concurrent accesses. I never observed >> issues due to the race condition for RP and USP but I confirmed through further >> testing it is a real potential issue for the RP and USP. > It is still not clear to me how this singleton lock helps when multiple > EPs are sharing a resource that both races init and shutdown. It doesn't. I overlooked the chaotic unmap situation unfortunately. >> You recommended, in the next patch, to map USP RAS registers from cxl_endpoint_port_probe(). >> Would you like the RP and DSP mapping to be called from cxl_endpoint_port_probe() as well? Terry > I think it is broken for an EP to be reaching through a switch to > initialize a shared resource at the RP level. Each level of the > hierarchy should take care of its immediate parent. We need this > bottom-up incremental arrangement due to the way that CXL hides > register blocks until CXL link up. Ok, understood. I have already moved over the port iteration that was in cxl_init_ep_ports_aer() to cxl_endpoint_port_probe(). I now need to change the logic that iterates EP to RP to be more localized (just to the endpoint's immediate DSP/RP). And from your comments above I understand I need to update the cxl_switch_port_probe() to map upstream RP (DSP for multi-level SW).Terry