On Thu, Apr 11, 2019 at 06:22:45PM +0000, Arumugam, Kamenee wrote:
> > What exactly is causing the UR?  Is it something the driver could
> > potentially avoid, e.g., an AtomicOp that HFI doesn't support?  I
> > have a vague notion that InfiniBand allows some sort of direct
> > user-space access to hardware; is there something there that can
> > cause a UR?
>
> HFI PCIe BAR are mapped to user space to implement kernel bypass for
> MPI/PSM jobs. In this case, user-level application is making
> spurious read accesses (invalid width access) to this memory mapping
> causing the device to report an unsupported request error through
> AER. The spurious read accesses may be due to errant application
> behavior (e.g. reading beyond the end of an array).

This is a device bug then.

An RDMA device must accept and respond to all TLPs that the CPU could
create for the user-accessible BAR pages. A user process must not be
able to crash the CPU or make the device malfunction by accessing the
exposed BAR page.

This covers a broad range of cases, like misaligned accesses, SSE
instructions, atomics, etc.

Is blocking AER even enough here? If the device isn't generating a
reasonable reply, I have a bad feeling worse will happen.

Jason