On Fri, Mar 29, 2024 at 09:55:02AM -0500, Bob Pearson wrote:
> This series of patches is the result of high scale testing on a large
> HPC system with a large attached Lustre file system. Several errors
> were found which had not been previously seen at smaller scales. In
> this case up to 1600 QPs on 1024 compute nodes attached to about 100
> flash storage nodes. Each patch has its own description.
>
> v3
>   Fixed an error in "Don't call rxe_requester from rxe_completer"
>   Moved run_requester_again from a global to rxe_req_info.again.
>   The control parameter has to be local to each qp.
> v2
>   Minor edits to some of the commit messages.
>   Added a missing change to "Don't schedule rxe_completer...".
>   Added a missing change to "Get rid of pkt resend on err".
>   Added one additional commit.
>
> Bob Pearson (12):
>   RDMA/rxe: Fix seg fault in rxe_comp_queue_pkt
>   RDMA/rxe: Allow good work requests to be executed
>   RDMA/rxe: Remove redundant scheduling of rxe_completer
>   RDMA/rxe: Merge request and complete tasks
>   RDMA/rxe: Remove save/rollback_state in rxe_requester
>   RDMA/rxe: Don't schedule rxe_completer from rxe_requester
>   RDMA/rxe: Don't call rxe_requester from rxe_completer
>   RDMA/rxe: Don't call direct between tasks
>   RDMA/rxe: Fix incorrect rxe_put in error path
>   RDMA/rxe: Make rxe_loopback match rxe_send behavior
>   RDMA/rxe: Get rid of pkt resend on err
>   RDMA/rxe: Let destroy qp succeed with stuck packet

Applied to for-next, thanks

Jason