Hey Bob,

Sorry to disturb you. Are you going to revert it to a single map, as Jason suggested?

Thanks
Zhijian

On 27/05/2022 20:42, Jason Gunthorpe wrote:
> On Tue, May 24, 2022 at 05:28:00PM -0500, Bob Pearson wrote:
>
>> We have a work around by fencing all the local operations which more
>> or less works but will have bad performance. The maps used in FMRs
>> have fairly short lifetimes but definitely longer than we can
>> support today. I am trying to work out the semantics of everything.
> IBTA specifies the fence requirements, I thought we decided RXE or
> maybe even lustre wasn't following the spec?
>
>> To make this all recoverable in the face of errors let there be more
>> than one map present for an FMR indexed by the key portion of the
>> l/rkeys.
> Real HW doesn't have more than one map, this seems like the wrong
> direction.
>
> As we discussed, there is something wrong with how rxe is processing
> its queues, it isn't following IBTA defined behaviors in the
> exceptional cases.
>
>> Alternative view of FMRs:
>>
>> verb: ib_alloc_mr(pd, max_num_sg)               - create an empty MR object with no maps
>>                                                   with l/rkey = [index, key] with index
>>                                                   fixed and key some initial value.
>>
>> verb: ib_update_fast_reg_key(mr, newkey)        - update key portion of l/rkey
>>
>> verb: ib_map_mr_sg(mr, sg, sg_nents, sg_offset) - create a new map from allocated memory
>>                                                   or by re-using an INVALID map. Maps are
>>                                                   all the same size (max_num_sg). The
>>                                                   key (index) of this map is the current
>>                                                   key from l/rkey. The initial state of
>>                                                   the map is FREE. (and thus not usable
>>                                                   until a REG_MR work request is used.)
> More than one map is nonsense, real HW has a single map, a MR object is that
> single map.
>
>> This is an improvement over the current state. At the moment we have
>> only two maps one for making new ones and one for doing IO. There is
>> no room to back up but at the moment the retry logic assumes that
>> you can which is false. This can be fixed easily by forcing all
>> local operations to be fenced which is what we are doing at the
>> moment at HPE. This can insert long delays between every new FMR
>> instance. By allowing three maps and then fencing we can back up
>> one broken IO operation without too much of a delay.
> IMHO you need to go back to one map and fix the queue processing
> logic to be spec compliant.
>
> Jason