On 27/11/18 13:49, Christophe de Dinechin wrote:
> So this is not really
> helping. Also, the ThreadLocal structure itself is not necessarily aligned
> within struct Threads. Therefore, it’s possible that “requests” for example
> could be on the same cache line as request_fill_bitmap if planets align
> the wrong way.

I think this is a bit exaggerated.  Linux and QEMU's own qht work just
fine with compile-time directives.

> In order to mitigate these effects, I would group the data that the user
> writes and the data that the thread writes, i.e. reorder declarations,
> put request_fill_bitmap and request_valid_ev together, and try
> to put them in the same cache line so that only one cache line is invalidated
> from within mark_request_valid instead of two.
>
> Then you end up with a single alignment directive instead of 4, to
> separate requests from completions.

Yeah, I agree with this.

> That being said, I’m not sure why you use a bitmap here. What is the
> expected benefit relative to atomic lists (which would also make it really
> lock-free)?
>

I don't think lock-free lists are easier.  Bitmaps smaller than 64
elements are both faster and easier to manage.

Paolo
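
For illustration only, here is a minimal C11 sketch of the two points
discussed above: grouping the submitter-written fields so that a single
alignment directive separates requests from completions, and managing up
to 64 slots with one atomic bitmap word.  It is not the code from the
patch.  The struct name ThreadLocalSketch, the field completed_bitmap, the
helpers slot_claim() and slot_release(), the use of a plain atomic int as
a stand-in for request_valid_ev, and the 64-byte cache line size are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Hypothetical layout, not the struct from the patch under review. */
typedef struct ThreadLocalSketch {
    /* Written by the submitting side; kept together on one cache line. */
    _Atomic uint64_t request_fill_bitmap;
    _Atomic int request_valid_ev;        /* stand-in for the real event */

    /* Written by the worker thread.  This single directive separates
     * requests from completions; it also raises the alignment of the
     * whole struct, so the first group starts on a cache line as well
     * (for heap storage the allocator must still honor that alignment). */
    alignas(CACHE_LINE) _Atomic uint64_t completed_bitmap;  /* hypothetical */
} ThreadLocalSketch;

static_assert(offsetof(ThreadLocalSketch, request_valid_ev) < CACHE_LINE,
              "submitter-written fields share the first cache line");
static_assert(offsetof(ThreadLocalSketch, completed_bitmap) % CACHE_LINE == 0,
              "completion data starts on its own cache line");

/* Claim the lowest free slot; return its index, or -1 if all 64 are used. */
static int slot_claim(_Atomic uint64_t *bitmap)
{
    uint64_t old = atomic_load(bitmap);

    for (;;) {
        if (old == UINT64_MAX) {
            return -1;
        }
        int idx = 0;
        while (old & (UINT64_C(1) << idx)) {
            idx++;              /* portable scan; real code might use ctz */
        }
        if (atomic_compare_exchange_weak(bitmap, &old,
                                         old | (UINT64_C(1) << idx))) {
            return idx;
        }
        /* A failed exchange reloaded 'old'; retry with the new value. */
    }
}

/* Mark slot idx (0..63) as free again. */
static void slot_release(_Atomic uint64_t *bitmap, int idx)
{
    atomic_fetch_and(bitmap, ~(UINT64_C(1) << idx));
}
```

A caller would claim a slot with slot_claim(&t.request_fill_bitmap), fill
in the request at that index, and later hand the slot back with
slot_release().  The whole free/used state lives in one word, so there are
no list nodes to allocate or reclaim.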