Re: [PATCH v3 2/5] util: introduce threaded workqueue

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Tue, 27 Nov 2018 14:51:15 +0100

On 27/11/18 13:49, Christophe de Dinechin wrote:
> So this is not really
> helping. Also, the ThreadLocal structure itself is not necessarily aligned
> within struct Threads. Therefore, it’s possible that “requests” for example
> could be on the same cache line as request_fill_bitmap if planets align
> the wrong way.

I think this is a bit exaggerated.  Linux and QEMU's own qht work just
fine with compile-time directives.

> In order to mitigate these effects, I would group the data that the user
> writes and the data that the thread writes, i.e. reorder declarations,
> put request_fill_bitmap and request_valid_ev together, and try
> to put them in the same cache line so that only one cache line is invalidated
> from within mark_request_valid instead of two.
> 
> Then you end up with a single alignment directive instead of 4, to
> separate requests from completions.

Yeah, I agree with this.
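
Something along these lines, as a rough sketch only. It assumes 64-byte
cache lines and C11 alignas; the struct name, the plain int stand-ins for
the events, and the two completion-side fields are made up for
illustration and are not taken from the patch:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumption: 64-byte cache lines */

typedef struct ThreadLocalSketch {
    /* Request side: written by the user in mark_request_valid. */
    uint64_t request_fill_bitmap;
    int request_valid_ev;            /* stand-in for the real event type */

    /*
     * Completion side: written by the worker thread.  This is the single
     * alignment directive; it also raises the alignment of the whole
     * struct, so the request side starts on a cache line boundary too.
     * Both field names below are hypothetical.
     */
    alignas(CACHE_LINE_SIZE) uint64_t request_done_bitmap;
    int request_free_ev;             /* stand-in for the real event type */
} ThreadLocalSketch;

/* The request-side fields share the first cache line. */
static_assert(offsetof(ThreadLocalSketch, request_valid_ev) < CACHE_LINE_SIZE,
              "request side must fit in one cache line");
/* The completion side starts on its own cache line. */
static_assert(offsetof(ThreadLocalSketch, request_done_bitmap)
              % CACHE_LINE_SIZE == 0,
              "completion side must be cache-line aligned");
```

The alignment only holds if the containing allocation honours it, so a
heap allocation would need an aligned allocator rather than plain malloc.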

> That being said, I’m not sure why you use a bitmap here. What is the
> expected benefit relative to atomic lists (which would also make it really
> lock-free)?
> 

I don't think lock-free lists are easier.  Bitmaps smaller than 64
elements fit in a single word, so they are both faster and easier to
manage.
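
To illustrate the general idea (this is not the scheme the patch uses,
just a minimal sketch of slot allocation on a one-word bitmap): it
assumes C11 atomics and the GCC/Clang builtin __builtin_ctzll, and the
function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Claim the lowest free slot among the first nr bits (1 <= nr <= 64).
 * Returns the slot index, or -1 if all nr slots are in use.
 */
static int bitmap_claim(_Atomic uint64_t *map, int nr)
{
    uint64_t mask = nr == 64 ? ~0ULL : (1ULL << nr) - 1;
    uint64_t old = atomic_load(map);

    for (;;) {
        uint64_t free_bits = ~old & mask;
        if (!free_bits) {
            return -1;
        }
        int idx = __builtin_ctzll(free_bits);   /* lowest clear bit */
        /* On failure, old is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(map, &old, old | (1ULL << idx))) {
            return idx;
        }
    }
}

/* Give a slot back; a single atomic AND, no loop needed. */
static void bitmap_release(_Atomic uint64_t *map, int idx)
{
    atomic_fetch_and(map, ~(1ULL << idx));
}
```

The whole state is one word, so there are no nodes to allocate or free
and no pointers to go stale, which is where lock-free lists usually get
complicated.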

Paolo