On 18/10/2018 11:30, Xiao Guangrong wrote: > Beside that... i think we get the chance to remove ptr_ring gracefully, > as the bitmap can indicate the ownership of the request as well. If > the bit is 1 (supposing all bits are 1 on default), only the user can > operate it, the bit will be cleared after the user fills the info > to the request. After that, the thread sees the bit is cleared, then > it gets the ownership and finishes the request, then sets bit in > the bitmap. The ownership is returned to the user again. Yes, even better. :) > One thing may be disadvantage is, it can't differentiate the case if the > request is empty or contains the result which need the user call > threads_wait_done(), that will slow threads_wait_done() a little as it > need check all requests, but it is not a big deal as > 1) at the point needing flush, it's high possible that all most requests > have been used. > 2) the total number of requests is going to be very small. threads_wait_done only needs to check bitmap_equal for the two bitmaps, no? (I'm not sure if, with the code below, it would be bitmap_equal or "all bits are different", i.e. xor is all ones. But it's a trivial change). > > It is illustrated by following code by combining the "flip" bitmaps: > > struct Threads { > ...... > > /* > * the bit in these two bitmaps indicates the index of the requests > * respectively. If it's the same, the request is owned by the user, > * i.e, only the use can use the request. Otherwise, it is owned by > * the thread. > */ > > /* after the user fills the request, the bit is flipped. */ > unsigned long *request_fill_bitmap QEMU_ALIGNED(SMP_CACHE_BYTES); > > /* after handles the request, the thread flips the bit. */ > unsigned long *request_done_bitmap QEMU_ALIGNED(SMP_CACHE_BYTES); > } Note that the pointers need not be aligned, because they are only read. It's the data that should be aligned instead (qemu_memalign to allocate it). > threads_submit_request_prepare() > { > request_done_bitmap = READ_ONCE(threads->request_done_bitmap); > result_bitmap = bitmap_xor(&request_done_bitmap, > threads->request_fill_bitmap); > > index = find_first_zero_bit(current-thread-to-request-index, > &result_bitmap); find_next_zero_bit. > /* make sure we get the data the thread written. */ > smp_rmb(); > > thread_request_done(requests[index]); > ... > } > > threads_submit_request_commit() > { > /* make sure the user have filled the request before we make it > be viable to the threads. */ > smp_wmb(); > > /* after that, the thread can handle the request. */ > bitmap_change_bit(request-to-index, threads->request_fill_bitmap); > ...... > } > > In the thread, it does: > thread_run() > { > index_start = threads->requests + itself->index * > threads->thread_ring_size; > index_end = index_start + threads->thread_ring_size; > > loop: > request_fill_bitmap = READ_ONCE(threads->request_fill_bitmap); > request_done_bitmap = READ_ONCE(threads->request_done_bitmap); No need for READ_ONCE (atomic_read in QEMU), as the pointers are never written. Technically READ_ONCE _would_ be needed in bitmap_xor. Either just ignore the issue or write a find_{equal,different}_bit yourself in util/threads.c, so that it can use atomic_read. > result_bitmap = bitmap_xor(&request_fill_bitmap, &request_done_bitmap); > index = find_first_bit_set(&result_bitmap, .start = index_start, > .end = index_end); > > /* > * paired with smp_wmb() in threads_submit_request_commit to > make sure the > * thread can get data filled by the user. > */ > smp_rmb(); > > request = threads->requests[index]; > thread_request_handler(request); > > /* > * updating the request is viable before flip the bitmap, paired > * with smp_rmb() in threads_submit_request_prepare(). > */ > smp_wmb(); No need for smp_wmb before atomic_xor. > bitmap_change_bit_atomic(&threads->request_done_bitmap, index); > ...... > }