Hi Kent,

Thanks for commenting.

Trying to make sense of your point helped me understand more of the code, but some things are still unclear to me; I'd appreciate a bit more help.

Can you describe how a single thread might not be able to use all the slots because 'up to about half of the reqs_available slots might be on other percpu reqs_available'?

I see that the thread might be scheduled on different CPUs (say, only 2 possible CPUs) and perform get_reqs_available() on both -- but that only gives one req_batch to each CPU, and for req_batch to be half of reqs_available its denominator would need to be 2, which doesn't happen with num_possible_cpus() * 4 -- which is 8 here. So I'm a bit confused.

    atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
    ctx->req_batch = (ctx->nr_events - 1) / (num_possible_cpus() * 4);

On 10/05/2016 03:34 AM, Kent Overstreet wrote:
>> - why "num_possible_cpus() * 4", and why "max(nr_events, <it>)" ?
>
> For the scheme to work - percpu allocation of slots - we have to ensure that there aren't too many unused slots stranded on other CPUs. The stranding is limited to 1/4th of the slots [snip]
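
To make the numbers concrete, here is the arithmetic as I currently understand it, written as a small userspace sketch rather than kernel code. The helper names (model_req_batch, model_percpu_held) and the batches_per_cpu parameter are made up for illustration; only the req_batch formula comes from the code quoted above, and the sketch says nothing about how many batches a CPU can actually end up holding.

```c
#include <assert.h>

/*
 * Hypothetical helper: mirrors the req_batch computation quoted above.
 * nr_events stands in for ctx->nr_events, ncpus for num_possible_cpus().
 */
static unsigned int model_req_batch(unsigned int nr_events, unsigned int ncpus)
{
	assert(nr_events > 0 && ncpus > 0);
	return (nr_events - 1) / (ncpus * 4);
}

/*
 * Hypothetical helper: number of slots sitting on percpu counters if every
 * CPU has taken batches_per_cpu batches from the global counter and has not
 * used any of them. batches_per_cpu is a free parameter of this sketch.
 */
static unsigned int model_percpu_held(unsigned int nr_events,
				      unsigned int ncpus,
				      unsigned int batches_per_cpu)
{
	return ncpus * batches_per_cpu * model_req_batch(nr_events, ncpus);
}
```

For example, with nr_events = 129 and 2 possible CPUs, model_req_batch(129, 2) is 16, and model_percpu_held(129, 2, 1) is 32, i.e. 1/4 of the 128 slots when each CPU holds one batch. Reaching one half (64) in this model would take two batches per CPU.
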

By 'unused slots', do you mean the slots included in the batch allocated to a particular CPU but not actually used by a thread on that CPU? (e.g., get_reqs_available() called once, unused_slots == req_batch - 1)

Can you please explain in a bit more detail how the limit of 1/4th of the slots is ensured by "num_possible_cpus() * 4", and which scenario the math is based on? I've been thinking it over and plugging in values for a while now, and haven't figured out where / how it occurs.

Thanks for your support,

--
Mauricio Faria de Oliveira
IBM Linux Technology Center