On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote:
> On 32-bit architectures, there aren't enough bits to hold both.
> 64-bit architectures, however, can have enough bits to do that. For
> x86-64, the physical address can use up to 52 bits. That is 4PB of
> memory. That leaves 12 bits available for other use. The task structure
> pointer is also aligned to the L1 cache size. That means another 6 bits
> (64 bytes cacheline) will be available. Reserving 2 bits for status
> flags, we will have 16 bits for the reader count. That can support
> up to (64k-1) readers.

*groan*...

So take qrwlock's idea for a queue, then lay out the count value similar
to the new mutex; that is, have bit0 be an r/w bit: when w, bits 6-N are
the owner; when r, they are the reader count. bit1 can be a pending bit,
bit2 a handoff bit, etc.

That should fit and work on 32bit and 64bit without issue.

I have a half-arsed rwsem-atomic.c somewhere that does just that. I just
never got around to doing all the optimistic spin and steal crap that
makes our current rwsem fly.

And that nicely gets rid of that mind-bending BIAS crud.
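
To make the proposed count layout concrete, here is a rough userspace sketch of the encoding only. It is not the rwsem-atomic.c mentioned above, and every name in it (the RWS_* macros, the rws_* helpers) is made up for illustration. It assumes the owner pointer is aligned to 64 bytes, so its low 6 bits are free for flags, and it leaves bits 3-5 unused. There are no atomics, no queue and no spinning here; it only shows how one word can mean "owner" in write mode and "reader count" in read mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, not taken from the kernel tree. */
#define RWS_WRITER      ((uintptr_t)1 << 0)  /* bit0: set = write mode, clear = read mode */
#define RWS_PENDING     ((uintptr_t)1 << 1)  /* bit1: pending */
#define RWS_HANDOFF     ((uintptr_t)1 << 2)  /* bit2: handoff */
#define RWS_SHIFT       6                    /* bits 6-N: owner or reader count */
#define RWS_FLAGS       (((uintptr_t)1 << RWS_SHIFT) - 1)
#define RWS_READER_UNIT ((uintptr_t)1 << RWS_SHIFT)

/* Build a write-mode word. Assumes 'owner' is 64-byte aligned (low 6 bits zero). */
static inline uintptr_t rws_make_writer(const void *owner, uintptr_t flags)
{
	uintptr_t p = (uintptr_t)owner;

	assert((p & RWS_FLAGS) == 0);
	assert((flags & ~(RWS_PENDING | RWS_HANDOFF)) == 0);
	return p | RWS_WRITER | flags;
}

/* True when bit0 says the word is in write mode. */
static inline bool rws_is_writer(uintptr_t count)
{
	return (count & RWS_WRITER) != 0;
}

/* Only meaningful in write mode: bits 6-N are the owner pointer. */
static inline void *rws_owner(uintptr_t count)
{
	return (void *)(count & ~RWS_FLAGS);
}

/* Only meaningful in read mode: bits 6-N are the reader count. */
static inline uintptr_t rws_readers(uintptr_t count)
{
	return count >> RWS_SHIFT;
}

/* Read mode: one more reader is one more unit above the flag bits. */
static inline uintptr_t rws_add_reader(uintptr_t count)
{
	return count + RWS_READER_UNIT;
}
```

With this layout a 32-bit word still leaves 26 bits for the reader count, and a 64-bit word leaves 58, which is why the same scheme would fit both.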