> Yes, we can do something like that. However I think put_qnode() needs to
> use atomic dec as well. As a result, we will need 2 additional atomic
> operations per slowpath invocation. The code may look simpler, but I
> don't think it will be faster than what I am currently doing as the
> cases where the used flag is set will be relatively rare.

The increment does *not* have to be atomic.

First of all, note that the only reader that matters is a local
interrupt; other processors never access the variable at all, so what
they see is irrelevant.

"Okay, so I use a non-atomic RMW instruction; what about non-x86
processors without op-to-memory?"

Well, they're okay, too.  The only requirement is that the write to
qna->cnt must be visible to the local processor (barrier()) before the
qna->nodes[] slot is used.

Remember, a local interrupt may use a slot temporarily, but will always
return qna->cnt to its original value before returning.

So there's nothing wrong with

- Load qna->cnt to register
- Increment register
- Store register to qna->cnt

This works because an interrupt, although it may temporarily modify
qna->cnt, will restore it before returning, so this code will never see
any modification.

Just like using the stack below %rsp, the only requirement is to
ensure that the qna->cnt increment is visible *to the local processor's
interrupt handler* before actually using the slot.

The effect of the interrupt handler is that it may corrupt, at any
time and without warning, any slot not marked in use via qna->cnt.
But that's not a difficult thing to deal with, and does *not* require
atomic operations.
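To make the load/increment/store argument concrete, here is a rough
user-space sketch.  It is not the patch's code: the struct layout,
MAX_QNODES, get_qnode() and fake_interrupt() are names assumed for
illustration (only qna->cnt, qna->nodes[] and put_qnode() appear in this
thread), and barrier() is spelled out the way the kernel defines it for
GCC-style compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier; no CPU fence is emitted. */
#define barrier() __asm__ __volatile__("" ::: "memory")

#define MAX_QNODES 4                /* assumed nesting depth */

struct qnode {
	int locked;
	struct qnode *next;
};

/* Assumed layout; the real code would keep one of these per CPU. */
struct qnode_array {
	int cnt;                        /* slots currently claimed */
	struct qnode nodes[MAX_QNODES];
};

/* Claim the next slot with a plain, non-atomic load/increment/store. */
static inline struct qnode *get_qnode(struct qnode_array *qna)
{
	int idx = qna->cnt;             /* load */

	if (idx >= MAX_QNODES)
		return NULL;
	qna->cnt = idx + 1;             /* increment and store */
	barrier();                      /* claim is emitted before the slot is touched */
	return &qna->nodes[idx];
}

/* Release the most recently claimed slot; again no atomic op. */
static inline void put_qnode(struct qnode_array *qna)
{
	barrier();                      /* finish with the slot before releasing it */
	qna->cnt--;
}

/*
 * Stand-in for a local interrupt: it may scribble on any unclaimed
 * slot, but it always leaves cnt as it found it.
 */
static inline void fake_interrupt(struct qnode_array *qna)
{
	int before = qna->cnt;
	struct qnode *n = get_qnode(qna);

	if (n) {
		n->locked = 1;              /* clobbers a slot nobody has claimed */
		n->locked = 0;
		put_qnode(qna);
	}
	assert(qna->cnt == before);
}
```

If fake_interrupt() runs between the load and the store in get_qnode(),
it borrows the same index and gives it back, so the interrupted code
still stores idx + 1 and gets a slot it has not yet written to.  Once
the store is done, a later interrupt sees the raised cnt and takes the
next slot instead.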