On 29/06/17 16:32, Andrew Haley wrote:
> On 29/06/17 15:23, Toebs Douglass wrote:
>> Actually, I have a bone to pick with them - the instruction to find out
>> the ERG size is privileged.  As such, unless you know better, you have
>> to assume the worst case of 2048 bytes.  This makes lock-free data
>> structure state and element structs huge.
>
> What for?  Just assume that it's a cache line, and if that just happens
> to run slowly, so what?  The user has bought the wrong hardware.  And
> people who make hardware like that should be punished in the profit
> margin...

So, for example, the data structure state structures usually contain
pointers central to the data structure - in the queue, there is a
pointer to the start and a pointer to the end of the queue.  These two
pointers are in fact usually independent - enqueues and dequeues occur
independently.  If the two pointers were in the same ERG, they would no
longer be wholly independent, since a write to one would break an
exclusive reservation held on the other, and performance would
unnecessarily suffer.

I was actually wrong to say the elements would get big - the queue
elements already use cache line padding only.

So what I see in fact is that the queue state structure has three
members with ERG padding - enqueue, dequeue and the ABA counter which is
used when elements are inserted into the queue.

>> It would be enough of a pain to have to make alignment run-time, but you
>> could do it if you could get ERG size - *but you can't even get ERG size*.
>>
>> Ahhh, face palm, etc.
>
> I think you can blame that one on kernel architects: it should be possible
> for them to read the information and tell you.  Do they not?

I aim to be platform independent.  The library compiles on a *bare* C89
compiler - not even freestanding.  I think I might be able to find ERG
somewhere in /proc on the Linux port, but then I'd need to link to
something to do I/O from /proc, and then what about on Windows?  There's
probably a function call for it - but what about a bare metal embedded
platform?

On Intel I think I can call cpuid, no problems.  (I didn't realise until
the last week I might need to - I assumed cache line lengths on x86 were
32 bytes and on x86_64, 64 bytes.  This is not the case - apparently x86
cache line lengths have varied between 32 and 128 bytes; and I'm not
sure I can tell which target I'm on just from the GCC predefines, so
maybe I will need cpuid and then to do run-time alignment.)
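
To make the queue state layout described earlier concrete, here is a minimal sketch in plain C89.  It is not the library's actual code: the names (queue_state, queue_element, ERG_LENGTH_IN_BYTES, queue_state_check_layout) are made up for illustration, and the 2048 is the assumed worst-case ERG.  Each member is followed by padding so that consecutive members start exactly one ERG apart, which means no two of them can fall in the same granule.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: worst-case ERG of 2048 bytes, because the real value
   cannot be read from unprivileged code. */
#define ERG_LENGTH_IN_BYTES 2048

struct queue_element; /* hypothetical element type, defined elsewhere */

struct queue_state
{
  /* Written by enqueuers only. */
  struct queue_element *volatile enqueue;
  char enqueue_pad[ERG_LENGTH_IN_BYTES - sizeof(struct queue_element *)];

  /* Written by dequeuers only. */
  struct queue_element *volatile dequeue;
  char dequeue_pad[ERG_LENGTH_IN_BYTES - sizeof(struct queue_element *)];

  /* ABA counter, used when elements are inserted. */
  unsigned long aba_counter;
  char aba_counter_pad[ERG_LENGTH_IN_BYTES - sizeof(unsigned long)];
};

/* Checks that consecutive members start exactly one ERG apart. */
void queue_state_check_layout(void)
{
  assert(offsetof(struct queue_state, dequeue)
         - offsetof(struct queue_state, enqueue) == ERG_LENGTH_IN_BYTES);
  assert(offsetof(struct queue_state, aba_counter)
         - offsetof(struct queue_state, dequeue) == ERG_LENGTH_IN_BYTES);
}
```

With three padded members the state struct comes to 6144 bytes, which is the "huge" I was complaining about.  Padding alone only separates the members from each other; keeping neighbouring objects out of the first member's granule would additionally need the struct itself aligned on an ERG boundary.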
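
For the cpuid route, a rough sketch of what the run-time query might look like.  This is not C89 and not portable: it assumes GCC or Clang and their <cpuid.h> helper __get_cpuid, and the function names are hypothetical.  It reads the CLFLUSH line size from CPUID leaf 1 (EBX bits 15..8, in units of 8 bytes) and returns 0 on anything it does not understand, so the caller can fall back to a compile-time worst case.

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Returns the CLFLUSH line size in bytes, or 0 if it cannot be
   determined on this target. */
unsigned int cache_line_length_in_bytes(void)
{
#if HAVE_X86_CPUID
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid returns 0 if the requested leaf is unsupported. */
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
    return 0;

  /* EDX bit 19 (CLFSH) says whether the line size field is valid. */
  if ((edx & (1u << 19)) == 0)
    return 0;

  /* EBX bits 15..8 hold the line size in 8-byte units. */
  return ((ebx >> 8) & 0xFFu) * 8u;
#else
  return 0;
#endif
}

/* Sanity check: the result is either "unknown" (0) or a multiple of 8. */
void cache_line_length_check(void)
{
  assert(cache_line_length_in_bytes() % 8 == 0);
}
```

Knowing the value only at run time is the awkward part: the padding can no longer be a fixed array size in the struct, so the state would have to be carved out of a larger allocation at a computed offset.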