----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx wrote:

> User processes may require many events and when they do the cache
> performance of a byte index status check is less ideal than a bit index.
> The previous event limit per-page was 4096, the new limit is 32,768.
>
> This change adds a mask property to the user_reg struct. Programs check
> that the byte at status_index has a bit set by ANDing the status_mask.
>
> Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@xxxxxxxxxxxx/
>
> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Signed-off-by: Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx>

Hi Beau,

Considering this will be used in a fast path, why choose bytewise loads
for the byte at status_index and the status_mask? I'm concerned about
the performance penalty associated with partial-register stalls when
working with bytewise ALU operations rather than operations using the
entire registers.

Ideally I would be tempted to use the "unsigned long" type (32-bit on
32-bit binaries and 64-bit on 64-bit binaries) for both the array access
and the status mask, but this brings extra complexity for 32-bit compat
handling.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com