----- On Apr 19, 2022, at 2:57 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx wrote:

> On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote:
>> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx wrote:
>>
>> > User processes may require many events and when they do the cache
>> > performance of a byte index status check is less ideal than a bit index.
>> > The previous event limit per-page was 4096, the new limit is 32,768.
>> >
>> > This change adds a mask property to the user_reg struct. Programs check
>> > that the byte at status_index has a bit set by ANDing the status_mask.
>> >
>> > Link:
>> > https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@xxxxxxxxxxxx/
>> >
>> > Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
>> > Signed-off-by: Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx>
>>
>> Hi Beau,
>>
>> Considering this will be used in a fast-path, why choose bytewise
>> loads for the byte at status_index and the status_mask?
>>
>
> First, thanks for the review!
>
> Which loads are you concerned about? The user programs can store the
> index and mask in another type after registration instead of an int.

I'm concerned about the loads from user-space, considering that those
are on the fast-path.

Indeed, user programs will need to copy the status index and mask
returned in struct user_reg, so adapting the indexing and mask to work
on an array of unsigned long rather than bytes can be done at that
point. But I wonder how many users will go through that extra trouble
unless there are helpers to convert the status index from byte-wise to
long-wise, and to convert the status mask from a byte-wise mask to a
long-wise mask (along with associated documentation).

>
> However, you may be referring to something on the kernel side?

No.

>
>> I'm concerned about the performance penalty associated with partial
>> register stalls when working with bytewise ALU operations rather than
>> operations using the entire registers.
>>
>
> On the kernel side these only occur when a registration happens (pretty
> rare compared to enabled checks) or a delete (even rarer). But I have
> the feeling you are more concerned about the user side, right?

Right.

>
>> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
>> binaries and 64-bit on 64-bit binaries) for both the array access
>> and the status mask, but this brings extra complexity for 32-bit compat
>> handling.
>>
>
> User programs can store the index and mask returned into better value
> types for their architecture.
>
> I agree it will cause compat handling issues if it's put into the user
> facing header as a long.
>
> I was hoping APIs, like libtracefs, could abstract many callers from how
> best to use the returned values. For example, it could save the index
> and mask as unsigned long for the callers and use those for the
> enablement checks.
>
> Do you think there is a way to enable these native types in the ABI
> without causing compat handling issues? I used ints to prevent compat
> issues between 32-bit user mode and 64-bit kernel mode.

I think you are right: this is not an ABI issue, but rather a usability
issue that can be solved by implementing and documenting user-space
library helpers that help user applications index the array and apply
the mask to an unsigned long type.

Thanks,

Mathieu

>
>> Thanks,
>>
>> Mathieu
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>
> Thanks,
> -Beau

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
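
[Editor's sketch] The helpers discussed above could look roughly like the following. This is a minimal user-space sketch, not code from the patch series: it assumes the status_index/status_mask values returned in struct user_reg at registration, an mmap'ed status page viewed as an array of unsigned long, and illustrative names (event_enable, event_enable_init, event_enabled) that do not exist in the kernel or in libtracefs.

#include <stdbool.h>

/* Cached long-wise view of one event's enable state. */
struct event_enable {
	unsigned long word_index;	/* index into the unsigned long array */
	unsigned long word_mask;	/* byte mask shifted to its position in the word */
};

/*
 * Convert the byte-wise status_index/status_mask returned by the kernel
 * into a word index and word mask. Done once, at registration time.
 */
static void event_enable_init(struct event_enable *ev,
			      unsigned int status_index,
			      unsigned int status_mask)
{
	unsigned int byte_in_word = status_index % sizeof(unsigned long);

	ev->word_index = status_index / sizeof(unsigned long);

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* Byte k of a little-endian word occupies bits [8k, 8k+8). */
	ev->word_mask = (unsigned long)status_mask << (byte_in_word * 8);
#else
	/* Big endian: byte 0 is the most significant byte of the word. */
	ev->word_mask = (unsigned long)status_mask <<
		((sizeof(unsigned long) - 1 - byte_in_word) * 8);
#endif
}

/*
 * Fast-path check: a single register-width load from the status page
 * instead of a byte-wide load, avoiding bytewise ALU operations.
 */
static inline bool event_enabled(const unsigned long *status_page,
				 const struct event_enable *ev)
{
	return (status_page[ev->word_index] & ev->word_mask) != 0;
}

A library such as libtracefs could perform event_enable_init() internally when it registers the event, so callers only ever see the word-wise check on the tracing fast-path.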