----- On Apr 19, 2022, at 7:48 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx wrote: > On Tue, Apr 19, 2022 at 05:26:20PM -0400, Mathieu Desnoyers wrote: >> ----- On Apr 19, 2022, at 2:57 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx >> wrote: >> >> > On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote: >> >> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx wrote: >> >> >> >> > User processes may require many events and when they do the cache >> >> > performance of a byte index status check is less ideal than a bit index. >> >> > The previous event limit per-page was 4096, the new limit is 32,768. >> >> > >> >> > This change adds a mask property to the user_reg struct. Programs check >> >> > that the byte at status_index has a bit set by ANDing the status_mask. >> >> > >> >> > Link: >> >> > https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@xxxxxxxxxxxx/ >> >> > >> >> > Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> >> >> > Signed-off-by: Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> >> >> >> >> Hi Beau, >> >> >> >> Considering this will be used in a fast-path, why choose bytewise >> >> loads for the byte at status_index and the status_mask ? >> >> >> > >> > First, thanks for the review! >> > >> > Which loads are you concerned about? The user programs can store the >> > index and mask in another type after registration instead of an int. >> >> I'm concerned about the loads from user-space, considering that >> those are on the fast-path. >> >> Indeed user programs will need to copy the status index and mask >> returned in struct user_reg, so adapting the indexing and mask to >> deal with an array of unsigned long rather than bytes can be done >> at that point, but I wonder how many users will go through that >> extra trouble unless there are helpers to convert the status index >> from byte-wise to long-wise, and convert the status mask from a >> byte-wise mask to a long-wise mask (and associated documentation). >> > > Yeah, do you think it's wise to maybe add inline functions in > user_events.h to do this conversion? I could then add them to our > documentation. > > Hopefully this would make more APIs/people do the better approach? > >> >> > >> > However, you may be referring to something on the kernel side? >> >> No. >> > > [..] > >> >> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit >> >> binaries and 64-bit on 64-bit binaries) for both the array access >> >> and the status mask, but this brings extra complexity for 32-bit compat >> >> handling. >> >> >> > >> > User programs can store the index and mask returned into better value >> > types for their architecture. >> > >> > I agree it will cause compat handling issues if it's put into the user >> > facing header as a long. >> > >> > I was hoping APIs, like libtracefs, could abstract many callers from how >> > best to use the returned values. For example, it could save the index >> > and mask as unsigned long for the callers and use those for the >> > enablement checks. >> > >> > Do you think there is a way to enable these native types in the ABI >> > without causing compat handling issues? I used ints to prevent compat >> > issues between 32-bit user mode and 64-bit kernel mode. >> >> I think you are right: this is not an ABI issue, but rather a usability >> issue that can be solved by implementing and documenting user-space library >> helpers to help user applications index the array and apply the mask to an >> unsigned long type. >> > > Great. Let me know if updating user_events.h to do the conversion is a > good idea or not, or if you have other thoughts how to make more people > do the best thing. Usually uapi headers are reserved for exposing the kernel ABI to user-space. I think the helpers we are discussing here do not belong to the uapi because they do not define the ABI, and should probably sit elsewhere in a proper library. If the status_mask is meant to be modified in some ways by user-space before it can be used as a mask, I wonder why it is exposed as a byte-wise mask at all ? Rather than exposing a byte-wise index and single-byte mask as ABI, the kernel could simply expose a bit-wise index, which can then be used by the application to calculate index and mask, which it should interpret in little endian if it wants to apply the mask on types larger than a single byte. Thoughts ? Thanks, Mathieu > >> Thanks, >> >> Mathieu >> >> > >> >> Thanks, >> >> >> >> Mathieu >> >> >> >> -- >> >> Mathieu Desnoyers >> >> EfficiOS Inc. >> >> http://www.efficios.com >> > >> > Thanks, >> > -Beau >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com > > Thanks, > -Beau -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com