On Mon, 11 Oct 2021 09:25:23 -0700
Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:

> Yes, in my mind there are two options to avoid kernel memory usage
> per-event.
>
> 1.
> We have an array per file struct that is independently ref-counted.
> This is required to ensure lifetime requirements and to ensure user code
> cannot access other user events that might have been free'd outside of
> the lifetime and cause a kernel crash.
>
> This approach also requires 2 ints to be returned, 1 for the status
> page, the other a local index for the write into the above array per-file
> struct.
>
> This is likely the most complex method due to its lifetime and RCU
> synchronization requirements. However, it represents the least memory to
> both kernel and user space.

Does it require RCU synchronization, as the updates only happen from
user space?

But is this for the writing of the event? You want a separate fd for
each event to write to, instead of saying you have another interface to
write and just pass the given id?

> 2.
> We have an anon_inode FD that gets installed into the user process and
> returned via the ioctl from user_events tracefs file. The file struct
> backing the FD is shared by all user mode processes for that event. Like
> having an inject/marker file per-event in the user_events subsystem.
>
> This approach requires an FD returned and either an int for the status
> page or the returned FD could expose the ID via another IOCTL being
> issued.
>
> This is the simplest method since the FD manages the lifetime, when FD
> is released so is the shared file struct. Kernel side memory is reduced
> to only unique events that are actively being used. There is no RCU or
> synchronization beyond the FD lifetime. The user mode processes do
> incur an FD per-event within their file description table. So the
> events charge against their FD per-process limit (not necessarily a bad
> thing).
>
> This also seems to follow the pre-existing patterns of tracefs
> (trace_marker, inject, format, etc all have a shared file available to
> user-processes that have been granted access). For our case, we want
> that, but we want it on an access boundary to who all have access to the
> user_events_* tracefs files. We don't want to open up all of tracefs
> widely.
>
> > > I want to make
> > > sure the complexity is worth it. Is the overhead of an FD per event in
> > > user space too much?
> >
> > It depends on the use case, how many events you want to use with
> > the user-events. If there are hundreds of events, that will consume
> > kernel resources and /proc/*/fd/ will be filled with the event's fds.
> > But if there are a few events, I think no problem.
> >
> In our own use case this will be low due to the way we plan to use the
> events. However, I am not sure others will follow that :)

I will say, whenever we say this will only have a "few", if it becomes
useful, it will end up having many.

-- Steve
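
For readers following the thread, option 1 above (a per-file array of
ref-counted events, addressed by a local index) might look roughly like
the userspace C sketch below. Every name in it (struct user_event,
struct file_events, the file_events_* helpers, MAX_EVENTS_PER_FILE) is
hypothetical and not taken from the user_events patches. It also leaves
out RCU and locking entirely, so it only illustrates why a bounds-checked
local index keeps one file from reaching events it never registered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel-side user event; not the real struct. */
struct user_event {
	int refcnt;        /* number of per-file slots referencing this event */
	int status_index;  /* index into the shared status page */
	size_t writes;     /* counts writes, for illustration only */
};

#define MAX_EVENTS_PER_FILE 8   /* arbitrary limit chosen for the sketch */

/* Hypothetical per-open-file state: a private table of event references. */
struct file_events {
	struct user_event *slots[MAX_EVENTS_PER_FILE];
	int count;
};

/* Register an event with this file; returns the local write index or -1. */
static int file_events_register(struct file_events *fe, struct user_event *ev)
{
	if (fe->count >= MAX_EVENTS_PER_FILE)
		return -1;
	ev->refcnt++;                    /* the slot now holds a reference */
	fe->slots[fe->count] = ev;
	return fe->count++;
}

/* Write via a local index; only events this file registered are reachable. */
static int file_events_write(struct file_events *fe, int idx)
{
	if (idx < 0 || idx >= fe->count)
		return -1;               /* index outside this file's table */
	fe->slots[idx]->writes++;
	return 0;
}

/* Drop every reference this file holds, as would happen on release. */
static void file_events_release(struct file_events *fe)
{
	for (int i = 0; i < fe->count; i++) {
		assert(fe->slots[i]->refcnt > 0);
		fe->slots[i]->refcnt--;
		fe->slots[i] = NULL;
	}
	fe->count = 0;
}
```

With two file_events tables registering the same event, each gets local
index 0, the event's refcnt is 2, and a write with any other index fails
with -1. Option 2 would replace the table and index with one FD per
event, letting the FD lifetime do the ref-counting.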