On Sat, Sep 23, 2017 at 3:44 AM, Thangirala, Hari <Hari.Thangirala at amd.com> wrote: > Hi Felix, > > Event IDs reserved for debug events was an artifact of a debugger design. We changed our tools strategy so those limitations do not exist anymore. There's no need to reserve special event-IDs for debugger. 4096 is good enough and the new debug trap handler will use events from that pool. > > Thanks > Hari > > > > -----Original Message----- > From: Kuehling, Felix > Sent: Thursday, September 21, 2017 9:36 PM > To: amd-gfx at lists.freedesktop.org; Oded Gabbay <oded.gabbay at gmail.com>; Thangirala, Hari <Hari.Thangirala at amd.com> > Subject: KFD event handling questions > > [Hari, can you comment on the last paragraph below regarding the events limit? Would it hurt the runtime much to lose 8 signals out of 4096?] > > Hi Oded, > > I'm looking into upstreaming some event handling enhancements. I see that you actually worked on increasing the number of events to 4096 and working around some problems with debug events when you were at AMD. > > As I'm looking at this, I'm confused by the code that allocates signal pages. It allocates a single signal page that's big enough to accommodate all signals. But allocate_signal_page looks like it was meant to allocate smaller signal pages incrementally that are accumulated in a kfd_process::signal_event_pages list. However, this feature never gets used. It would also break user mode because the Thunk expects to be able to map a single signal page for all its signals. > > Therefore I'm inclined to simplify allocate_signal_page and related code to only deal with a single allocation. Do you have any concerns or objections about that? Feel free to simplify that code. At the time I believe we didn't know how many events will be needed so we prepared a more generic mechanism for allocation. I guess we are now smarter (after almost 4 years) so simplification seems like the right thing to do if it matches the rest of the stack. > > Somewhat related to that, you also added a workaround that increases the signal number to 4096+512 to accommodate 8 debug events while maintaining page alignment. However, the signal page is allocated with __get_free_pages, which allocates in powers of 2. So in fact it needs to allocate 8192 signals, wasting about 28KB per process. It's not a lot of waste. But an alternative would be to instead sacrifice 8 user signals. > The runtime has a fallback for when it runs out of signals, and the difference between 4096 and 4088 seems insignificant. What do you think? I believe Hari already answered that and I don't have anything more to add. Thanks, Oded > > Regards, > Felix > > -- > F e l i x K u e h l i n g > PMTS Software Development Engineer | Vertical Workstation/Compute > 1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada > (O) +1(289)695-1597 > _ _ _ _____ _____ > / \ | \ / | | _ \ \ _ | > / A \ | \M/ | | |D) ) /|_| | > /_/ \_\ |_| |_| |_____/ |__/ \| facebook.com/AMD | amd.com >