Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

Hi Peter,

On Mon, Apr 22, 2024 at 11:23:50AM -0700, Peter Newman wrote:
> Hi Dave,
> 
> On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
> >
> > Hi Babu,
> >
> > On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote:
> > >        Assignment flags can be one of the following:
> > >
> > >         t  MBM total event is assigned
> >
> > With my MPAM hat on this looks a bit weird, although I suppose it
> > follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are
> > already exposed in resctrlfs.
> >
> > From an abstract point of view, "total" and "local" are just event
> > selection criteria, additional to those in mbm_cfg_mask.  The different
> > way they are treated in the hardware feels like an x86 implementation
> > detail.
> >
> > For MPAM we don't currently distinguish local from non-local traffic, so
> > I guess this just reduces to a simple on-off (i.e., "t" or nothing),
> > which I guess is tolerable.
> >
> > This might want more thought if there is an expectation that more
> > categories will be added here, though (?)
> 
> There should be a path forward whenever we start supporting
> user-configured counter classes. I assume the letters a-z will be
> enough to cover all the counter classes which could be used at once.

Ack, though I'd appreciate a response on the point about "_" below in
case people missed it.

> 
> >
> > >         l  MBM local event is assigned
> > >         tl Both total and local MBM events are assigned
> > >         _  None of the MBM events are assigned
> >
> > This use of '_' seems unusual.  Can we not just have the empty string
> > for "nothing assigned"?
> >
> > Since every assignment is terminated by ';' or end-of-line, I don't
> > think that there would be any parsing ambiguity (?)
> >
> > >
> > >       Examples:
> > >
> > >       # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > >       non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >       non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >       //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >       /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >
> > >       There are four groups and all the groups have local and total event assigned.
> > >
> > >       "//" - This is a default CONTROL MON group
> > >
> > >       "non_defult_group//" - This is non default CONTROL MON group
> > >
> > >       "/default_mon1/"  - This is child MON group of the default group
> > >
> > >       "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> > >
> > >       =tl means both total and local events are assigned.
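To make the "_" question concrete, here is a minimal Python sketch of a
parser for the list format shown above. It is only an illustration, not
the kernel's implementation: the function name parse_assign_line and its
none_flag parameter are hypothetical, and it assumes group names never
contain "/" or ";". It accepts both "_" and the empty string as "nothing
assigned", since the "=" and ";" delimiters already bound the flags field.

```python
def parse_assign_line(line, none_flag="_"):
    """Parse one line such as 'ctrl/mon/0=tl;1=_;'.

    Returns (ctrl_group, mon_group, {domain_id: set_of_flag_letters}).
    Hypothetical helper for illustration only.
    """
    # The path part always has exactly two '/' separators, even when the
    # control group or the monitor group name is empty (the default group).
    ctrl_group, mon_group, domains = line.strip().split("/", 2)

    assignments = {}
    for item in domains.split(";"):
        if not item:
            continue  # the trailing ';' leaves an empty final item
        domain_id, _, flags = item.partition("=")
        if flags == none_flag:
            flags = ""  # treat '_' the same as an empty flags field
        assignments[int(domain_id)] = set(flags)
    return ctrl_group, mon_group, assignments


print(parse_assign_line("non_defult_group/non_default_mon1/0=tl;1=_;"))
print(parse_assign_line("//0=;1=l;"))
```

Under these assumptions "0=_;" and "0=;" both produce an empty flag set
for domain 0, so dropping "_" would not make the format ambiguous to
parse.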
> > >
> > > e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
> > >
> > >       The write format is similar to the above list format with addition of
> > >       op-code for the assignment operation.
> >
> > With my resctrl newbie hat on:
> >
> > It feels a bit complex (for the kernel) to have userspace needing to
> > write a script into a magic file that we need to parse, specifying
> > updates to a bunch of controls already visible as objects in resctrlfs
> > in their own right.
> >
> > What's the expected use case here?
> 
> I went over the use case of iterating a small number of monitors over
> a much larger number of monitoring groups here:
> 
> https://lore.kernel.org/lkml/CALPaoCi=PCWr6U5zYtFPmyaFHU_iqZtZL-LaHC2mYxbETXk3ig@xxxxxxxxxxxxxx/
> 
> >
> > If userspace really does need to switch lots of events simultaneously
> > then I guess the overhead of enumerating and poking lots of individual
> > files might be unacceptable though, and we would still need some global
> > interfaces for operations such as "unassign everything"...
> 
> My main goal is for the number of parallel IPI batches to all the
> domains (or write syscalls) to be O(num_rmids / num_monitors) rather
> than O(num_rmids * num_monitors) as I need to know how frequently we
> can afford to sample the current memory bandwidth of the maximum
> number of monitoring groups supported.

Fair enough; I wasn't fully aware of the background discussions.
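
For anyone else catching up, one way to read the O(num_rmids /
num_monitors) figure is sketched below in Python. It assumes that each
monitoring group needs a single counter per pass and that one write can
reassign a whole batch of groups; this is a simplification, not something
stated in the thread. The names batches and groups and the sizes used are
hypothetical.

```python
def batches(groups, num_monitors):
    """Yield successive slices of at most num_monitors groups."""
    if num_monitors <= 0:
        raise ValueError("num_monitors must be positive")
    for start in range(0, len(groups), num_monitors):
        yield groups[start:start + num_monitors]


# Hypothetical sizes: 256 monitoring groups (RMIDs) sharing 32 counters.
groups = [f"group{i}" for i in range(256)]
passes = list(batches(groups, 32))
print(len(passes))  # 8 batched updates to visit every group once
```

With those numbers, sampling every group once takes 8 batched updates,
which is num_rmids / num_monitors rounded up.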

Cheers
---Dave



