On Tue, May 29, 2018 at 06:58:01PM +0800, bing.niu@xxxxxxxxx wrote:
> From: Bing Niu <bing.niu@xxxxxxxxx>
>
> This series introduces RDT memory bandwidth allocation support by extending
> the current virresctrl implementation.
>
> The Memory Bandwidth Allocation (MBA) feature provides indirect and
> approximate control over the memory bandwidth available per core. This
> feature provides a method to control applications which may be over-utilizing
> bandwidth relative to their priority in environments such as the data center.
> The details can be found in Intel's SDM 17.19.7. The kernel supports MBA
> through the resctrl file system, the same as CAT. Each resctrl group has an
> MB parameter that controls how much memory bandwidth it can utilize, in units
> of percentage.
>
> In this series, MBA is enabled by enhancing the existing virresctrl
> implementation. The policy employed for MBA is similar to CAT: the sum of the
> MBA groups' bandwidths does not exceed 100%. The enhancement of virresctrl
> consists of two parts:
>
> Patch 1: Add two new structures, virResctrlInfoMB and virResctrlAllocMB, for
> collecting the host system's MBA capability and the domain's memory bandwidth
> allocation.
>
> Patch 2: On frontend XML parsing, add a new element "llc" in the cachetune
> section for MBA allocation.

Hi,

Thanks for the patches. Before we start with the actual implementation it
would be nice to agree on the design.

------------------------------------------------------------------------

The first point is that we should do it similarly to the cache allocation: we
will not allow "sharing" the bandwidth, so the sum should be 100%, as you
already have it in your patches, but we need to do it in a way that allows us
to "share" the bandwidth in the future.

The second point is how the XML will look. There are two parts: the
capabilities XML and the domain XML.
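As a side note, the policy described in the cover letter (each group's MB
value within the host limits, sum across groups capped at 100%) can be
sketched as a small shell check. The min/gran values of 10 below are
assumptions for illustration; on a real host they come from
/sys/fs/resctrl/info/MB/min_bandwidth and .../MB/bandwidth_gran:

```shell
# Sketch only (not libvirt code): mimic the series' MBA allocation policy.
# min and gran are assumed values; real ones are read from
# /sys/fs/resctrl/info/MB/min_bandwidth and bandwidth_gran.
check_mba_allocs() {
  min=10 gran=10 sum=0
  for bw in "$@"; do
    # each value must be within [min, 100] and a multiple of the granularity
    if [ "$bw" -lt "$min" ] || [ "$bw" -gt 100 ] || [ $((bw % gran)) -ne 0 ]; then
      echo invalid; return 1
    fi
    sum=$((sum + bw))
  done
  # the sum over all groups must not exceed 100%
  if [ "$sum" -gt 100 ]; then
    echo overcommitted; return 1
  fi
  echo ok
}

check_mba_allocs 30 20    # prints "ok"
```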
It looks like your patches don't expose any information in the capabilities
XML; we should do that in order to let management applications know that the
feature is available and what possible values they can use.

------------------------------------------------------------------------

I've tried to configure MBA on one machine that I have access to, which has
this CPU: 'Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz', and it behaves
strangely. If I configure 'schemata', the output of the 'pqos -s' command
differs in some situations:

    schemata      pqos -s output
    MB:0=10       MBA COS0 => 10% available
    MB:0=20       MBA COS0 => 20% available
    MB:0=30       MBA COS0 => 30% available
    MB:0=40       MBA COS0 => 40% available
    MB:0=50       MBA COS0 => 50% available
    MB:0=60       MBA COS0 => 60% available
    MB:0=70       MBA COS0 => 90% available
    MB:0=80       MBA COS0 => 90% available
    MB:0=90       MBA COS0 => 90% available
    MB:0=100      MBA COS0 => 100% available

If you look at the table you can see that for the values 70-90 pqos shows
that the available bandwidth is 90%.

Tested using Fedora 28:
    kernel-4.16.13-300.fc28.x86_64
    intel-cmt-cat-1.2.0-2.fc28.x86_64

------------------------------------------------------------------------

Since CAT (cache allocation technology) and MBA (memory bandwidth allocation)
are unrelated and control different limitations, we should not group MBA
together with CAT in our XML files. From the (rather sparse) documentation it
looks like MBA is related to the memory controller.

Currently the cache allocation in the capabilities XML is reported like this:

<capabilities>
  <host>
    ...
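For reference, the table was produced with a loop along these lines. This is
a repro sketch, guarded so it only runs where resctrl is mounted and
writable; the helper name is made up, the MB: schemata format and the pqos
tool (from intel-cmt-cat) are real:

```shell
# Repro sketch for the schemata/pqos discrepancy. Needs root, a mounted
# resctrl fs (mount -t resctrl resctrl /sys/fs/resctrl) and pqos installed.
mba_line() { printf 'MB:0=%d\n' "$1"; }   # schemata line for socket/bank 0

if [ -w /sys/fs/resctrl/schemata ]; then
  for v in 10 20 30 40 50 60 70 80 90 100; do
    # program the default group's MBA value for socket 0 ...
    mba_line "$v" > /sys/fs/resctrl/schemata
    printf '%-10s ' "MB:0=$v"
    # ... and compare with what pqos reports for COS0
    pqos -s | grep 'MBA COS0'
  done
fi
```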
    <cache>
      <bank id='0' level='3' type='both' size='30720' unit='KiB' cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
        <control granularity='1536' unit='KiB' type='code' maxAllocs='8'/>
        <control granularity='1536' unit='KiB' type='data' maxAllocs='8'/>
      </bank>
      <bank id='1' level='3' type='both' size='30720' unit='KiB' cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
        <control granularity='1536' unit='KiB' type='code' maxAllocs='8'/>
        <control granularity='1536' unit='KiB' type='data' maxAllocs='8'/>
      </bank>
    </cache>
    ...
  </host>
</capabilities>

So the possible capabilities XML could look like this:

<capabilities>
  <host>
    ...
    <memory>
      <bank id='0' cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
        <control granularity='10' maxAllocs='8'/>
      </bank>
      <bank id='1' cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
        <control granularity='10' maxAllocs='8'/>
      </bank>
    </memory>
    ...
  </host>
</capabilities>

The element names 'memory' and 'bank' can be named differently; suggestions
are welcome.

Then there is the domain XML. For CAT we use this:

<domain>
  ...
  <cputune>
    ...
    <cachetune vcpus='0-3'>
      <cache id='0' level='3' type='both' size='3' unit='MiB'/>
      <cache id='1' level='3' type='both' size='3' unit='MiB'/>
    </cachetune>
    ...
  </cputune>
  ...
</domain>

so the possible domain XML could look like this:

<domain>
  ...
  <cputune>
    ...
    <memory vcpus='0-3'>
      <socket id='0' bandwidth='30'/>
      <socket id='1' bandwidth='20'/>
    </memory>
    ...
  </cputune>
  ...
</domain>

Again, the element names 'memory' and 'socket' can be named differently.

Pavel
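P.S. Whatever the element names end up being, the per-socket id/bandwidth
pairs from the proposed domain XML would at some point be serialized into the
kernel's schemata format ("MB:<id>=<percent>;..."). A throwaway sketch of
that mapping (the function name is made up, the MB: line format is the
resctrl one):

```shell
# Sketch: turn (socket-id, bandwidth) pairs into one resctrl schemata line,
# e.g. the pairs from <socket id='0' bandwidth='30'/> <socket id='1'
# bandwidth='20'/> become "MB:0=30;1=20".
schemata_mb_line() {
  line=""
  while [ $# -ge 2 ]; do
    part="$1=$2"
    # join the per-socket entries with ';' as resctrl expects
    if [ -n "$line" ]; then line="$line;$part"; else line="$part"; fi
    shift 2
  done
  printf 'MB:%s\n' "$line"
}

schemata_mb_line 0 30 1 20    # prints "MB:0=30;1=20"
```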
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list