On Wed, Jun 06, 2018 at 01:56:53PM +0800, bing.niu wrote:
> Hi Pavel,
> Thanks for your valuable inputs here. Please see my response.
>
> On 2018-06-05 20:11, Pavel Hrdina wrote:
> > On Tue, May 29, 2018 at 06:58:01PM +0800, bing.niu@xxxxxxxxx wrote:
> > > From: Bing Niu <bing.niu@xxxxxxxxx>
> > >
> > > This series introduces RDT memory bandwidth allocation support by
> > > extending the current virresctrl implementation.
> > >
> > > The Memory Bandwidth Allocation (MBA) feature provides indirect and
> > > approximate control over the memory bandwidth available per core.
> > > This feature provides a method to control applications which may be
> > > over-utilizing bandwidth relative to their priority in environments
> > > such as the data center. The details can be found in Intel's SDM
> > > 17.19.7. The kernel supports MBA through the resctrl file system,
> > > the same as CAT. Each resctrl group has an MB parameter to control
> > > how much memory bandwidth it can utilize, in units of percent.
> > >
> > > In this series, MBA is enabled by enhancing the existing virresctrl
> > > implementation. The policy employed for MBA is similar to CAT: the
> > > sum of all MBA groups' bandwidth does not exceed 100%. The
> > > enhancement of virresctrl consists of two parts:
> > >
> > > Patch 1: Add two new structures, virResctrlInfoMB and
> > > virResctrlAllocMB, for collecting the host system's MBA capability
> > > and the domain's memory bandwidth allocation.
> > >
> > > Patch 2: On frontend XML parsing, add a new element "llc" in the
> > > cachetune section for MBA allocation.
> >
> > Hi,
> >
> > Thanks for the patches.  Before we start with the actual
> > implementation it would be nice to agree on the design.
>
> Totally agree. The RFC code acts as a baseline for discussion.
>
> > ------------------------------------------------------------------------
> >
> > So the first point is that we should do it similarly to the cache
> > allocation: we will not allow "sharing" the bandwidth, so the sum
> > must not exceed 100% as you already have it in your patches, but we
> > need to do it in a way that in the future we can allow "sharing" the
> > bandwidth.
>
> Yes, the memory bandwidth allocation policy is derived from the existing
> CAT support in libvirt: no sharing or overlap. In the patch I follow the
> existing CAT behavior when allocating memory bandwidth: first, calculate
> the unused memory bandwidth by subtracting all existing RDT groups. If
> we want to enable memory bandwidth sharing, we can simply skip this part
> and do the allocation directly.
> Does this fit your comment "we need to do it in a way that in the future
> we can allow to 'share' the bandwidth"?
> If anything is missing or my understanding is incorrect, please point it
> out. :)

Sounds good to me.
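Just for illustration, a minimal sketch of that "no sharing" policy
(hypothetical names, plain C, not the actual virresctrl code) could look
roughly like this:

    #include <stdio.h>

    /* Bandwidth left for a new group: 100% minus what the existing
     * resctrl groups already claim.  Returns -1 if the existing groups
     * somehow claim more than 100%, which the no-sharing policy forbids. */
    static int
    mbaUnusedBandwidth(const unsigned int *existing, size_t ngroups)
    {
        unsigned int used = 0;
        size_t i;

        for (i = 0; i < ngroups; i++)
            used += existing[i];

        if (used > 100)
            return -1;

        return 100 - used;
    }

    int
    main(void)
    {
        unsigned int groups[] = { 30, 20 };  /* two existing groups */

        printf("unused MB: %d%%\n", mbaUnusedBandwidth(groups, 2));
        return 0;
    }

Keeping that subtraction step in one place should also make it easy to
relax later if we ever decide to allow sharing.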
> > ------------------------------------------------------------------------
> >
> > The second point is what the XML will look like.  There are two
> > parts, one is the capabilities XML and the second one is the domain
> > XML.
> >
> > It looks like your patches don't expose any information in the
> > capabilities XML; we should do that in order to let management
> > applications know that the feature is available and what values they
> > can use.
> >
> > ------------------------------------------------------------------------
> >
> > I've tried to configure MBA on one machine that I have access to,
> > which has this CPU: 'Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz', and
> > it behaves strangely.  If I configure 'schemata', the output of the
> > 'pqos -s' command is in some situations different:
> >
> >     schemata      pqos -s output
> >
> >     MB:0=10       MBA COS0 => 10% available
> >     MB:0=20       MBA COS0 => 20% available
> >     MB:0=30       MBA COS0 => 30% available
> >     MB:0=40       MBA COS0 => 40% available
> >     MB:0=50       MBA COS0 => 50% available
> >     MB:0=60       MBA COS0 => 60% available
> >     MB:0=70       MBA COS0 => 90% available
> >     MB:0=80       MBA COS0 => 90% available
> >     MB:0=90       MBA COS0 => 90% available
> >     MB:0=100      MBA COS0 => 100% available
> >
> > If you look at the table you can see that for the values 70-90 pqos
> > shows that the available bandwidth is 90%.
> >
> > Tested using Fedora 28:
> >     kernel-4.16.13-300.fc28.x86_64
> >     intel-cmt-cat-1.2.0-2.fc28.x86_64
>
> Hmm, that is strange. I manipulate the resctrl fs directly, so I didn't
> hit this kind of issue. I will take a look at the pqos package and let
> you know.

Yes, I was directly manipulating the resctrl fs as well, and after the
modification the content of the schemata file was correct.  However, I
wanted to validate what was actually configured, and in order to do that
I used the 'pqos' tool.  The question now is whether that tool is broken
or whether, for some configured values, the actual configuration is
different.

> > ------------------------------------------------------------------------
> >
> > Since CAT (cache allocation technology) and MBA (memory bandwidth
> > allocation) are unrelated and they control different limitations, we
> > should not group MBA together with CAT in our XML files.  From the
> > rather poor documentation it looks like MBA is related to the memory
> > controller.
>
> From Intel SDM 17.19, MBA is used to control the request rate for
> flushing data from the LLC to memory; usually MBA and the LLC have a
> 1:1 mapping relation. Yes, I missed exposing the capability part.
> Thanks for pointing that out.
>
> > Currently the cache allocation in the capabilities XML is reported
> > like this:
> >
> >     <capabilities>
> >       <host>
> >         ...
> >         <cache>
> >           <bank id='0' level='3' type='both' size='30720' unit='KiB' cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
> >             <control granularity='1536' unit='KiB' type='code' maxAllocs='8'/>
> >             <control granularity='1536' unit='KiB' type='data' maxAllocs='8'/>
> >           </bank>
> >           <bank id='1' level='3' type='both' size='30720' unit='KiB' cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
> >             <control granularity='1536' unit='KiB' type='code' maxAllocs='8'/>
> >             <control granularity='1536' unit='KiB' type='data' maxAllocs='8'/>
> >           </bank>
> >         </cache>
> >         ...
> >       </host>
> >     </capabilities>
> >
> > So the possible capabilities XML could look like this:
> >
> >     <capabilities>
> >       <host>
> >         ...
> >         <memory>
> >           <bank id='0' cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
> >             <control granularity='10' maxAllocs='8'/>
> >           </bank>
> >           <bank id='1' cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
> >             <control granularity='10' maxAllocs='8'/>
> >           </bank>
> >         </memory>
> >         ...
> >       </host>
> >     </capabilities>
> >
> > The element names 'memory' and 'bank' can be named differently,
> > suggestions are welcome.
>
> How about changing 'bank' to 'node'?

It can definitely be 'node'; this was just to make it consistent with the
cache allocation.
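To make the capabilities side a bit more concrete, here is a rough sketch
of the per-node information such an element would need to carry (names
are hypothetical, only loosely modelled on the virResctrlInfoMB mentioned
in patch 1, not an existing libvirt structure):

    #include <stdio.h>

    /* Hypothetical per-node MBA capability record; the values would be
     * collected from the kernel's resctrl info for MB (bandwidth
     * granularity and the number of usable allocations). */
    typedef struct {
        unsigned int id;           /* node id, same id space as the cache bank */
        unsigned int granularity;  /* smallest configurable step, in percent */
        unsigned int maxAllocs;    /* how many allocations are available */
    } mbaNodeCapability;

    /* Emit the <control/> line proposed above for one node. */
    static void
    mbaFormatControl(const mbaNodeCapability *cap)
    {
        printf("<control granularity='%u' maxAllocs='%u'/>\n",
               cap->granularity, cap->maxAllocs);
    }

    int
    main(void)
    {
        mbaNodeCapability node0 = { 0, 10, 8 };

        mbaFormatControl(&node0);
        return 0;
    }

Whatever the element ends up being called, the attributes should map onto
something like those fields so management applications can tell what
values are valid.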
> > Then there is the domain XML; for CAT we use this:
> >
> >     <domain>
> >       ...
> >       <cputune>
> >         ...
> >         <cachetune vcpus='0-3'>
> >           <cache id='0' level='3' type='both' size='3' unit='MiB'/>
> >           <cache id='1' level='3' type='both' size='3' unit='MiB'/>
> >         </cachetune>
> >         ...
> >       </cputune>
> >       ...
> >     </domain>
> >
> > so the possible domain XML could look like this:
> >
> >     <domain>
> >       ...
> >       <cputune>
> >         ...
> >         <memory vcpus='0-3'>
> >           <socket id='0' bandwidth='30'/>
> >           <socket id='1' bandwidth='20'/>
> >         </memory>
> >         ...
> >       </cputune>
> >       ...
> >     </domain>
> >
> > Again, the element names 'memory' and 'socket' can be named
> > differently.
>
> socket --> node?

Same here, I'm OK with 'node'.

> Since the existing virresctrl implementation only cared about the cache
> part during development, we may need to change the names of some
> structures and functions when enabling MBA. What do you think?

Yes, the existing virresctrl implementation was mainly focused on cache
allocation.  There is a new patch series with some cleanups for the
existing implementation which renames a few things, but we will probably
need to rename some other functions/structures/etc.  One important note:
the rename should be done in a separate patch without any functional
changes.

Pavel