Designing XML for HMAT

Dear list,

QEMU recently gained support for configuring HMAT (see v4.2.0-415-g9b12dfa03a
and friends). HMAT stands for Heterogeneous Memory Attribute Table and defines
various memory attributes for NUMA nodes. A guest OS or application can read
this information and fine-tune its optimizations. See [1] for more info (esp.
the links in the transcript).

QEMU defines a so-called initiator, which is an attribute of a NUMA node and,
if specified, points to another node that has the best performance to this
node.

For instance:

  -machine hmat=on \
  -m 2G,slots=2,maxmem=4G \
  -object memory-backend-ram,size=1G,id=m0 \
  -object memory-backend-ram,size=1G,id=m1 \
  -numa node,nodeid=0,memdev=m0 \
  -numa node,nodeid=1,memdev=m1,initiator=0 \
  -smp 2,sockets=2,maxcpus=2 \
  -numa cpu,node-id=0,socket-id=0 \
  -numa cpu,node-id=0,socket-id=1

creates a machine with 2 NUMA nodes: node 0 has the CPUs and node 1 has memory
only, and its initiator is node 0 (yes, HMAT allows you to create CPU-less
"NUMA" nodes). The initiator of node 0 is not specified, but since the node
has at least one CPU it is its own initiator (and has to be, per the spec).

This could be represented by an attribute on our /domain/cpu/numa/cell
element. For instance like this:

  <domain>
    <vcpu>2</vcpu>
    <cpu>
      <numa>
        <cell id='0' cpus='0,1' memory='1' unit='GiB'/>
        <cell id='1'            memory='1' unit='GiB' initiator='0'/>
      </numa>
    </cpu>
  </domain>


Then, QEMU allows us to control two other important groups of memory
attributes:

  1) hmat-lb for latency and bandwidth

  2) hmat-cache for cache attributes

For example:

  -machine hmat=on \
  -m 2G,slots=2,maxmem=4G \
  -object memory-backend-ram,size=1G,id=m0 \
  -object memory-backend-ram,size=1G,id=m1 \
  -smp 2,sockets=2,maxcpus=2 \
  -numa node,nodeid=0,memdev=m0 \
  -numa node,nodeid=1,memdev=m1,initiator=0 \
  -numa cpu,node-id=0,socket-id=0 \
  -numa cpu,node-id=0,socket-id=1 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10 \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
  -numa hmat-cache,node-id=0,size=10K,level=1,associativity=direct,policy=write-back,line=8 \
  -numa hmat-cache,node-id=1,size=10K,level=1,associativity=direct,policy=write-back,line=8

This extends the previous example by defining some latencies and cache
attributes. Node 0 has an access latency of 5 ns and a bandwidth of 200 MB/s;
node 1 has an access latency of 10 ns and a bandwidth of only 100 MB/s. The
level 1 memory cache on both nodes is 10 KB, with an 8 B cache line,
write-back policy and direct associativity (whatever that means).

For better future extensibility I'd express these as separate elements rather
than as attributes on the <cell/> element. For instance like this:

  <domain>
    <vcpu>2</vcpu>
    <cpu>
      <numa>
        <cell id='0' cpus='0,1' memory='1' unit='GiB'>
          <latencies>
            <latency type='access' value='5'/>
            <bandwidth type='access' unit='MiB' value='200'/>
          </latencies>
          <caches>
            <cache level='1' associativity='direct' policy='write-back'>
              <size unit='KiB' value='10'/>
              <line unit='B' value='8'/>
            </cache>
          </caches>
        </cell>
        <cell id='1' memory='1' unit='GiB' initiator='0'>
          <latencies>
            <latency type='access' value='10'/>
            <bandwidth type='access' unit='MiB' value='100'/>
          </latencies>
          <caches>
            <cache level='1' associativity='direct' policy='write-back'>
              <size unit='KiB' value='10'/>
              <line unit='B' value='8'/>
            </cache>
          </caches>
        </cell>
      </numa>
    </cpu>
  </domain>

Thing is, the @hierarchy argument accepts: memory (referring to the whole
memory), or first-level|second-level|third-level (referring to the side
caches for each domain). I haven't figured out how to express the levels in
the XML yet.
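
One option that comes to my mind (just a sketch, and the @cache attribute
below is made up for illustration) would be to let <latency/> and
<bandwidth/> refer to a cache level declared under <caches/>, with the
attribute absent meaning hierarchy=memory:

  <latencies>
    <!-- no @cache, so this describes the whole memory (hierarchy=memory) -->
    <latency type='access' value='5'/>
    <!-- hypothetical @cache attribute pointing to the level 1 cache
         declared under <caches/> (hierarchy=first-level); value made up -->
    <latency type='access' cache='1' value='3'/>
  </latencies>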

The @data-type argument accepts access|read|write (this is expressed by the
@type attribute on the <latency/> and <bandwidth/> elements). Latency and
bandwidth can be combined with each type: access-latency, read-latency,
write-latency, access-bandwidth, read-bandwidth, write-bandwidth. And these 6
can then be combined with the aforementioned @hierarchy, producing 24
combinations (if I read the QEMU command line docs correctly [2]).
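
For instance, a cell spelling out all six data types for the memory hierarchy
could look like this (the numbers are made up):

  <latencies>
    <latency type='access' value='5'/>
    <latency type='read' value='4'/>
    <latency type='write' value='6'/>
    <bandwidth type='access' unit='MiB' value='200'/>
    <bandwidth type='read' unit='MiB' value='220'/>
    <bandwidth type='write' unit='MiB' value='180'/>
  </latencies>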



What are your thoughts?

Michal


1: https://bugzilla.redhat.com/show_bug.cgi?id=1786303
2: https://git.qemu.org/?p=qemu.git;a=blob;f=qemu-options.hx;h=d4b73ef60c1d4589148169ac658a34eee5f54522;hb=HEAD#l174
