Re: [PATCH 2/5] virCaps: expose huge page info

On 13.06.2014 10:28, Daniel P. Berrange wrote:
On Thu, Jun 12, 2014 at 07:21:47PM +0200, Martin Kletzander wrote:
On Thu, Jun 12, 2014 at 02:30:50PM +0100, Daniel P. Berrange wrote:
On Tue, Jun 10, 2014 at 07:21:12PM +0200, Michal Privoznik wrote:
There are two places where you'll find info on huge pages. The first
is under the <cpu/> element, where all supported huge page sizes are
listed. The second is under each <cell/> element, which refers to a
concrete NUMA node; there, the size of the huge page pool is
reported. So the capabilities XML looks something like this:

<capabilities>

  <host>
    <uuid>01281cda-f352-cb11-a9db-e905fe22010c</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Westmere</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='1' threads='1'/>
      ...
      <pages unit='KiB' size='1048576'/>
      <pages unit='KiB' size='2048'/>

Should have normal-sized pages (i.e. 4k on x86) too, to avoid
apps having to special-case small pages.


Since we have to special-case small pages anyway, and the kernel (at
least to my knowledge) doesn't expose that information through the
usual interfaces, I think reporting only hugepages is actually what
we want here.  For normal memory there are existing APIs already.

Hugepages are different mainly because of one thing: the fact that
some hugepages are allocated at all is known to the user of the
machine (be it a mgmt app or an admin), and those hugepages were
allocated for some purpose.  It is fairly safe to presume that the
number of hugepages (free or total) will change only when and if the
user wants it to (e.g. by running a machine with a specified size and
hugepages).  That cannot be said about small pages, though, and I
think that is fair reason to special-case normal pages like this.

That difference is something that's only relevant to the person who
is provisioning the machine though. For applications consuming the
libvirt APIs it is not relevant. For OpenStack we really want to have
normal size pages dealt with in the same way as huge pages since
it will simplify our scheduler/placement logic. So I really want these
APIs to do this in libvirt so that OpenStack doesn't have to reverse
engineer this itself.

But if we go this way, there are hidden black holes. For instance, the
size of the ordinary page pool. This is not exposed anywhere, and the
only algorithm I can think of is to take
[(MemTotal on node #i) - sum(memory taken by all huge pages)] / PAGE_SIZE.
So for instance on my machine, where I have one 1GB huge page and
three 2MB huge pages per NUMA node:

# grep MemTotal /sys/devices/system/node/node0/meminfo
Node 0 MemTotal:        4054408 kB

# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
1

# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
3

# getconf PAGESIZE
4096

(4054408 - (1*1048576 + 3*2048)) / 4 = 2999688 / 4 = 749922 ordinary pages. But it's not that simple, as not all of those pages are actually available. Some are reserved for DMA transfers, some for the kernel itself, etc. Without overcommit it's impossible to allocate nearly 3GB. Is this something we really want to do?
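The arithmetic above can be sketched as a small helper. This is only an
illustration of the proposed formula, not libvirt code; the function name
and parameters are made up here, and the inputs mirror what the sysfs
reads above return per NUMA node:

```python
# Sketch of the ordinary-page estimate discussed above (hypothetical
# helper, not part of libvirt). Inputs per NUMA node:
#   mem_total_kib   -- "MemTotal" from /sys/devices/system/node/nodeN/meminfo
#   hugepage_pools  -- [(pool page size in KiB, nr_hugepages), ...] from
#                      /sys/devices/system/node/nodeN/hugepages/
#   page_size_kib   -- `getconf PAGESIZE` converted to KiB (4 on x86)

def ordinary_pages(mem_total_kib, hugepage_pools, page_size_kib=4):
    # Memory consumed by all huge page pools on this node, in KiB.
    huge_kib = sum(size_kib * count for size_kib, count in hugepage_pools)
    # Remaining memory expressed as a count of ordinary pages.
    return (mem_total_kib - huge_kib) // page_size_kib

# The node0 numbers from above: one 1GB page and three 2MB pages.
print(ordinary_pages(4054408, [(1048576, 1), (2048, 3)]))  # -> 749922
```

As noted, this overestimates what is really allocatable, since it ignores
memory reserved for DMA, the kernel, etc.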

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



