On Thu, Jan 17, 2013 at 12:12:35AM +0100, Peter Krempa wrote:
> On 01/16/13 21:24, Daniel P. Berrange wrote:
> >On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
> >>On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
> >>>On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:
> >>>>----- Original Message -----
> >>>>From: Daniel P. Berrange <berrange@xxxxxxxxxx>
> >>>>To: Peter Krempa <pkrempa@xxxxxxxxxx>
> >>>>Cc: Jiri Denemark <jdenemar@xxxxxxxxxx>, Amador Pahim <apahim@xxxxxxxxxx>, libvirt-list@xxxxxxxxxx, dougsland@xxxxxxxxxx
> >>>>Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
> >>>>Subject: Re: [RFC] Data in the <topology> element in the capabilities XML
> >>>>
> >>>>On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
> >>>>>On 01/16/13 19:11, Daniel P. Berrange wrote:
> >>>>>>On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
> >>>>>>>Hi everybody,
> >>>>>>>
> >>>>>>>a while ago there was a discussion about changing the data that is
> >>>>>>>returned in the <topology> sub-element:
> >>>>>>>
> >>>>>>><capabilities>
> >>>>>>><host>
> >>>>>>><cpu>
> >>>>>>><arch>x86_64</arch>
> >>>>>>><model>SandyBridge</model>
> >>>>>>><vendor>Intel</vendor>
> >>>>>>><topology sockets='1' cores='2' threads='2'/>
> >>>>>>>
> >>>>>>>
> >>>>>>>The data provided here is, as of today, taken from the nodeinfo
> >>>>>>>detection code and thus is really wrong when the fallback mechanisms
> >>>>>>>are used.
> >>>>>>>
> >>>>>>>To get a useful count, the user has to multiply the data by the
> >>>>>>>number of NUMA nodes in the host. With the fallback detection code
> >>>>>>>used for nodeinfo, the NUMA node count used to get the CPU count
> >>>>>>>should be 1 instead of the actual number.
> >>>>>>>
> >>>>>>>As Jiri proposed, I think we should change this output to use separate
> >>>>>>>detection code that does not take NUMA nodes into account for this
> >>>>>>>output and instead provides data as the "lscpu" command does.
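The counting rule described above (multiply the per-node CPU <topology> by the number of NUMA nodes, but use a multiplier of 1 when nodeinfo fell back to its fallback detection) can be sketched as below. This is a minimal illustration, not libvirt code: the `<cells num='2'/>` value and the `fallback_used` flag are hypothetical, since the capabilities XML itself does not say whether the fallback path was taken.

```python
import xml.etree.ElementTree as ET

# Abridged capabilities XML modelled on the excerpt quoted above. The
# <topology><cells num='2'/></topology> NUMA part is a hypothetical value.
CAPS = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>
    </cpu>
    <topology>
      <cells num='2'/>
    </topology>
  </host>
</capabilities>
"""


def logical_cpu_count(caps_xml: str, fallback_used: bool = False) -> int:
    """Logical CPUs on the host, reading 'sockets' as sockets per NUMA node.

    fallback_used is a hypothetical caller-supplied flag: the capabilities
    XML does not record whether nodeinfo used its fallback detection code.
    """
    host = ET.fromstring(caps_xml.strip()).find("host")
    topo = host.find("cpu/topology")
    per_node = (int(topo.get("sockets"))
                * int(topo.get("cores"))
                * int(topo.get("threads")))
    if fallback_used:
        # Per the thread, the NUMA node multiplier must be 1 in this case.
        return per_node
    cells = host.find("topology/cells")
    nodes = int(cells.get("num")) if cells is not None else 1
    return per_node * nodes


print(logical_cpu_count(CAPS))                      # 8
print(logical_cpu_count(CAPS, fallback_used=True))  # 4
```

The need for an out-of-band flag is exactly the weakness being discussed: the same XML yields different correct answers depending on how it was produced.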
> >>>>>>>
> >>>>>>>This change will make the data provided by the element standalone
> >>>>>>>and also usable in guest XMLs to mirror the host's topology.
> >>>>>>Well, there are 2 parts which need to be considered here: what we report
> >>>>>>in the host capabilities, and how you configure the guest XML.
> >>>>>>
> >>>>>>From a historical compatibility pov I don't think we should be changing
> >>>>>>the host capabilities at all. Simply document that 'sockets' is treated
> >>>>>>as sockets-per-node everywhere, and that it is wrong in the case of
> >>>>>>machines where a socket can internally have multiple NUMA nodes.
> >>>>>I'm also somewhat concerned about changing this output for
> >>>>>historical reasons.
> >>>>>>Apps should be using the separate NUMA <topology> data in the capabilities
> >>>>>>instead of the CPU <topology> data, to get accurate CPU counts.
> >>>>>From the NUMA <topology> the management apps can't tell if the CPU
> >>>>>is a core or a thread. For example, oVirt/VDSM bases its decisions on
> >>>>>this information.
> >>>>Then, we should add information to the NUMA topology XML to indicate
> >>>>which of the child <cpu> elements are sibling cores or threads.
> >>>>
> >>>>Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.
> >>>
> >>>>In this case, we will also need to add the thread siblings and
> >>>>perhaps even core siblings information to allow reliable detection.
> >>>The combination of core_id/socket_id lets you determine that. If two
> >>>cores have the same socket_id then they are cores or threads within the
> >>>same socket. If two <cpu> have the same socket_id & core_id then they
> >>>are threads within the same core.
> >>
> >>Not true for the AMD Magny-Cours 6100 series, where different cores can
> >>share the same physical_id and core_id, and they are not threads.
> >>These processors have two NUMA nodes inside the same "package" (aka
> >>socket) and they share the same core ID set. Annoying.
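The socket_id/core_id rule proposed above, and the Magny-Cours objection to it, can be made concrete with a small sketch. The CPU ids, cell numbers and the `relation_with_cell` variant are hypothetical illustrations: the variant simply assumes the NUMA cell a <cpu> sits in is also taken into account, which is one way to read the grouping by cell in the NUMA topology XML.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    cpu_id: int
    cell: int        # NUMA node containing this logical CPU
    socket_id: int
    core_id: int


def relation(a: Cpu, b: Cpu) -> str:
    """The rule as stated in the thread: compare socket_id, then core_id."""
    if a.socket_id != b.socket_id:
        return "different sockets"
    if a.core_id == b.core_id:
        return "threads of one core"
    return "cores of one socket"


def relation_with_cell(a: Cpu, b: Cpu) -> str:
    """Hypothetical variant: threads must also share the same NUMA cell."""
    if a.socket_id != b.socket_id:
        return "different sockets"
    if a.cell == b.cell and a.core_id == b.core_id:
        return "threads of one core"
    return "cores of one socket"


# Hypothetical Magny-Cours-like package: one socket, two NUMA nodes that
# reuse the same core ID set, and no SMT threads at all.
cpu0 = Cpu(cpu_id=0, cell=0, socket_id=0, core_id=0)
cpu2 = Cpu(cpu_id=2, cell=0, socket_id=0, core_id=1)
cpu6 = Cpu(cpu_id=6, cell=1, socket_id=0, core_id=0)

print(relation(cpu0, cpu6))            # threads of one core (wrong here)
print(relation_with_cell(cpu0, cpu6))  # cores of one socket
print(relation(cpu0, cpu2))            # cores of one socket
```

With only socket_id and core_id, cpu0 and cpu6 look like SMT siblings even though they are separate cores in different NUMA nodes of the same package.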
> >
> >I don't believe there's a problem with that. This example XML
> >shows a machine with 4 NUMA nodes and 2 sockets, each socket spanning
> >2 NUMA nodes of 2 cores with 2 threads each, giving 16 logical CPUs:
> >
> >  <topology>
> >    <cells num='4'>
> >      <cell id='0'>
> >        <cpus num='4'>
> >          <cpu id='0' socket_id='0' core_id='0'/>
> >          <cpu id='1' socket_id='0' core_id='0'/>
> >          <cpu id='2' socket_id='0' core_id='1'/>
> >          <cpu id='3' socket_id='0' core_id='1'/>
> >        </cpus>
> >      </cell>
> >      <cell id='1'>
> >        <cpus num='4'>
> >          <cpu id='4' socket_id='0' core_id='0'/>
> >          <cpu id='5' socket_id='0' core_id='0'/>
> >          <cpu id='6' socket_id='0' core_id='1'/>
> >          <cpu id='7' socket_id='0' core_id='1'/>
> >        </cpus>
> >      </cell>
> >      <cell id='2'>
> >        <cpus num='4'>
> >          <cpu id='8' socket_id='1' core_id='0'/>
> >          <cpu id='9' socket_id='1' core_id='0'/>
> >          <cpu id='10' socket_id='1' core_id='1'/>
> >          <cpu id='11' socket_id='1' core_id='1'/>
> >        </cpus>
> >      </cell>
> >      <cell id='3'>
> >        <cpus num='4'>
> >          <cpu id='12' socket_id='1' core_id='0'/>
> >          <cpu id='13' socket_id='1' core_id='0'/>
> >          <cpu id='14' socket_id='1' core_id='1'/>
> >          <cpu id='15' socket_id='1' core_id='1'/>
> >        </cpus>
> >      </cell>
> >    </cells>
> >  </topology>
> >
> >I believe there's enough info there to determine all the co-location
> >aspects of all the sockets/cores/threads involved.
>
> Well, not for all machines in the wild out there. This is very
> similar to the approach libvirt uses now to detect the topology, and it
> is not enough to detect threads on AMD Bulldozer, as the CPUs
> corresponding to the threads have different core_ids (they are also
> considered cores from the perspective of the kernel). This is
> unfortunate for virtualization management tools such as oVirt, which
> still consider the AMD Bulldozer "module" to be 1 core with two
> threads, even though it registers as two cores.
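To check the claim that the example XML carries enough information, here is a minimal sketch that groups the <cpu> elements by (cell id, socket_id, core_id). Keying on the cell is an assumption of this sketch; it counts <cpu> elements directly rather than trusting the num attributes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The example NUMA topology from the message: 4 cells, 2 sockets, with
# socket and core ids reused across the two cells of each socket.
TOPOLOGY = """
<topology>
  <cells num='4'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0'/>
        <cpu id='1' socket_id='0' core_id='0'/>
        <cpu id='2' socket_id='0' core_id='1'/>
        <cpu id='3' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='4'>
        <cpu id='4' socket_id='0' core_id='0'/>
        <cpu id='5' socket_id='0' core_id='0'/>
        <cpu id='6' socket_id='0' core_id='1'/>
        <cpu id='7' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='2'>
      <cpus num='4'>
        <cpu id='8' socket_id='1' core_id='0'/>
        <cpu id='9' socket_id='1' core_id='0'/>
        <cpu id='10' socket_id='1' core_id='1'/>
        <cpu id='11' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
    <cell id='3'>
      <cpus num='4'>
        <cpu id='12' socket_id='1' core_id='0'/>
        <cpu id='13' socket_id='1' core_id='0'/>
        <cpu id='14' socket_id='1' core_id='1'/>
        <cpu id='15' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
  </cells>
</topology>
"""


def group_by_core(xml_text: str) -> dict:
    """Map (cell id, socket_id, core_id) to the logical CPU ids sharing it."""
    root = ET.fromstring(xml_text.strip())
    cores = defaultdict(list)
    for cell in root.iter("cell"):
        for cpu in cell.iter("cpu"):
            key = (int(cell.get("id")),
                   int(cpu.get("socket_id")),
                   int(cpu.get("core_id")))
            cores[key].append(int(cpu.get("id")))
    return dict(cores)


cores = group_by_core(TOPOLOGY)
print(len({socket for _, socket, _ in cores}))  # 2 sockets
print(len(cores))                               # 8 cores
print(cores[(0, 0, 0)], cores[(1, 0, 0)])       # [0, 1] [4, 5]
```

This works for the layout shown, but as noted for AMD Bulldozer, the two CPUs of a module carry different core_id values, so this grouping would report them as two single-threaded cores rather than one core with two threads.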
>
> For AMD Bulldozer to be detected correctly, we would need to expose
> the thread_ids along with thread siblings information to determine
> which two threads belong together.

NB, the socket_id / core_id values in the above XML are *not* intended
to be in any way related to the similarly named values in /proc/cpuinfo.
They are values libvirt assigns to show the topology accurately.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list