> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx] > Sent: Saturday, January 14, 2012 6:25 AM > > > > > sure. VCPU!=PCPU requirement is orthogonal to the basic part for gearing > > > > ACPI information to Xen. > > .. snip.. > > > > > > > > > 1). For new distros (Ubuntu, Fedora), the default is all VCPUs. > > > > > > > > good to know that. > > > > > > > > > Enterprising users might use dom0_max_vcpus to limit the VCPU > > > count, > > > > > but most won't. > > > > > Which mean we can concentrate on bringing the _Pxx/_Cxx > parsing > > > > > up to the hypervisor. Which is really neccessary on any chipset > > > > > which has the notion of TurboBoost (otherwise the Xen scheduler > > > > > won't pick this up and won't engage this mode in certain > > > > > workloads). > > .. snip.. > > > yes, this is a big question. First, I'd like to give my sincere thanks to you and > > Liang for help push this part to Linux upstream. You've done a great job. :-) > > Thanks! > > Unfortunately I can't afford the time in the short term, due to extremely busy > > tasks in other projects, at least in the whole Q1. Really sorry about that. :/ > > Bummer. > > > > I would very appreciate your help if you could continue lending some time > here, > > since you've done plenty of works on the cleanup. The majority of the tricky > > part is related to VCPU/PCPU handling. If putting that part aside, the > remaining > > logic should be much simpler. > > I was trying to figure out how difficult it would be to just bring Pxx states to > the Xen hypervisor using the existing ACPI interfaces. And while it did not pass > all the _Pxx states (seems that all the _PCT, _PSS, _PSD, _PPC flags need to > be enabled in the hypercall to make this work), it demonstrates what I had in > mind. > > > #include <linux/device.h> > #include <linux/kernel.h> > #include <linux/module.h> > #include <linux/init.h> > #include <linux/types.h> > #include <acpi/acpi_bus.h> > #include <acpi/acpi_drivers.h> > #include <acpi/processor.h> > #include <linux/cpumask.h> > > #include <xen/interface/platform.h> > #include <asm/xen/hypercall.h> > > #define DRV_NAME "ACPI_PXX" > #define DRV_CLASS "ACPI_PXX_CLASS" > MODULE_AUTHOR("Konrad Rzeszutek Wilk"); > MODULE_DESCRIPTION("ACPI Processor Driver to send data to Xen > hypervisor"); > MODULE_LICENSE("GPL"); > > static int parse_acpi_cxx(struct acpi_processor *_pr) > { > struct acpi_processor_cx *cx; > int i; > > for (i = 1; i <= _pr->power.count; i++) { > cx = &_pr->power.states[i]; > if (!cx->valid) > continue; > pr_info("%s: %d %d %d 0x%x\n", __func__, > cx->type, cx->latency, cx->power, (u32)cx->address); > } > /* TODO: Under Xen, the C-states information is not present. > * Figure out why. */ it's possible related to this long thread: http://lists.xen.org/archives/html/xen-devel/2011-08/msg00511.html IOW, Xen doesn't export mwait capability to dom0, which impacts _PDC setting. Final solution is to have a para-virtualized PDC call for that. > return 0; > } > static int push_pxx_to_hypervisor(struct acpi_processor *_pr) > { > int ret = -EINVAL; > struct xen_platform_op op = { > .cmd = XENPF_set_processor_pminfo, > .interface_version = XENPF_INTERFACE_VERSION, > .u.set_pminfo.id = _pr->acpi_id, > .u.set_pminfo.type = XEN_PM_PX, > }; > struct xen_processor_performance *xen_perf; > struct xen_processor_px *xen_states, *xen_px = NULL; > struct acpi_processor_px *px; > unsigned i; > > xen_perf = &op.u.set_pminfo.perf; > xen_perf->flags = XEN_PX_PSS; > > xen_perf->state_count = _pr->performance->state_count; > xen_states = kzalloc(xen_perf->state_count * > sizeof(struct xen_processor_px), GFP_KERNEL); > if (!xen_states) > return -ENOMEM; > > for (i = 0; i < _pr->performance->state_count; i++) { > xen_px = &(xen_states[i]); > px = &(_pr->performance->states[i]); > > xen_px->core_frequency = px->core_frequency; > xen_px->power = px->power; > xen_px->transition_latency = px->transition_latency; > xen_px->bus_master_latency = px->bus_master_latency; > xen_px->control = px->control; > xen_px->status = px->status; > } > set_xen_guest_handle(xen_perf->states, xen_states); > ret = HYPERVISOR_dom0_op(&op); > kfree(xen_states); > return ret; > } > static int parse_acpi_pxx(struct acpi_processor *_pr) > { > struct acpi_processor_px *px; > int i; > > for (i = 0; i < _pr->performance->state_count;i++) { > px = &(_pr->performance->states[i]); > pr_info("%s: [%d]: %d, %d, %d, %d, %d, %d\n", __func__, > i, (u32)px->core_frequency, (u32)px->power, > (u32)px->transition_latency, > (u32)px->bus_master_latency, > (u32)px->control, (u32)px->status); > } > if (xen_initial_domain()) > return push_pxx_to_hypervisor(_pr); > return 0; > } > static int parse_acpi_data(void) > { > int cpu; > int err = -ENODEV; > struct acpi_processor *_pr; > struct cpuinfo_x86 *c = &cpu_data(0); > > /* TODO: Under AMD, the information is populated > * using the powernow-k8 driver which does an MSR_PSTATE_CUR_LIMIT > * MSR which returns the wrong value so the population of 'processors' > * has bogus data. So only run this under Intel for right now. */ > if (!cpu_has(c, X86_FEATURE_EST)) > return -ENODEV; > for_each_possible_cpu(cpu) { > _pr = per_cpu(processors, cpu); > if (!_pr) > continue; > > if (_pr->flags.power) > (void)parse_acpi_cxx(_pr); > > if (_pr->performance->states) > err = parse_acpi_pxx(_pr); > if (err) > break; > } > return -ENODEV; /* force it to unload */ > } > static int __init acpi_pxx_init(void) > { > return parse_acpi_data(); > } > static void __exit acpi_pxx_exit(void) > { > } > module_init(acpi_pxx_init); > module_exit(acpi_pxx_exit); the prerequisites for this module to work correctly, is that dom0 has the right configurations to have all necessary Cx/Px information ready before this module is loaded. That may mean enabling full CONFIG_CPU_IDLE and CONFIG_CPUFREQ, which in current form may add some negative impact, e.g. dom0 will try to control Px/Cx to conflict with Xen. So some tweaks may be required in that part. given our purpose now, is to come up a cleaner approach which tolerate some assumptions (e.g. #VCPU of dom0 == #PCPU), there's another option following this trend (perhaps compensate your idea). We can register a Xen-cpuidle and xen-cpufreq driver to current Linux cpuidle and cpufreq framework, which plays mainly two roles: - a dummy driver to prevent dom0 touching actual Px/Cx - parse ACPI Cx/Px information to Xen, in a similar way you did above there may have some other trickiness, but the majority code will be self-contained. Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html