Re: [PATCH v6 08/11] platform/x86/intel/ifs: Add scan test support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 05 2022 at 18:40, Tony Luck wrote:
> +/*
> + * Note all code and data in this file is protected by
> + * ifs_sem. On HT systems all threads on a core will
> + * execute together, but only the first thread on the
> + * core will update results of the test.
> + */
> +struct workqueue_struct *ifs_wq;

Seems to be unused.

> +static bool oscan_enabled = true;

What changes this?

> +static void message_not_tested(struct device *dev, int cpu, union ifs_status status)
> +{
> +	if (status.error_code < ARRAY_SIZE(scan_test_status))

Please add curly brackets as these are not one-line statements.

> +		dev_info(dev, "CPU(s) %*pbl: SCAN operation did not start. %s\n",
> +			 cpumask_pr_args(topology_sibling_cpumask(cpu)),
> +			 scan_test_status[status.error_code]);
> +/*
> + * Execute the scan. Called "simultaneously" on all threads of a core
> + * at high priority using the stop_cpus mechanism.
> + */
> +static int doscan(void *data)
> +{
> +	int cpu = smp_processor_id();
> +	u64 *msrs = data;
> +	int first;
> +
> +	/* Only the first logical CPU on a core reports result */
> +	first = cpumask_first(topology_sibling_cpumask(cpu));

Shouldn't that be cpu_smt_mask()?

> +	/*
> +	 * This WRMSR will wait for other HT threads to also write
> +	 * to this MSR (at most for activate.delay cycles). Then it
> +	 * starts scan of each requested chunk. The core scan happens
> +	 * during the "execution" of the WRMSR. This instruction can
> +	 * take up to 200 milliseconds before it retires.

200ms per test chunk?

> +	 */
> +	wrmsrl(MSR_ACTIVATE_SCAN, msrs[0]);
> +

> +	while (activate.start <= activate.stop) {
> +		if (time_after(jiffies, timeout)) {
> +			status.error_code = IFS_SW_TIMEOUT;
> +			break;
> +		}
> +
> +		msrvals[0] = activate.data;
> +		stop_core_cpuslocked(cpu, doscan, msrvals);
> +
> +		status.data = msrvals[1];
> +
> +		/* Some cases can be retried, give up for others */
> +		if (!can_restart(status))
> +			break;
> +
> +		if (status.chunk_num == activate.start) {
> +			/* Check for forward progress */
> +			if (retries-- == 0) {
> +				if (status.error_code == IFS_NO_ERROR)
> +					status.error_code = IFS_SW_PARTIAL_COMPLETION;
> +				break;
> +			}
> +		} else {
> +			retries = MAX_IFS_RETRIES;
> +			activate.start = status.chunk_num;
> +		}
> +	}

Looks way better now.

> +}
> +/*
> + * Initiate per core test. It wakes up work queue threads on the target cpu and
> + * its sibling cpu. Once all sibling threads wake up, the scan test gets executed and
> + * wait for all sibling threads to finish the scan test.
> + */
> +int do_core_test(int cpu, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	if (!scan_enabled)
> +		return -ENXIO;
> +
> +	/* Prevent CPUs from being taken offline during the scan test */
> +	cpus_read_lock();
> +
> +	if (!cpu_online(cpu)) {
> +		dev_info(dev, "cannot test on the offline cpu %d\n", cpu);
> +		ret = -EINVAL;
> +		goto out;
> +	}

Coming back to my points from the previous round:

1) How is that supposed to work on a system which has HT enabled in BIOS,
   but disabled on the kernel command line or via /sys/..../smt/control or
   when a HT sibling is offlined temporarily?

   I assume it cannot work, but I can't see anything which handles those
   cases.

2) That documentation for the admin/user got eaten by the gremlins in
   the intertubes again.

Thanks,

        tglx



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux