On 3/28/22 5:52 AM, Borislav Petkov wrote:
[...]
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
How did you decide this is the commit that this is fixing?
I examined the history in those lines by git blame. Will recheck after the below
doubt is cleared.
Link: https://lore.kernel.org/lkml/9dfe087a-f941-1bc4-657d-7e7c198888ff@xxxxxxxxxxx
That Link tag is not needed.
Co-authored-by: Alviro Iskandar Setiawan <alviro.iskandar@xxxxxxxxxxx>
Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@xxxxxxxxxxx>
Co-authored-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
There's no "Co-authored-by".
The correct tag is described in
Documentation/process/submitting-patches.rst
Will fix them in the v6.
...
@@ -1350,15 +1357,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (!(this_cpu_read(bank_map) & (1 << bank)))
continue;
err = threshold_create_bank(bp, cpu, bank);
- if (err)
- goto out_err;
+ if (err) {
+ _mce_threshold_remove_device(bp, numbanks);
+ return err;
+ }
}
this_cpu_write(threshold_banks, bp);
Do I see it correctly that the publishing of the @bp pointer - i.e.,
this line - should be moved right above the for loop?
Then mce_threshold_remove_device() would properly free it in the error
case and your patch turns into a oneliner?
Previously, in v4 I did that too. But after discussion with Yazen, we got a
conclusion that placing `this_cpu_write(threshold_banks, bp);` before the for loop
is not the right thing to do.
And then your Fixes: tag would be correct too...
The reason is based on the discussion with Yazen, the full discussion can be read in
the Link tag above.
==================
The point is:
On Wed, 2 Mar 2022 17:26:32 +0000, Yazen Ghannam <yazen.ghannam@xxxxxxx> wrote:
The threshold interrupt handler uses this pointer. I think the goal here is to
set this pointer when the list is fully formed and clear this pointer before
making any changes to the list. Otherwise, the interrupt handler will operate
on incomplete data if an interrupt comes in the middle of these updates.
==================
Also, looking at the comment in mce_threshold_remove_device() function:
/*
* Clear the pointer before cleaning up, so that the interrupt won't
* touch anything of this.
*/
this_cpu_write(threshold_banks, NULL);
I think it's reasonable to place `this_cpu_write(threshold_banks, bp);` after
the "for loop" on the creation process for the similar reason. In short, don't
let the interrupt sees incomplete data.
Although, I am not sure if that 100% guarantees mce_threshold_remove_device()
will not mess up with the interrupt (e.g. freeing the data while the interrupt
reading it), unless we're using RCU stuff.
What do you think?
--
Ammar Faizi