On 3/3/22 6:20 AM, Ammar Faizi wrote:
On 3/3/22 12:26 AM, Yazen Ghannam wrote:
Hi Ammar,
Hi Yazen,
...
The threshold interrupt handler uses this pointer. I think the goal here is to
set this pointer when the list is fully formed and clear this pointer before
making any changes to the list. Otherwise, the interrupt handler will operate
on incomplete data if an interrupt comes in the middle of these updates.
The changes below should deal with the memory leak issue while avoiding a
race with the threshold interrupt. What do you think?
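To make the ordering concrete, here is a minimal userspace sketch of the
"publish last, unpublish first" pattern described above. It is not the
kernel code: published_banks stands in for the per-CPU threshold_banks
pointer, C11 atomics stand in for this_cpu_read()/this_cpu_write(), and
NUM_BANKS, struct bank, and every function name are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NUM_BANKS 4	/* hypothetical; the kernel reads mce_num_banks */

struct bank { int id; };	/* hypothetical stand-in for struct threshold_bank */

/* Stand-in for the per-CPU threshold_banks pointer. */
static _Atomic(struct bank **) published_banks;

/* Reader side (plays the interrupt handler): count the banks it can see. */
int handler_count_banks(void)
{
	struct bank **bp = atomic_load(&published_banks);
	int n = 0;

	if (!bp)
		return 0;	/* nothing published, so nothing to walk */
	for (int i = 0; i < NUM_BANKS; i++)
		if (bp[i])
			n++;
	return n;
}

/* Free a list that the reader can no longer (or could never) reach. */
void free_banks(struct bank **bp)
{
	for (int i = 0; i < NUM_BANKS; i++) {
		free(bp[i]);
		bp[i] = NULL;
	}
	free(bp);
}

int create_device(void)
{
	struct bank **bp = calloc(NUM_BANKS, sizeof(*bp));

	if (!bp)
		return -1;
	for (int i = 0; i < NUM_BANKS; i++) {
		bp[i] = malloc(sizeof(**bp));
		if (!bp[i]) {
			free_banks(bp);	/* never published, free directly */
			return -1;
		}
		bp[i]->id = i;
	}
	/* Publish only once the list is fully formed. */
	atomic_store(&published_banks, bp);
	return 0;
}

void remove_device(void)
{
	/* Unpublish before touching the list, then tear it down. */
	struct bank **bp = atomic_exchange(&published_banks,
					   (struct bank **)NULL);

	if (!bp)
		return;
	free_banks(bp);
}
```

Calling create_device() makes handler_count_banks() see all NUM_BANKS
entries, and after remove_device() it sees none; at no point can it see a
half-built list. Note that a truly concurrent reader would also need some
guarantee that it is not still walking the list when it gets freed; this
sketch only shows the ordering of the pointer updates.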
Thanks for looking into this. I didn't notice that before. The
changes look good to me, with two extra improvements:
1) _mce_threshold_remove_device() should be static as we don't use it
in another translation unit.
2) Minor cleanup: we don't need "goto out_err"; we can just return
early.
I will fold them in...
Please review the patch below. If you think it looks good, I will
send it as part of the v5 series. I added your sign-off.
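For anyone who wants to see the leak in isolation before reading the
patch, here is a small userspace sketch of the failure path. It is only a
model, not the kernel code: the allocation counter, NUM_BANKS, the fail_at
and use_fix parameters, and all function names are hypothetical, and plain
calloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_BANKS 4	/* hypothetical; the kernel reads mce_num_banks */

struct bank { int id; };	/* hypothetical stand-in for struct threshold_bank */

int live_allocs;			/* outstanding allocations, for the demo only */
static struct bank **published_banks;	/* stand-in for per-CPU threshold_banks */

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);		/* zeroed, so unused slots are NULL */

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical bank constructor that fails on bank number fail_at. */
int create_bank(struct bank **bp, int bank, int fail_at)
{
	if (bank == fail_at)
		return -1;
	bp[bank] = tracked_alloc(sizeof(**bp));
	if (!bp[bank])
		return -1;
	bp[bank]->id = bank;
	return 0;
}

/* Plays the role of the new helper: frees whatever list it is handed. */
void remove_banks(struct bank **bp)
{
	for (int bank = 0; bank < NUM_BANKS; bank++) {
		if (bp[bank]) {
			tracked_free(bp[bank]);
			bp[bank] = NULL;
		}
	}
	tracked_free(bp);
}

/* Public teardown: can only reach a list that has been published. */
void remove_device(void)
{
	struct bank **bp = published_banks;

	if (!bp)
		return;
	published_banks = NULL;
	remove_banks(bp);
}

int create_device(int fail_at, int use_fix)
{
	struct bank **bp = tracked_alloc(NUM_BANKS * sizeof(*bp));

	if (!bp)
		return -1;
	for (int bank = 0; bank < NUM_BANKS; bank++) {
		if (create_bank(bp, bank, fail_at)) {
			if (use_fix)
				remove_banks(bp);	/* frees the local list */
			else
				remove_device();	/* sees NULL, frees nothing */
			return -1;
		}
	}
	published_banks = bp;
	return 0;
}
```

With use_fix set to 0, a failure on bank 2 leaves three allocations
behind (the array plus banks 0 and 1), because remove_device() only looks
at the published pointer, which is still NULL. With use_fix set to 1, the
same failure leaves nothing behind.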
From cae3965734a67d11a5286c612dfddf52398defc8 Mon Sep 17 00:00:00 2001
From: Ammar Faizi <ammarfaizi2@xxxxxxxxxxx>
Date: Thu, 3 Mar 2022 05:07:38 +0700
Subject: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
In mce_threshold_create_device(), when threshold_create_bank() fails,
the @bp will be leaked, because mce_threshold_remove_device() will
not free the @bp. It only frees the @bp when we've already written
the @bp to the @threshold_banks per-CPU variable, but at that point,
we haven't.
Fix this by extracting the cleanup part into a new static function
_mce_threshold_remove_device(), then call it from both the create and
remove device functions.
Also, eliminate the "goto out_err" and simply return early inside the
loop when we fail.
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # v5.8+
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Co-authored-by: Alviro Iskandar Setiawan <alviro.iskandar@xxxxxxxxxxx>
Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@xxxxxxxxxxx>
Co-authored-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Signed-off-by: Ammar Faizi <ammarfaizi2@xxxxxxxxxxx>
---
arch/x86/kernel/cpu/mce/amd.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9f4b508886dd..ac7246a4de08 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1293,10 +1293,22 @@ static void threshold_remove_bank(struct threshold_bank *bank)
 	kfree(bank);
 }

+static void _mce_threshold_remove_device(struct threshold_bank **bp)
+{
+	unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
+
+	for (bank = 0; bank < numbanks; bank++) {
+		if (bp[bank]) {
+			threshold_remove_bank(bp[bank]);
+			bp[bank] = NULL;
+		}
+	}
+	kfree(bp);
+}
+
 int mce_threshold_remove_device(unsigned int cpu)
 {
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
-	unsigned int bank, numbanks = this_cpu_read(mce_num_banks);

 	if (!bp)
 		return 0;
@@ -1307,13 +1319,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 	 */
 	this_cpu_write(threshold_banks, NULL);

-	for (bank = 0; bank < numbanks; bank++) {
-		if (bp[bank]) {
-			threshold_remove_bank(bp[bank]);
-			bp[bank] = NULL;
-		}
-	}
-	kfree(bp);
+	_mce_threshold_remove_device(bp);
 	return 0;
 }

@@ -1350,15 +1356,14 @@ int mce_threshold_create_device(unsigned int cpu)
 		if (!(this_cpu_read(bank_map) & (1 << bank)))
 			continue;
 		err = threshold_create_bank(bp, cpu, bank);
-		if (err)
-			goto out_err;
+		if (err) {
+			_mce_threshold_remove_device(bp);
+			return err;
+		}
 	}
 	this_cpu_write(threshold_banks, bp);

 	if (thresholding_irq_en)
 		mce_threshold_vector = amd_threshold_interrupt;
 	return 0;
-out_err:
-	mce_threshold_remove_device(cpu);
-	return err;
 }
--
Ammar Faizi