On Tuesday, November 3, 2020 1:52:28 P.M. EST Mark Asselstine wrote: > On Tuesday, November 3, 2020 1:28:14 P.M. EST Kaneda, Erik wrote: > > > -----Original Message----- > > > From: Mark Asselstine <mark.asselstine@xxxxxxxxxxxxx> > > > Sent: Monday, November 2, 2020 10:51 AM > > > To: Rafael J. Wysocki <rafael@xxxxxxxxxx>; Kaneda, Erik > > > <erik.kaneda@xxxxxxxxx> > > > Cc: ACPI Devel Maling List <linux-acpi@xxxxxxxxxxxxxxx>; Moore, Robert > > > <robert.moore@xxxxxxxxx> > > > Subject: Re: [PATCH] ACPICA: avoid double free when object already has a > > > zero reference count > > > > > > On Monday, November 2, 2020 1:11:22 P.M. EST Kaneda, Erik wrote: > > > > > -----Original Message----- > > > > > From: Rafael J. Wysocki <rafael@xxxxxxxxxx> > > > > > Sent: Thursday, October 29, 2020 7:06 AM > > > > > To: Mark Asselstine <mark.asselstine@xxxxxxxxxxxxx> > > > > > Cc: ACPI Devel Maling List <linux-acpi@xxxxxxxxxxxxxxx>; Kaneda, > > > > > Erik > > > > > <erik.kaneda@xxxxxxxxx>; Moore, Robert <robert.moore@xxxxxxxxx> > > > > > Subject: Re: [PATCH] ACPICA: avoid double free when object already > > > > > has > > > > > > a > > > > > > > > zero reference count > > > > > > > > > > > > > > > > > > > > +Erik and Bob > > > > > > > > > > > > > > > > > > > > On Thu, Oct 29, 2020 at 3:05 AM Mark Asselstine > > > > > > > > > > <mark.asselstine@xxxxxxxxxxxxx> wrote: > > > > > > The first trip into acpi_ut_update_ref_count() for an object where > > > > > > 'object->common.reference_count' is 1 and we are performing a > > > > > > REF_DECREMENT will result in 'new_count' being 0 and thus the > > > > > > object > > > > > > is deleted via acpi_ut_delete_internal_obj(). > > > > > > > > > > > > > > > > > > > > > > > > If for some reason we make a subsequent trip into > > > > > > acpi_ut_update_ref_count() with the same object, > > > > > > object->common.reference_count' will be 0 and performing a > > > > > > REF_DECREMENT will produce a warning msg "Reference Count is > > > > > > already > > > > > > > > > zero, cannot decrement", 'new_count' will again be 0 and the > > > > > > already > > > > > > deleted object will be attempted to be deleted again via > > > > > > acpi_ut_delete_internal_obj(). > > > > > > > > Mark, Do you have an example of AML/ASL that you used to determine > > > > > > this > > > > > > > double free? > > > > > > Unfortunately no. It is a rare occurance and a consequence of several > > > actions taking place at the same time during boot, including a PCI > > > rescan. Unfortunately due to circumstances I am sure you would rather > > > not > > > have to be > > > concerned about, we have so far had to focus our efforts on an older > > > kernel > > > which also has the preempt-rt patchset applied. It is for this reason I > > > also didn't include the dmesg and eventual kernel BUG_ON in my > > > submission. > > > > > > It is unclear at this time if the additional locking or other changes > > > that > > > have been merged since the kernel version we are on would prevent this > > > from > > > occuring on an up to date kernel. > > > > > > I have reviewed the code in the latest linux kernel and as far as I can > > > tell the deficiency is still present. If you do go into > > > acpi_ut_update_ref_count() with an object with a reference count of 0 > > > and > > > an action of REF_DECREMENT the > > > following may be called: > > > > > > acpi_ut_update_ref_count > > > > > > -> acpi_ut_delete_internal_obj > > > > > > -> acpi_ut_delete_object_desc > > > > > > -> acpi_os_release_object > > > > > > -> kmem_cache_free > > > > > > I completely understand if you have concerns about the change since I > > > can't > > > hand you a reproducer. I was hoping the merrits of the change would > > > stand > > > on > > > their own as there is no reason to call acpi_ut_delete_internal_obj() if > > > we > > > have already done so, even if the rest of the call chain was well > > > behaved. > > > In our case the eventual slab corruption ended up affecting the > > > 'Acpi-Operand' dedicated cache. We see the corruption happen shortly > > > after we see the msg warning that the reference count is already at > > > zero. > > > > > > Please let me know if there is anything else I can provide to help out > > > and > > > thanks for your time reviewing this change. > > > > Since this is such a sensitive part of the codebase, I would like to see > > if > > it's possible to come up with ASL that reproduces this on our userspace > > interpreter (acpiexec). > > > > I can try reproduce this from your explanations but it would be helpful to > > also get an acpidump and dmesg so that we can understand which named > > objects are causing this issue. This gives us a clue on how to reproduce > > this in user space. > > > > Could you provide an acpidump of the machine and 2 dmesg logs? I need one > > normal dmesg that shows this error and a custom kernel built with > > CONFIG_ACPI_DEBUG=y and booting with the following commandline parameters: > > > > acpi.debug_level=0x80000 acpi.debug_layer=0xffffffff > > This should be possible. Give me a day or two to put something together. > Again my intention, as when dealing with any maintainer, is to be sensitive > of your time and given the kernel version skew and such I am not sure how > well it will map to the latest kernel code. But if you can pull out some > clues to help with this mapping I am not going to impede your offer of > assistance. I have supplied what I could directly to the Intel folks. Unfortunately, as I had stated previously our reproduction case uses an older kernel so can't easily map to the current state of code in mainline. I go back to my original assertion that the same condition which is being caught by the warning should be used to prevent a second attempt at deleting an already deleted object. Attempting a second deletion has been observed to be troublesome as the underlying memory could now be in used by a valid object and slab corruption is highly likely. I have also supplied a github pull request to the acpica repo as this issue could also manifest itself negatively in other OSes. https://github.com/acpica/acpica/pull/652 Mark > > Thanks, > MarkA > > > Thanks, > > Erik > > > > > Thanks, > > > Mark > > > > > > > Thanks, > > > > Erik > > > > > > > > > > Since the object deletion doesn't NULL the object the calls to > > > > > > acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(), > > > > > > acpi_os_release_object(), kmem_cache_free() will operate on the > > > > > > object > > > > > > > > > as if it hasn't been deleted. In many cases this can result in no > > > > > > issues, but if you are using the slab and a new object has been > > > > > > created with the same address this can be the cause slab > > > > > > corruption. > > > > > > > > > > > > > > > > > > > > > > > > Adding a check if we are decrementing to 0 for the first time and > > > > > > only > > > > > > calling acpi_ut_delete_internal_obj() in this case will prevent > > > > > > another attempt at deleting the object. > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Mark Asselstine <mark.asselstine@xxxxxxxxxxxxx> > > > > > > --- > > > > > > > > > > > > drivers/acpi/acpica/utdelete.c | 4 ++-- > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/drivers/acpi/acpica/utdelete.c > > > > > > b/drivers/acpi/acpica/utdelete.c > > > > > > index 4c0d4e434196..c6b860fd9eb5 > > > > > > > > > 100644 > > > > > > --- a/drivers/acpi/acpica/utdelete.c > > > > > > +++ b/drivers/acpi/acpica/utdelete.c > > > > > > @@ -421,9 +421,9 @@ acpi_ut_update_ref_count(union > > > > > > > > > > acpi_operand_object *object, u32 action) > > > > > > > > > > > ACPI_GET_FUNCTION_NAME, > > > > > > object, > > > > > > object->common.type, > > > > > > new_count)); > > > > > > > > > > > > - /* Actually delete the object on a reference count > > > > > > of > > > > > > zero */ > > > > > > + /* If we haven't already, actually delete the > > > > > > > > > object on a reference> > > > > > > > > > > count of zero */ > > > > > > > > > > > - if (new_count == 0) { > > > > > > + if (new_count == 0 && original_count != 0) { > > > > > > > > > > > > acpi_ut_delete_internal_obj(object); > > > > > > > > > > > > } > > > > > > message = "Decrement"; > > > > > > > > > > > > -- > > > > > > 2.17.1