RE: [PATCH] ACPICA: Add mutex to avoid race condition of reference count of notify object

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

There is an ACPICA bug tracking this issue:
http://bugs.acpica.org/show_bug.cgi?id=1115
We'll be back working on this after finishing other tasks.

Thanks and best regards
-Lv

> From: Hanjun Guo [mailto:guohanjun@xxxxxxxxxx]
> Sent: Wednesday, June 03, 2015 2:53 PM
> 
> Hi Lv,
> 
> Any updates for this issue?
> 
> Thanks
> Hanjun
> 
> On 2014/6/18 12:42, Zheng, Lv wrote:
> > Hi,
> >
> >> From: joeyli [mailto:jlee@xxxxxxxx]
> >> Sent: Tuesday, June 17, 2014 5:19 PM
> >>
> >> On Tue, Jun 17, 2014 at 01:03:28AM +0000, Zheng, Lv wrote:
> >>> Hi,
> >>>
> >>>> From: linux-acpi-owner@xxxxxxxxxxxxxxx [mailto:linux-acpi-owner@xxxxxxxxxxxxxxx] On Behalf Of joeyli
> >>>> Sent: Monday, June 16, 2014 5:14 PM
> >>>>
> >>>> Hi Lv Zheng,
> >>>>
> >>>> Thanks for your response.
> >>>>
> >>>> On Mon, Jun 16, 2014 at 05:10:29AM +0000, Zheng, Lv wrote:
> >>>>> Hi,
> >>>>>
> >>>>>> From: Lee, Chun-Yi [mailto:joeyli.kernel@xxxxxxxxx]
> >>>>>> Sent: Monday, June 16, 2014 10:12 AM
> >>>>>>
> >>>>>> This issue found on v3.0 kernel, unfortunately there was no chance
> >>>>>> to test latest kernel on issue mchine. This patch tested on v3.0 kernel
> >>>>>> then sent to linux-acpi for review and note, maybe latest kernel also need.
> >>>>>>
> >>>>>> The problem happened when acpi thermal driver evaluate _PSL, but acpi
> >>>>>> processor driver install notify handler at the same time. In the
> >>>>>> code path of evaluate _PSL, it updates reference count of processor
> >>>>>> and its notify objects. When the notify handler installation done
> >>>>>> after the reference count of processor updated, it causes the
> >>>>>> ref_count of processor doesn't sync with its notify object's
> >>>>>> ref_count value.
> >>>>>>
> >>>>>> Here is an debugging log when issue reproduced:
> >>>>>>
> >>>>>> [    3.481773] ACPI_TYPE_PROCESSOR set ACPI_DEVICE_NOTIFY, object_desc->common.reference_count: 3, notify_obj-
> >>>>>>> common.reference_count: 1
> >>>>>> [    3.481958] PROCESSOR device_hid: LNXCPU
> >>>>>> ...
> >>>>>> [    3.487427] ACPI_TYPE_PROCESSOR, action = 1
> >>>>>> [    3.487428] Update device_notify ref_count
> >>>>>> [    3.487429] REF_DECREMENT ACPI_TYPE_LOCAL_NOTIFY original_count: 0
> >>>>>> [    3.487431] ACPI Warning: Obj ffff8800b0f40b28, Reference Count is already zero, cannot decrement
> >>>>>> [    3.487433]  (20110413/utdelete-431)
> >>>>>> [    3.487434] REF_DECREMENT ACPI_TYPE_PROCESSOR original_count: 2
> >>>>>>
> >>>>>> Accroding log, found the reference_count of parent object
> >>>>>> (it's processor in this case) is 3, it doesn't match with notify_object's
> >>>>>> reference_count, value is 1. It triggered "Reference Count is already zero"
> >>>>>> warning, then happen object double free issue later.
> >>>>>>
> >>>>>> To avoid rece condition, this patch introded ACPI_MTX_NOTIFY_REF_COUNT
> >>>>>> mutex to keep the ref_count of notify object sync with its parent
> >>>>>> object. And, it also set the reference_count value of new notify object
> >>>>>> equals to its parent object's reference_count.
> >>>>>>
> >>>>>> Signed-off-by: Lee, Chun-Yi <jlee@xxxxxxxx>
> >>>>>> ---
> >>>>>>  drivers/acpi/acpica/aclocal.h  | 3 ++-
> >>>>>>  drivers/acpi/acpica/evxface.c  | 7 ++++++-
> >>>>>>  drivers/acpi/acpica/utdelete.c | 6 ++++++
> >>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/acpi/acpica/aclocal.h b/drivers/acpi/acpica/aclocal.h
> >>>>>> index c7f743c..e25a4af 100644
> >>>>>> --- a/drivers/acpi/acpica/aclocal.h
> >>>>>> +++ b/drivers/acpi/acpica/aclocal.h
> >>>>>> @@ -85,8 +85,9 @@ union acpi_parse_object;
> >>>>>>  #define ACPI_MTX_MEMORY                 5	/* Debug memory tracking lists */
> >>>>>>  #define ACPI_MTX_DEBUG_CMD_COMPLETE     6	/* AML debugger */
> >>>>>>  #define ACPI_MTX_DEBUG_CMD_READY        7	/* AML debugger */
> >>>>>> +#define ACPI_MTX_NOTIFY_REF_COUNT	8	/* Reference count of notify object */
> >>>>>>
> >>>>>> -#define ACPI_MAX_MUTEX                  7
> >>>>>> +#define ACPI_MAX_MUTEX                  8
> >>>>>>  #define ACPI_NUM_MUTEX                  ACPI_MAX_MUTEX+1
> >>>>>>
> >>>>>>  /* Lock structure for reader/writer interfaces */
> >>>>>> diff --git a/drivers/acpi/acpica/evxface.c b/drivers/acpi/acpica/evxface.c
> >>>>>> index e114140..213fe1a 100644
> >>>>>> --- a/drivers/acpi/acpica/evxface.c
> >>>>>> +++ b/drivers/acpi/acpica/evxface.c
> >>>>>> @@ -495,6 +495,10 @@ acpi_install_notify_handler(acpi_handle device,
> >>>>>>  						handler, context,
> >>>>>>  						NULL);
> >>>>>>
> >>>>>> +		acpi_ut_acquire_mutex(ACPI_MTX_NOTIFY_REF_COUNT);
> >>>>>> +
> >>>>>> +		notify_obj->common.reference_count = obj_desc->common.reference_count;
> >>>>>> +
> >>>>> Should be converted to spin_lock here.
> >>>>>
> >>>> Did you mean using acpi_gbl_reference_count_lock ?
> >>> Yes.
> >>>
> >>>>>>  		if (handler_type & ACPI_SYSTEM_NOTIFY) {
> >>>>>>  			obj_desc->common_notify.system_notify = notify_obj;
> >>>>>>  		}
> >>>>>> @@ -503,8 +507,9 @@ acpi_install_notify_handler(acpi_handle device,
> >>>>>>  			obj_desc->common_notify.device_notify = notify_obj;
> >>>>>>  		}
> >>>>>>
> >>>>>> -		if (handler_type == ACPI_ALL_NOTIFY) {
> >>>>>> +		acpi_ut_release_mutex(ACPI_MTX_NOTIFY_REF_COUNT);
> >>>>>>
> >>>>>> +		if (handler_type == ACPI_ALL_NOTIFY) {
> >>>>>>  			/* Extra ref if installed in both */
> >>>>>>
> >>>>>>  			acpi_ut_add_reference(notify_obj);
> >>>>>> diff --git a/drivers/acpi/acpica/utdelete.c b/drivers/acpi/acpica/utdelete.c
> >>>>>> index 31f5a78..7559813 100644
> >>>>>> --- a/drivers/acpi/acpica/utdelete.c
> >>>>>> +++ b/drivers/acpi/acpica/utdelete.c
> >>>>>> @@ -504,6 +504,7 @@ acpi_ut_update_object_reference(union acpi_operand_object *object, u16 action)
> >>>>>>
> >>>>>>  			/* Update the notify objects for these types (if present) */
> >>>>>>
> >>>>>> +			acpi_ut_acquire_mutex(ACPI_MTX_NOTIFY_REF_COUNT);
> >>>> Can I acquire acpi_gbl_reference_count_lock here? Does is not cause recursive lock?
> >>>> Or direct acquire remove mutex here?
> >>> You can.
> >>> If you just want to hold the lock longer, you can introduce a lockless API to replace the one invoked below.
> >>>
> >> I just tried to change my mutex code to acquire acpi_gbl_reference_count_lock in acpi_ut_update_object_reference().
> >> But when system boot, kernel pending on updating ref_count of parent object, in acpi_ut_update_ref_count() when
> >> acquire acpi_gbl_reference_count_lock.
> >>
> >> On the other hand,
> >> I didn't find lockless spinlock API in acpi subsystem, and even in acpica project git. Could you please help teach
> >> me where is the lockless function?
> >>
> > I agree this is not a suitable case for acpi_gbl_reference_count_lock usage.
> >
> > I just want to say, this is an open source project, if you see any case need to extend the locking period, you can do any modification
> you want.
> > For example, if you find such a case: copying the reference from its original owner to a copy of the owner and try to increase it.
> > You can introduce an API to ensure atomicity by holding the acpi_gbl_reference_count_lock for the 2 operations.
> >
> >>>>>>  			acpi_ut_update_ref_count(object->common_notify.
> >>>>>>  						 system_notify, action);
> >>>>> If you take a look at acpi_ut_update_ref_count, there is already a spin_lock held around the code to update the reference
> count.
> >>>>>
> >>>>> Thanks
> >>>>> -Lv
> >>>> hm.... I am not sure can using acpi_gbl_reference_count_lock.
> >>>>
> >>>> When updating parent object's ref_count, it should atomic with its notify objects's ref_count updating.
> >>>> The situation I want to avoid is:
> >>>>
> >>>>  a. acpi_ut_update_object_reference() update ref_count of system_notify/device_notify. (Thermal driver)
> >>>>     Assume the notify object didn't install to parent object yet. So, the ref_count doesn't increase.
> >>>>
> >>>>  b. acpi_install_notify_handler() install system_notify/device_notify. (Processor driver)
> >>>>     Processor driver install notify handler of parent object in this window. The ref_count of notify object is 1.
> >>>>
> >>>>  c. acpi_ut_update_object_reference()  call acpi_ut_update_ref_count() to update ref_count of parent object. (Thermal driver)
> >>>>     The ref_count of parent object update to 2.
> >>>>     		<=== this ref_count doesn't match with ref_count of notify object, it's 1.
> >>>>
> >>>> Due to a. and c. steps should atomic, so I add ACPI_MTX_NOTIFY_REF_COUNT to bind them together, when
> >>>> the object type is processor, device, power or thermal.
> >>> I can see that the only problem is the sub-object's reference count updates.
> >>> It shouldn't be increased each time the parent object's reference count is increased, but should be increased only once when it is
> >> linked to the parent object (referenced by the parent).
> >>> That seems to be the root cause.
> >> Yes, this is the main problem, either increase or decrease ref_count of parent object, the sub-objects(here is notify object) will
> >> also updating either increase or decrease.
> >>
> >> In the case of my issue machine, when ref_count of parent object is 2 but sub-object's ref_count is 1, later decrease parent
> object's
> >> ref_count to 0 will causes sub-object's ref_count is -1. <=== it triggers the double free issue then slab will emit oops.
> > Can I say that:
> > The issue is caused by the current ACPICA locking implementation.
> > For this case, we need 2 locks:
> > 1. The first lock need to be used each time we are going to update the members of an object.
> > 2. The second lock need to be used to mutual exclusively executing the code that can affect the object's state, ensure it is not
> messed up from different call context. This should be achieved by some artificial locking facility. In many cases, the call path is locked
> without holding any locks that can lead to an atomic context.
> >
> > Currently we may just doing wrong by utilizing the 2nd lock to meet the 1st requirement.
> >
> > For ACPICA, shall we first ask:
> > 1. do we have a per-object lock to meet the first requirement?
> > 2. do we have a global lock to protect the consistencies of the namespace?
> >
> > I think you should fix this by tuning the locking granularity.
> > The patch can do so unless there is no other mean can be used to fix this issue.
> >
> > Thanks and best regards
> > -Lv
> >
> >>> The current code might be a hackish result to work around some design defects.
> >>> So what you do in this patch might be dangerous, they can break ACPICA interpreter and you need to do it in the ACPICA source
> >> base.
> >>> There is a recursive unit test facility - ASLTS to help determine if your change doesn't introduce regressions.
> >>> I think you need to file a bug in ACPICA bugzilla and start a discussion there.
> >>>
> >>> Thanks and best regards
> >>> -Lv
> >>>
> >> Thanks for your information, I will try ASLTS and file bug about the atomic between main object's ref_count
> >> with sub-object's ref_count.
> >>
> >>>>>>  			acpi_ut_update_ref_count(object->common_notify.
> >>>>>> @@ -592,6 +593,11 @@ acpi_ut_update_object_reference(union acpi_operand_object *object, u16 action)
> >>>>>>  		 * main object to be deleted.
> >>>>>>  		 */
> >>>>>>  		acpi_ut_update_ref_count(object, action);
> >>>>>> +		if (object->common.type == ACPI_TYPE_PROCESSOR ||
> >>>>>> +		    object->common.type == ACPI_TYPE_DEVICE ||
> >>>>>> +		    object->common.type == ACPI_TYPE_POWER ||
> >>>>>> +		    object->common.type == ACPI_TYPE_THERMAL)
> >>>>>> +			acpi_ut_release_mutex(ACPI_MTX_NOTIFY_REF_COUNT);
> >>>> here also. I think need remove this release?
> >>>>
> >>>>>>  		object = NULL;
> >>>>>>
> >>>>>>  		/* Move on to the next object to be updated */
> >>>>>> --
> >>>>>> 1.8.4.5
> >>>> Please let me know if I miss understood.
> >>>>
> >>>>
> >>>> Thanks a lot!
> >>>> Joey Lee
> >> Regards
> >> Joey Lee
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> > .
> >
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux