Possible race at user mode helper in request_firmware

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I see possible race in _request_firmware_load function, on which I wanted to take your opinion.

When system is going to low power mode, device_cache_fw_images() is called from pm notifier which schedules async workers to cache devices firmware. If more than 2 async requests falls under user helper mode, then there is a use after free of "firmware" directory kobject. The race will hit, when one asynchronous worker calls device_del() while other asynchronous worker calls get_device_parent() in
_request_firmware_load().

After loading firmware, first worker calls device_del(), and "firmware" directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with gdp_mutex lock held. Meanwhile, second async worker calls device_add(), and get_device_parent() is still able to find the firmware directory kobject in ksets after getting gdp_mutex.
This kobject gets added as parent to the second device kobject.
By the time of accessing the parent kobject, its ref count was 0 and prints below warning stack followed by crash.

[21147.455038] ------------[ cut here ]------------
[21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47 kobject_get+0x50/0x68() [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O) qcrypto_module(O) [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 3.10.84-g0957845-00427-g56a05c2 #1
[21147.481284] Workqueue: events_unbound async_run_entry_fn
[21147.481296] Call trace:
[21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
[21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
[21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
[21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
[21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
[21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
[21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
[21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
[21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
[21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
[21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
[21147.481490] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
[21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
[21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
[21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
[21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
[21147.481563] ---[ end trace dabc98ea48b8ba59 ]---

[21147.486436] Unable to handle kernel paging request at virtual address 0x6b6b6b6b6b6b6bcb
[21147.493816] pgd = ffffffc0a982d000
[21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
[21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O) qcrypto_module(O) [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 3.10.84-g0957845-00427-g56a05c2 #1
[21147.524686] Workqueue: events_unbound async_run_entry_fn
[21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti: ffffffc007880000
[21147.537438] PC is at sysfs_create_dir+0x34/0xd4
[21147.541949] LR is at kobject_add_internal+0x120/0x24c
[21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>] pstate: 80000145 [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit = 0xffffffc007880058)
[21148.003673] Call trace:
[21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
[21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
[21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
[21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
[21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
[21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
[21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
[21148.044822] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
[21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
[21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
[21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
[21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
[21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
[21148.080312] ---[ end trace dabc98ea48b8ba5a ]---

Below is the callstack where the parent is freed from the slub_debug info.
0xFFFFFFC000608D74  \\vmlinux\base/core\class_dir_release+0x0C
0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C
0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
0xFFFFFFC00061C65C \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC

For similar type of issue, I see there is an existing fix: "sysfs: driver core: Fix glue dir race condition by gdp_mutex".
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5


Still I am able to reproduce the issue. I have verified this race on kernels: 3.10, 3.18 and 4.4.

I followed below procedure to reproduce the issue:
     1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
     2. Use test_firmware, modified the test for async calls
     3. Replace WARN_ON with BUG_ON in kref_get().
I ran below script from shell:
    count=0
    while [ 1 ]
    do
            echo 3 > /sys/devices/virtual/misc/test_firmware/trigger_ufw
            if [ $? -ne 0 ]; then
                    print "Exiting.. "
                    exit 1
            fi
            count=$(($count +1))
            echo "count $count"
     done

With below patch, I could reproduce the issue in the 2nd iteration.

-------------------------8<----------------------------------------------

To reproduce the race condition, use firmware test and
CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
releases. So that we can catch the bug easily.

Enabled below flags in kernel config file:

+CONFIG_DEBUG_KOBJECT_RELEASE=y
+CONFIG_TEST_FIRMWARE=y


diff --git a/include/linux/kref.h b/include/linux/kref.h
index 484604d..727fb24 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
      * condition when this kref is freeing by some other thread right
now.
      * In this case one should use kref_get_unless_zero()
      */
-    WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
+    BUG_ON(atomic_inc_return(&kref->refcount) < 2);
 }

 /**
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
old mode 100644
new mode 100755
index 86374c1..14c9598
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -18,6 +18,7 @@
 #include <linux/miscdevice.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/async.h>

+
+static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
+
+static void __async_ufw_work(void *name,
+                                       async_cookie_t cookie)
+{
+    const struct firmware *fw;
+    const char *fw_name = name;
+    int ret;
+
+    pr_err("requesting %s\n", fw_name);
+
+    ret = request_firmware(&fw, fw_name, NULL);
+    pr_err("loaded: %zu\n", fw ? fw->size : 0);
+        if (!ret)
+                kfree(fw);
+
+}
+
+static char *name[] = { "fw1.bin", "fw2.bin"};
+
+static ssize_t trigger_ufw_store(struct device *dev,
+        struct device_attribute *attr,
+        const char *buf, size_t count)
+{
+    int rc, i, c=0;
+    u32 start;
+    char *fw_name;
+
+    rc = kstrtou32(buf, 0, &start);
+    if (rc){
+        pr_err("Invalid option\n");
+        return rc;
+    }
+    pr_err(" no of iterations %d\n", start);
+
+    for (i=0; i< start; i++)
+    {
+        if (c >= 2)
+            c = 0;
+        fw_name = name[c++];
+ async_schedule_domain(__async_ufw_work, (void *)fw_name, &ufw_domain);
+    }
+    async_synchronize_full_domain(&ufw_domain);
+    return count;
+}

 static DEVICE_ATTR_WO(trigger_request);
+static DEVICE_ATTR_WO(trigger_ufw);

 static int __init test_firmware_init(void)
 {
@@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
         goto dereg;
     }

+    rc = device_create_file(test_fw_misc_device.this_device,
+                             &dev_attr_trigger_ufw);
+    if (rc) {
+        pr_err("could not create sysfs interface: %d\n", rc);
+        goto dereg;
+    }
     pr_warn("interface ready\n");

     return 0;
@@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
 {
     release_firmware(test_firmware);
     device_remove_file(test_fw_misc_device.this_device,
-               &dev_attr_trigger_request);
+            &dev_attr_trigger_request);
+    device_remove_file(test_fw_misc_device.this_device,
+            &dev_attr_trigger_ufw);
     misc_deregister(&test_fw_misc_device);
     pr_warn("removed interface\n");
 }

----------------------------8<--------------------------------------------

Thanks and regards,
Chandrasekhar L.
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
 a Linux Foundation Collaborative Project.
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux