On Monday 10 August 2009 20:50:47 Pallipadi, Venkatesh wrote: > On Mon, 2009-08-10 at 08:48 -0700, Thomas Renninger wrote: > > On Saturday 08 August 2009 15:55:12 Dominik Brodowski wrote: > > > Hey, > > > > > > how does this new patch square with > > > > > > commit 9e76988e9390a4ff4d171f690586d0c58186b47e > > > Author: Venki Pallipadi <venkatesh.pallipadi@xxxxxxxxx> > > > Date: Fri Oct 26 10:18:21 2007 -0700 > > > > > > [CPUFREQ] Eliminate cpufreq_userspace scaling_setspeed deadlock > > > > > Oops, I thought the scaling_setspeed always was in cpufreq.c and > > my patch was intended as cleanup. I did not realize that it > > explicitly > > was moved there to workaround a deadlock and my patch is more or > > less a revert of that workaround. > > Yes. Looking at the code, this deadlock should happen again with the > new code. > > > On the other hand side I wonder why this should not work (having > > show/store_scaling_setspeed in cpufreq_userspace.c) as > > ondemand/conservative do the same. > > Hmm, but it seems to work with recent changes. I tried to run into > > this with two test apps run in parallel (see at the end) and could > > not run into a deadlock. > > > > I can't think of any recent changes that will help with this particular > deadlock. ondemand/conservative does not have this problem as they add > their own sysfs directory and not a file under cpufreq directory and > they do not go through cpufreq rwsem. > > The problem with setspeed was, it goes through show/store in cpufreq.c > which will try to hold cpufreq rwsem. And userspace stop flow also > holds rwsem and cannot remove sysfs setspeed file on STOP path, as > there is a blocking read/write of setspeed. > > IIRC, tests were similar to what you have below. We had two tasks and > we typically hit the deadlock, with cpufreq debug messages turned on, > in about 30 mins or so. I gave them a try with recent cpufreq next tree and my "machine wide governor" patch reverted as this one would make things run through possible other paths. I rather quickly run into: WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcf/0xe6() Hardware name: To Be Filled By O.E.M. sysfs: cannot create duplicate filename '/devices/system/cpu/cpu0/cpufreq/scaling_setspeed' Modules linked in: cpufreq_ondemand powernow_k8 freq_table thermal processor Pid: 30045, comm: governor_change Tainted: G AW 2.6.31-rc4 #16 Call Trace: [<ffffffff811200e0>] ? sysfs_add_one+0xcf/0xe6 [<ffffffff810401c3>] warn_slowpath_common+0x77/0xa4 [<ffffffff8104023d>] warn_slowpath_fmt+0x3c/0x3e [<ffffffff811200e0>] sysfs_add_one+0xcf/0xe6 [<ffffffff8111fa79>] sysfs_add_file_mode+0x57/0x8b [<ffffffff8111fab9>] sysfs_add_file+0xc/0xe [<ffffffff8111fb67>] sysfs_create_file+0x25/0x27 [<ffffffff8138346e>] cpufreq_governor_userspace+0x66/0x2fe [<ffffffff813818f8>] __cpufreq_governor+0xa2/0xe0 [<ffffffff8138259d>] __cpufreq_set_policy+0x1a9/0x225 [<ffffffff813829ec>] store_scaling_governor+0x1f8/0x232 After reverting c294f894e4af611213257386e2f46f82c4554c1b (I commented these two lines out): // unlock_policy_rwsem_write(data->cpu); __cpufreq_governor(data, CPUFREQ_GOV_STOP); // lock_policy_rwsem_write(data->cpu); the test run successfully for quite some time. That means commit c294f894e4af611213257386e2f46f82c4554c1b modifies cpufreq core in a way that .governor(START) can be called twice without .governor(STOP) being called in between on the same cpu -> bad. Venki: Haven't you had patches to do this in a cleaner way so that .governor(STOP) is also called with rwsem held? It took really long, but I could run into a deadlock when removing setspeed sysfs file which is probably what you meant. Backtrace of the two tasks is attached. So the cleanup: [CPUFREQ] Cleanup governor struct, remove userspace specific set_speed functions git commit 5b2d1ede0acf0b8c3d26df969c06cd238b903562 has to be reverted again... Thanks Dominik for catching this! Thomas > # task 1 > while : > do > cat /sys/devices/system/cpu/cpu0/cpufreq/* > done > > # task 2 > while : > do > echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > done > > Thanks, > Venki > > > > I still like to do another test with it if before this gets > > merged into a mainline kernel. > > The global ondemand/conservative sysfs changes should be fine. > > > > > > > > Test apps run in parallel without any deadlock occuring: > > > > cat_cpufreq.sh > > ============== > > #!/bin/bash > > > > for((x=0;x<100000;x++));do > > cat /sys/devices/system/cpu/cpu*/cpufreq/* > > cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed > > usleep 100 > > done > > > > governor_change.sh > > ================== > > #!/bin/bash > > > > set -x > > for((x=0;x<100000;x++));do > > for y in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor;do > > echo userspace >$y > > done > > usleep 100 > > for y in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor;do > > echo performance >$y > > done > > usleep 100 > > done > >
[77908.519972] task PC stack pid father [77908.519972] governor_chan D 0000000000000002 5640 5889 5790 0x00000004 [77908.519972] ffff8801f90a3978 0000000000000046 ffff8801f90a38d8 ffffffff81065f41 [77908.519972] 00000000001d2800 000000000000ca28 ffff8801fd99cdc0 ffffffff81788020 [77908.519972] ffff8801fd99d150 000000008103a4f7 ffffffff8103a4b3 00000000001d2800 [77908.519972] Call Trace: [77908.519972] [<ffffffff81065f41>] ? trace_hardirqs_on+0xd/0xf [77908.519972] [<ffffffff8103a4b3>] ? finish_task_switch+0x3b/0xd6 [77908.519972] [<ffffffff81524e32>] ? thread_return+0x3e/0xc3 [77908.519972] [<ffffffff81033348>] ? cpuacct_charge+0x1c/0x96 [77908.519972] [<ffffffff815251e5>] schedule_timeout+0x22/0xcc [77908.519972] [<ffffffff81524fff>] ? wait_for_common+0x2b/0x136 [77908.519972] [<ffffffff81065f41>] ? trace_hardirqs_on+0xd/0xf [77908.519972] [<ffffffff81526954>] ? _spin_unlock_irq+0x2b/0x30 [77908.519972] [<ffffffff815250a0>] wait_for_common+0xcc/0x136 [77908.519972] [<ffffffff8103bad6>] ? default_wake_function+0x0/0xf [77908.519972] [<ffffffff811e4292>] ? __spin_lock_init+0x31/0x53 [77908.519972] [<ffffffff81525194>] wait_for_completion+0x18/0x1a [77908.519972] [<ffffffff8112ec7a>] sysfs_addrm_finish+0x21c/0x27c [77908.519972] [<ffffffff8112e855>] ? sysfs_addrm_start+0x78/0xa5 [77908.519972] [<ffffffff8112e855>] ? sysfs_addrm_start+0x78/0xa5 [77908.519972] [<ffffffff8112d152>] sysfs_hash_and_remove+0x4e/0x65 [77908.519972] [<ffffffff8112de88>] sysfs_remove_file+0x10/0x12 [77908.519972] [<ffffffff8139a503>] cpufreq_governor_userspace+0x17b/0x30c [77908.519972] [<ffffffff8139882f>] __cpufreq_governor+0xa2/0xe0 [77908.519972] [<ffffffff81398b9a>] __cpufreq_set_policy+0x17f/0x211 [77908.519972] [<ffffffff81399207>] store_scaling_governor+0x1fc/0x23c [77908.519972] [<ffffffff81399b96>] ? handle_update+0x0/0x33 [77908.519972] [<ffffffff81399c8f>] store+0x62/0x87 [77908.519972] [<ffffffff8112da9d>] sysfs_write_file+0xe4/0x119 [77908.519972] [<ffffffff810daa7e>] vfs_write+0xab/0x105 [77908.519972] [<ffffffff810dab9c>] sys_write+0x47/0x6f [77908.519972] [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b [77908.519972] cat D ffff8801f6de7f48 5896 23761 5891 0x00000004 [77908.519972] ffff8801f6de7dd8 0000000000000046 0000000000000000 0000000000000002 [77908.519972] 00000000001d2800 000000000000ca28 ffff8801f91a4dc0 ffff8801fd99cdc0 [77908.519972] ffff8801f91a5150 00000000815264a5 ffffc900001cfb90 0000000000000046 [77908.519972] Call Trace: [77908.519972] [<ffffffff81526520>] __down_read+0x97/0xb3 [77908.519972] [<ffffffff81525dde>] down_read+0x59/0x6d [77908.519972] [<ffffffff8139a1e2>] ? lock_policy_rwsem_read+0x48/0x78 [77908.519972] [<ffffffff8139a1e2>] lock_policy_rwsem_read+0x48/0x78 [77908.519972] [<ffffffff8139a29d>] show+0x3a/0x79 [77908.519972] [<ffffffff8112db8c>] sysfs_read_file+0xba/0x13c [77908.519972] [<ffffffff810dac6c>] vfs_read+0xa8/0x102 [77908.519972] [<ffffffff810dad8a>] sys_read+0x47/0x70 [77908.519972] [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b