Re: lockdeps

Sage Weil <sage@xxxxxxxxxxx> · Tue, 11 Dec 2012 19:29:04 -0800 (PST)

On Tue, 11 Dec 2012, Sam Lang wrote:
> 
> I've been puzzling over a failure in teuthology where lockdep was enabled
> and reported a lock cycle.  The reported cycle is below.  I think the report
> is actually erroneous: it flags a cycle, but the two dependencies that form
> the cycle occur in separate threads.  It's correctly detecting a possible
> deadlock due to out-of-order locking (thread1: a -> b, thread2: b -> a), but
> in this case I don't think the deadlock is possible, because the two threads
> never run at the same time.
> 
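To make the (thread1: a -> b, thread2: b -> a) case concrete, here is a minimal sketch of a lockdep-style checker. It assumes a single global graph of "held -> acquired" edges and flags any new edge that closes a cycle. `LockOrderGraph` and `add_dependency()` are hypothetical names, and this is not Ceph's lockdep code. The graph stores no thread information, so it reports an inversion even when the two threads never run concurrently.

```cpp
// Hypothetical lock-order checker; not Ceph's lockdep implementation.
#include <cassert>
#include <map>
#include <set>
#include <string>

class LockOrderGraph {
 public:
  // Record that `acquired` was taken while `held` was already held
  // (edge held -> acquired). Returns false, without recording the edge,
  // if an earlier path already took the locks in the opposite order.
  bool add_dependency(const std::string& held, const std::string& acquired) {
    assert(!held.empty() && !acquired.empty());
    if (reachable(acquired, held)) {
      return false;  // acquired -> ... -> held exists: lock order inversion
    }
    edges_[held].insert(acquired);
    return true;
  }

 private:
  // Depth-first search over recorded edges: can `to` be reached from `from`?
  bool reachable(const std::string& from, const std::string& to) const {
    std::set<std::string> visited;
    return dfs(from, to, visited);
  }

  bool dfs(const std::string& cur, const std::string& to,
           std::set<std::string>& visited) const {
    if (cur == to) return true;
    if (!visited.insert(cur).second) return false;  // already explored
    auto it = edges_.find(cur);
    if (it == edges_.end()) return false;
    for (const std::string& next : it->second) {
      if (dfs(next, to, visited)) return true;
    }
    return false;
  }

  // held lock -> locks acquired while it was held (no thread information)
  std::map<std::string, std::set<std::string>> edges_;
};
```

Suppose you feed it the two dependencies from the report in order: `Client::client_lock -> SimpleMessenger::lock`, then `SimpleMessenger::lock -> Client::client_lock`. The second call returns false, regardless of which threads made the calls.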
> My proposed fixes are in wip-lockdep-fixes.  They resolve the issue of
> stomping on thread ids by using gettid() instead of pthread_self(), and
> ensure that a cycle is only reported when it happens within a single thread.
> They also allow the g_lockdep field to be set to 3, which will detect (and
> only warn about) possible deadlocks across threads.
> 
> -sam
> 
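The following is a hypothetical sketch of the refinement described above. It is not the code in wip-lockdep-fixes. It assumes each recorded edge remembers the kernel thread id that created it. A cycle built entirely from the current thread's own edges is always reported. A cycle that needs edges from other threads is flagged as a warning only when `warn_cross_thread` is set, which stands in for the `g_lockdep = 3` mode. `ThreadAwareLockOrder`, `Verdict`, and `current_tid()` are invented names. Thread ids come from the Linux `SYS_gettid` syscall because glibc can hand a new thread the same `pthread_t` value as one that has exited. Kernel ids are recycled too, but far less eagerly. The sketch requires Linux and C++17.

```cpp
// Hypothetical thread-aware lock-order checker; names are illustrative only.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread id on Linux (older glibc has no gettid() wrapper).
inline long current_tid() { return static_cast<long>(syscall(SYS_gettid)); }

enum class Verdict { Ok, CrossThreadWarning, SameThreadCycle };

class ThreadAwareLockOrder {
 public:
  // warn_cross_thread plays the role of the proposed g_lockdep == 3 mode.
  explicit ThreadAwareLockOrder(bool warn_cross_thread)
      : warn_cross_thread_(warn_cross_thread) {}

  // Called when thread `tid` acquires `acquired` while holding `held`.
  Verdict add_dependency(const std::string& held, const std::string& acquired,
                         long tid) {
    assert(tid > 0);
    // Inversion using only this thread's edges: always reported, not recorded.
    if (reachable(acquired, held, tid)) return Verdict::SameThreadCycle;
    Verdict v = Verdict::Ok;
    // Inversion that spans threads: only surfaced in warn-only mode.
    if (warn_cross_thread_ && reachable(acquired, held, kAnyThread)) {
      v = Verdict::CrossThreadWarning;
    }
    edges_[held][acquired].insert(tid);
    return v;
  }

 private:
  static constexpr long kAnyThread = -1;

  // DFS over recorded edges; unless only_tid == kAnyThread, follow only
  // edges that thread `only_tid` itself recorded.
  bool reachable(const std::string& from, const std::string& to,
                 long only_tid) const {
    std::set<std::string> visited;
    return dfs(from, to, only_tid, visited);
  }

  bool dfs(const std::string& cur, const std::string& to, long only_tid,
           std::set<std::string>& visited) const {
    if (cur == to) return true;
    if (!visited.insert(cur).second) return false;  // already explored
    auto it = edges_.find(cur);
    if (it == edges_.end()) return false;
    for (const auto& [next, tids] : it->second) {
      if (only_tid != kAnyThread && tids.count(only_tid) == 0) continue;
      if (dfs(next, to, only_tid, visited)) return true;
    }
    return false;
  }

  bool warn_cross_thread_;
  // held lock -> acquired lock -> ids of threads that took them in that order
  std::map<std::string, std::map<std::string, std::set<long>>> edges_;
};
```

With `warn_cross_thread` off, suppose thread 1 records client_lock -> msgr_lock and thread 2 records msgr_lock -> client_lock. Both calls return `Verdict::Ok`. With it on, the second call returns `Verdict::CrossThreadWarning`. In either mode, if thread 1 itself later takes the locks in the reverse order, the result is `Verdict::SameThreadCycle`.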
> ------------------------------------
> existing dependency Client::client_lock (10) -> SimpleMessenger::lock (4) at:
>  ceph version 0.55-217-g331c250 (331c25046ecd99ec10c5835e8e674ca819e6168a)
>  1: (Client::init()+0xbbd) [0x7f0303a9d83d]
>  2: (ceph_mount_info::mount(std::string const&)+0x191) [0x7f0303a772b1]
>  3: (ceph_mount()+0x76) [0x7f0303a75bc6]
>  4: (LibCephFS_Open_empty_component_Test::TestBody()+0x4e1) [0x432cc1]
>  5: (testing::Test::Run()+0xaa) [0x46089a]
>  6: (testing::internal::TestInfoImpl::Run()+0x100) [0x4609a0]
>  7: (testing::TestCase::Run()+0xbd) [0x460a6d]
>  8: (testing::internal::UnitTestImpl::RunAllTests()+0x217) [0x460cd7]
>  9: (main()+0x35) [0x41c4d5]
>  10: (__libc_start_main()+0xed) [0x7f030309176d]
>  11: test_libcephfs() [0x41c531]
> 
> 2012-12-10 19:31:30.231305 7f02c8ff9700  0 new dependency
> SimpleMessenger::lock (4) -> Client::client_lock (10) creates a cycle at
>  ceph version 0.55-217-g331c250 (331c25046ecd99ec10c5835e8e674ca819e6168a)
>  1: (ObjectCacher::FlusherThread::entry()+0x15) [0x7f0303d98005]
>  2: (Thread::_entry_func(void*)+0x12) [0x7f0303c908d2]

I'm not seeing where this is actually happening.  Where is the msgr lock
acquired in this case?  Could it still be held from a previous iteration
or something?

At first glance it sounds like a real bug...

sage

>  3: (()+0x7e9a) [0x7f0304a32e9a]
>  4: (clone()+0x6d) [0x7f03031624bd]
> 
> 2012-12-10 19:31:30.231332 7f02c8ff9700  0 btw, i am holding these locks:
> 2012-12-10 19:31:30.231334 7f02c8ff9700  0   SimpleMessenger::lock (4)
> 2012-12-10 19:31:30.231335 7f02c8ff9700  0