helgrind fincorrectly reports lock order violated

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jason,

The jewel rbd job

rbd/valgrind/{base/install.yaml clusters/{fixed-1.yaml openstack.yaml} fs/xfs.yaml validator/helgrind.yaml workloads/c_api_tests_with_journaling.yaml}

at

http://pulpito.ceph.com/loic-2016-08-17_22:11:27-rbd-jewel-backports-loic-distro-basic-smithi/371732/

for the sake of validating a series of backports found at https://github.com/ceph/ceph/pull/10684 failed with:

2016-08-18T03:38:43.412 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== Helgrind, a thread error detector
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== Command: ceph_test_librbd
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== ---Thread-Announcement------------------------------------------
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== Thread #26 was created
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    at 0xC92943E: clone (clone.S:74)
2016-08-18T03:38:43.413 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0xAC3F199: do_clone.constprop.3 (createthread.c:75)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0xAC408BA: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0xAA2BC90: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x715508: Thread::try_create(unsigned long) (Thread.cc:139)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x7155D7: Thread::create(char const*, unsigned long) (Thread.cc:154)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x6FF67E: Finisher::start() (Finisher.cc:15)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x669743: librados::RadosClient::connect() (RadosClient.cc:298)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x638FAF: rados_connect (librados.cc:2611)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x98DD6B: connect_cluster(void**) (test.cc:368)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x45A826: TestLibRBD::SetUpTestCase() (test_librbd.cc:159)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x948DA2: void testing::internal::HandleExceptionsInMethodIfSupported<testing::TestCase, void>(testing::TestCase*, void (testing::TestCase::*)(), char const*) (gtest.cc:2078)
2016-08-18T03:38:43.414 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==
2016-08-18T03:38:43.415 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== ----------------------------------------------------------------
2016-08-18T03:38:43.415 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==
2016-08-18T03:38:43.415 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== Thread #26: lock order "0xD822EC8 before 0xD8FFB10" violated
2016-08-18T03:38:43.415 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==
2016-08-18T03:38:43.415 INFO:tasks.workunit.client.0.smithi036.stderr:==21351== Observed (incorrect) order is: acquisition of lock at 0xD8FFB10
2016-08-18T03:38:43.415 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    at 0xAA2D145: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
2016-08-18T03:38:43.415 INFO:tasks.workunit.client.0.smithi036.stderr:==21351==    by 0x6E2497: Mutex::Lock(bool) (Mutex.cc:110)
...

And it looks like your commits:

https://github.com/ceph/ceph/commit/7d64f0ea62545d2ac4a4b399e9566d80b41d3068
https://github.com/ceph/ceph/commit/d4c76649902b74c37da4f7d448bcb0b7ffa04084
https://github.com/ceph/ceph/commit/b12cf70fe415cc872c33223bb6574ce47bea0623

are trying to address the problem. However, none of them were merged. A quick search in the tracker found

http://tracker.ceph.com/issues/15424 Many helgrind lock warnings within Objecter
http://tracker.ceph.com/issues/15352 TestJournalEntries.AioWrite: failed assert(m_watch_in_progress)

I'm tempted to add the failed job to the issue you created to collect helgrind issues: http://tracker.ceph.com/issues/14846 although none of the logs it contains seem to exactly match the failure. I could then set it to be backported to jewel and wait until https://github.com/ceph/ceph/pull/9436 is merged in master.

What do you think ?

-- 
Loïc Dachary, Artisan Logiciel Libre
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux