Re: boost and valgrind

On 8/11/22 11:56, Casey Bodley wrote:
On Tue, Jul 19, 2022 at 1:49 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
the rgw teuthology suite started seeing lots of valgrind issues a
couple weeks ago. we're tracking them in
https://tracker.ceph.com/issues/56500

as i understand it, valgrind is complaining about stack memory access
outside of the current thread's stack:

<auxwhat>Address 0x57f47f60 is on thread 135's stack</auxwhat>

rgw is using coroutine stacks allocated by boost::context, which
explains why valgrind is confused. boost::context supports valgrind's
annotations for these stacks (VALGRIND_STACK_REGISTER), but they
aren't enabled by default
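For readers unfamiliar with these annotations, here is a minimal sketch of what registering a custom stack with valgrind looks like. It is not boost::context's actual code. It assumes Linux/glibc and uses POSIX `getcontext`/`makecontext`/`swapcontext` in place of boost::context's own context switch. `run_on_custom_stack`, `coroutine_body` and the opt-in macro `SKETCH_USE_VALGRIND` are hypothetical names; the macro loosely mirrors boost's `BOOST_USE_VALGRIND` flag. The stack range is handed to valgrind's `VALGRIND_STACK_REGISTER` client request before any code runs on it and released with `VALGRIND_STACK_DEREGISTER` before the memory is freed. Without the opt-in macro (or without `valgrind/valgrind.h` installed) both calls become no-ops, which corresponds to the unannotated default described above.

```cpp
// Minimal sketch (Linux/glibc assumed): run a function on a heap-allocated
// stack, the way coroutine libraries such as boost::context do, and optionally
// describe that stack to valgrind.
#include <cassert>
#include <cstddef>
#include <vector>
#include <ucontext.h>

// SKETCH_USE_VALGRIND is a hypothetical opt-in switch, loosely mirroring
// BOOST_USE_VALGRIND: the annotations are compiled in only when requested.
#if defined(SKETCH_USE_VALGRIND) && __has_include(<valgrind/valgrind.h>)
#include <valgrind/valgrind.h>
#else
// No-op fallbacks: the custom stack stays unknown to valgrind.
#define VALGRIND_STACK_REGISTER(start, end) ((void)(start), (void)(end), 0u)
#define VALGRIND_STACK_DEREGISTER(id) ((void)(id))
#endif

namespace {

ucontext_t g_caller;     // where execution resumes when the body returns
ucontext_t g_coroutine;  // context that runs on the custom stack
int g_result = 0;

// Hypothetical coroutine body; its locals live on the heap-allocated stack.
void coroutine_body() {
    int local = 21;
    g_result = local * 2;
    // Returning switches to uc_link, i.e. back to g_caller.
}

}  // namespace

// Runs coroutine_body on a fresh stack of stack_size bytes.
// Returns the value computed there, or -1 if the context setup fails.
int run_on_custom_stack(std::size_t stack_size) {
    std::vector<char> stack(stack_size);
    char* lo = stack.data();
    char* hi = lo + stack.size();

    // Tell valgrind the range from lo to hi is a stack before switching to it.
    unsigned stack_id = VALGRIND_STACK_REGISTER(lo, hi);

    int rc = -1;
    if (getcontext(&g_coroutine) == 0) {
        g_coroutine.uc_stack.ss_sp = lo;
        g_coroutine.uc_stack.ss_size = stack.size();
        g_coroutine.uc_link = &g_caller;
        makecontext(&g_coroutine, coroutine_body, 0);
        // swapcontext returns 0 here once coroutine_body has finished.
        if (swapcontext(&g_caller, &g_coroutine) == 0) {
            rc = g_result;
        }
    }

    // Deregister before the vector frees the memory.
    VALGRIND_STACK_DEREGISTER(stack_id);
    return rc;
}
```

Calling `run_on_custom_stack(64 * 1024)` returns 42, computed entirely on the heap-allocated stack. Building with `-DSKETCH_USE_VALGRIND` and running under valgrind should let it treat accesses to that buffer as accesses to a known stack rather than attributing them to some other thread's stack.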

in March 2020 with https://github.com/ceph/ceph/pull/34043, Adam added
a cmake option WITH_BOOST_VALGRIND that enables this 'valgrind' option
for ceph's bundled boost build. in
https://github.com/ceph/ceph-build/pull/1736, we enabled this for the
'notcmalloc' builds that we ran our valgrind tests against

however, we stopped doing 'notcmalloc' builds entirely after
https://github.com/ceph/teuthology/pull/1618 added the valgrind
options necessary to run against the normal tcmalloc builds. so we
lost this fix, but the rgw suite had been getting clean valgrind
results until just recently

i've confirmed that the issues do go away with WITH_BOOST_VALGRIND
enabled, but i really don't want to require a special build flavor for
it

does anyone know what changed here? are valgrind issues popping up
anywhere else?

this topic was discussed again yesterday in the ceph leadership call.
we've been trying to decide whether to enable WITH_BOOST_VALGRIND
globally, or to add it back as a new build flavor just for teuthology
runs

in https://tracker.ceph.com/issues/56500, Mark Kogan showed that
WITH_BOOST_VALGRIND has little to no effect on rgw performance. Mark
Nelson did similar testing with rbd to try to rule out a performance
hit there as well - the results had quite a bit of variation, so it
was hard to rule it out completely

to narrow the scope of this investigation, i did a grep of the entire
boost project and confirmed that boost::context and boost::coroutine
are the only two libraries that mention this BOOST_USE_VALGRIND flag.
so i think it's safe to assume that these valgrind builds will have no
effect outside of rgw

i've raised https://github.com/ceph/ceph-build/pull/2043 to enable
WITH_BOOST_VALGRIND on all shaman builds, and will ask the component
leads for approval there


Great job Casey!  100% agree on this.  The results I saw showed a weak correlation at best imho, and with your additional investigation I don't see any reason not to merge.  It will be really nice to continue to have valgrind support in default builds.


Mark



_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx

