On 8/11/22 11:56, Casey Bodley wrote:
On Tue, Jul 19, 2022 at 1:49 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
the rgw teuthology suite started seeing lots of valgrind issues a
couple weeks ago. we're tracking them in
https://tracker.ceph.com/issues/56500
as i understand it, valgrind is complaining about stack memory access
outside of the current thread's stack:
<auxwhat>Address 0x57f47f60 is on thread 135's stack</auxwhat>
rgw is using coroutine stacks allocated by boost::context, which
explains why valgrind is confused. boost::context supports valgrind's
annotations for these stacks (VALGRIND_STACK_REGISTER), but they
aren't enabled by default
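
for anyone unfamiliar with the annotation, here is a minimal sketch of what registering a heap-allocated stack with valgrind looks like. it is not boost::context's actual code. the `registered_stack`, `allocate_stack` and `free_stack` names are hypothetical, and the no-op fallback macros are only there so the sketch compiles on a machine without the valgrind headers. `VALGRIND_STACK_REGISTER` and `VALGRIND_STACK_DEREGISTER` are the real client requests from `valgrind/valgrind.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if __has_include(<valgrind/valgrind.h>)
#include <valgrind/valgrind.h>
#else
// assumption: no-op stand-ins so the sketch builds without valgrind headers
#define VALGRIND_STACK_REGISTER(start, end) 0u
#define VALGRIND_STACK_DEREGISTER(id) ((void)(id))
#endif

// hypothetical holder for a coroutine stack and its valgrind id
struct registered_stack {
  void* base = nullptr;
  std::size_t size = 0;
  unsigned valgrind_id = 0;
};

// allocate a stack on the heap and tell valgrind about its address range,
// so accesses to it are not reported as being outside the thread's stack
registered_stack allocate_stack(std::size_t size) {
  assert(size > 0);
  registered_stack s;
  s.base = std::malloc(size);
  if (s.base != nullptr) {
    s.size = size;
    s.valgrind_id =
        VALGRIND_STACK_REGISTER(s.base, static_cast<char*>(s.base) + size);
  }
  return s;
}

// deregister the range before freeing it, then reset the holder
void free_stack(registered_stack& s) {
  if (s.base != nullptr) {
    VALGRIND_STACK_DEREGISTER(s.valgrind_id);
    std::free(s.base);
    s = registered_stack{};
  }
}
```

outside of valgrind these client requests do nothing observable. boost::context only makes the equivalent calls in its stack allocators when it is compiled with BOOST_USE_VALGRIND defined.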
in March 2020 with https://github.com/ceph/ceph/pull/34043, Adam added
a cmake option WITH_BOOST_VALGRIND that enables this 'valgrind' option
for ceph's bundled boost build. in
https://github.com/ceph/ceph-build/pull/1736, we enabled this for the
'notcmalloc' builds that we ran our valgrind tests against
however, we stopped doing 'notcmalloc' builds entirely after
https://github.com/ceph/teuthology/pull/1618 added the valgrind
options necessary to run against the normal tcmalloc builds. so we
lost this fix, but the rgw suite had been getting clean valgrind
results until just recently
i've confirmed that the issues do go away with WITH_BOOST_VALGRIND
enabled, but i really don't want to require a special build flavor for
it
does anyone know what changed here? are valgrind issues popping up
anywhere else?
this topic was discussed again yesterday in the ceph leadership call.
we've been trying to decide whether to enable WITH_BOOST_VALGRIND
globally, or to add it back as a new build flavor just for teuthology
runs
in https://tracker.ceph.com/issues/56500, Mark Kogan showed that
WITH_BOOST_VALGRIND has little to no effect on rgw performance. Mark
Nelson did similar testing with rbd trying to rule out a performance
hit there as well - the results had quite a bit of variation, so it
was hard to rule out completely
to narrow the scope of this investigation, i did a grep of the entire
boost project and confirmed that boost::context and boost::coroutine
are the only two libraries that mention this BOOST_USE_VALGRIND flag.
so i think it's safe to assume that these valgrind builds will have no
effect outside of rgw
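
the check above was a plain grep. for anyone who wants to repeat it programmatically, a rough equivalent could look like the sketch below. it assumes the boost checkout keeps one directory per library under a common parent (for example `libs/`), and the function name `libraries_mentioning` is hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <set>
#include <string>

namespace fs = std::filesystem;

// return the names of the first-level subdirectories of `libs_dir` that
// contain at least one regular file mentioning `needle`
std::set<std::string> libraries_mentioning(const fs::path& libs_dir,
                                           const std::string& needle) {
  std::set<std::string> hits;
  for (const auto& lib : fs::directory_iterator(libs_dir)) {
    if (!lib.is_directory()) continue;
    for (const auto& entry : fs::recursive_directory_iterator(lib.path())) {
      if (!entry.is_regular_file()) continue;
      // read the whole file; fine for source files, wasteful for huge ones
      std::ifstream in(entry.path(), std::ios::binary);
      std::string text((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
      if (text.find(needle) != std::string::npos) {
        hits.insert(lib.path().filename().string());
        break;  // one match is enough to flag this library
      }
    }
  }
  return hits;
}
```

calling `libraries_mentioning("libs", "BOOST_USE_VALGRIND")` from the root of a boost source tree would list the libraries that reference the flag.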
i've raised https://github.com/ceph/ceph-build/pull/2043 to enable
WITH_BOOST_VALGRIND on all shaman builds, and will ask the component
leads for approval there
Great job Casey! 100% Agree on this. The results I saw showed a weak
correlation at best imho, and with your additional investigation I don't
see any reason not to merge. It will be really nice to continue to have
valgrind support in default builds.
Mark
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx