On Tue, Jul 19, 2022 at 1:49 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> the rgw teuthology suite started seeing lots of valgrind issues a
> couple weeks ago. we're tracking them in
> https://tracker.ceph.com/issues/56500
>
> as i understand it, valgrind is complaining about stack memory access
> outside of the current thread's stack:
>
> > <auxwhat>Address 0x57f47f60 is on thread 135's stack</auxwhat>
>
> rgw is using coroutine stacks allocated by boost::context, which
> explains why valgrind is confused. boost::context supports valgrind's
> annotations for these stacks (VALGRIND_STACK_REGISTER), but they
> aren't enabled by default
>
> in March 2020 with https://github.com/ceph/ceph/pull/34043, Adam added
> a cmake option WITH_BOOST_VALGRIND that enables this 'valgrind' option
> for ceph's bundled boost build. in
> https://github.com/ceph/ceph-build/pull/1736, we enabled this for the
> 'notcmalloc' builds that we ran our valgrind tests against
>
> however, we stopped doing 'notcmalloc' builds entirely after
> https://github.com/ceph/teuthology/pull/1618 added the valgrind
> options necessary to run against the normal tcmalloc builds. so we
> lost this fix, but the rgw suite had been getting clean valgrind
> results until just recently
>
> i've confirmed that the issues do go away with WITH_BOOST_VALGRIND
> enabled, but i really don't want to require a special build flavor for
> it
>
> does anyone know what changed here? are valgrind issues popping up
> anywhere else?

this topic was discussed again yesterday in the ceph leadership call.
we've been trying to decide whether to enable WITH_BOOST_VALGRIND
globally, or to add it back as a new build flavor just for teuthology
runs

in https://tracker.ceph.com/issues/56500, Mark Kogan showed that
WITH_BOOST_VALGRIND has little to no effect on rgw performance.
Mark Nelson did similar testing with rbd trying to rule out a
performance hit there as well - the results had quite a bit of
variation, so it was hard to rule out completely

to narrow the scope of this investigation, i did a grep of the entire
boost project and confirmed that boost::context and boost::coroutine
are the only two libraries that mention this BOOST_USE_VALGRIND flag.
so i think it's safe to assume that these valgrind builds will have no
effect outside of rgw

i've raised https://github.com/ceph/ceph-build/pull/2043 to enable
WITH_BOOST_VALGRIND on all shaman builds, and will ask the component
leads for approval there
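for anyone unfamiliar with what the annotation actually does, here's a minimal sketch of the idea. it is not boost's code: it uses POSIX ucontext instead of boost::context to run a function on a heap-allocated stack, and brackets that stack's lifetime with VALGRIND_STACK_REGISTER / VALGRIND_STACK_DEREGISTER so valgrind knows the memory is a stack. the names run_on_private_stack and coroutine_body are made up for illustration, the 64 KiB stack size is arbitrary, and the no-op fallback macros are my assumption so the sketch still builds without valgrind's headers installed

```cpp
#include <ucontext.h>
#include <cstddef>
#include <cstdlib>

// Use valgrind's client-request macros when the header is available;
// otherwise define no-op stand-ins (an assumption made for this sketch).
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif
#ifndef HAVE_VALGRIND_H
#  define VALGRIND_STACK_REGISTER(start, end) (0u)
#  define VALGRIND_STACK_DEREGISTER(id) ((void)(id))
#endif

namespace {

ucontext_t caller_ctx;  // context of the code that resumes the coroutine
ucontext_t coro_ctx;    // context that runs on the heap-allocated stack
int result = 0;

// Hypothetical coroutine body: its locals live on the heap-allocated
// stack, not on the calling thread's own stack.
void coroutine_body() {
  volatile int local = 41;
  result = local + 1;
  // Returning here resumes coro_ctx.uc_link, i.e. caller_ctx.
}

}  // namespace

// Runs coroutine_body() on a private stack and returns its result,
// or -1 if the stack or context could not be set up.
int run_on_private_stack() {
  constexpr std::size_t stack_size = 64 * 1024;
  char* stack = static_cast<char*>(std::malloc(stack_size));
  if (stack == nullptr) {
    return -1;
  }

  // Tell valgrind that [stack, stack + stack_size) is a stack.
  unsigned stack_id = VALGRIND_STACK_REGISTER(stack, stack + stack_size);

  if (getcontext(&coro_ctx) != 0) {
    VALGRIND_STACK_DEREGISTER(stack_id);
    std::free(stack);
    return -1;
  }
  coro_ctx.uc_stack.ss_sp = stack;
  coro_ctx.uc_stack.ss_size = stack_size;
  coro_ctx.uc_link = &caller_ctx;
  makecontext(&coro_ctx, coroutine_body, 0);

  // Switch onto the private stack; control comes back when the body returns.
  swapcontext(&caller_ctx, &coro_ctx);

  // Drop the registration before the memory is released.
  VALGRIND_STACK_DEREGISTER(stack_id);
  std::free(stack);
  return result;
}
```

calling run_on_private_stack() from main() and running the binary under valgrind exercises the registered path. as far as i can tell, BOOST_USE_VALGRIND makes boost::context do the equivalent register/deregister pair when it allocates and frees coroutine stacks, which is the part our builds lost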