RE: debugging librbd async

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> > I'm also testing valgrind at the moment, just basic memtest, but suddenly
> > everything is quite stable even though it's under reasonable load right now.
> > Stupid heisenbugs.
> 
> Valgrind makes things go very slow (~10x?), which can have a huge effect
> on timing. Sometimes that reveals new races, other times it hides others.
> :/
> 

That's what I figured. It's run overnight with no hiccups so far under valgrind (helgrind - but no crashes under memtest either). The helgrind process on one tapdisk instance is running pretty hard, but performance is tolerable. I'll leave it go and poke at it occasionally to see if I can reproduce an actual crash.

So far helgrind has identified a bunch of possible races like:

==12478== Possible data race during write of size 4 at 0x913F250 by thread #1
==12478== Locks held: 2, at addresses 0x9119B80 0x913EF88
==12478==    at 0x5821D57: Pipe::Pipe(SimpleMessenger*, int, Connection*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5838FC7: SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*, Message*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5839414: SimpleMessenger::get_connection(entity_inst_t const&) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D062F: Objecter::get_session(int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D0FAA: Objecter::recalc_op_target(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D6140: Objecter::_op_submit(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C4640: librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C53D1: librados::IoCtxImpl::stat(object_t const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x569B83E: librados::IoCtx::stat(std::string const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x63E45BF: librbd::detect_format(librados::IoCtx&, std::string const&, bool*, unsigned long*) (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63DCF6E: librbd::ImageCtx::init() (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63F45C7: librbd::open_image(librbd::ImageCtx*) (in /usr/lib/librbd.so.1.0.0)
==12478==
==12478== This conflicts with a previous read of size 4 by thread #9
==12478== Locks held: none
==12478==    at 0x581E6A1: Pipe::tcp_read_wait() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581E96F: Pipe::tcp_read(char*, int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581F965: Pipe::read_message(Message**) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x583138B: Pipe::reader() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5834D5C: Pipe::Reader::entry() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x4C2B5AD: mythread_wrapper (hg_intercepts.c:219)
==12478==    by 0x6650B4F: start_thread (pthread_create.c:304)
==12478==    by 0x6940A7C: clone (clone.S:112)

But I don't think they are correct. There are also some races that appear to relate to memcpy.

Also I believe tapdisk serialises all callbacks and I don't know that helgrind is aware of that completely as it identifies some cases that just can't happen.

The last thing logged is:

==12478== More than 10000000 total errors detected.  I'm not reporting any more.
==12478== Final error counts will be inaccurate.  Go fix your program!
==12478== Rerun with --error-limit=no to disable this cutoff.  Note
==12478== that errors may occur in your program without prior warning from
==12478== Valgrind, because errors are no longer being displayed.

So I think maybe it's stopped being useful in that instance. I can post the full output if you want. It's only about 40kb

James
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux