Re: Ceph and SL5.5

Gregory Farnum <gregf@xxxxxxxxxxxxxxx> · Mon, 19 Jul 2010 08:55:24 -0700



> Thank you very much! I tried the unstable version on Saturday, which looks a lot better, but I am still facing similar issues:
>
> - The first one is when writing files, I usually get the message from cfuse,
>
> 10.07.17_23:13:12.117289 42f16940 client4213.objecter  pg 0.90ea on [1,0] is laggy: 51723
> 10.07.17_23:13:12.117302 42f16940 client4213.objecter  pg 0.4194 on [1,0] is laggy: 51722
This means the OSD is being slow to acknowledge some writes. It can
indicate that an OSD has crashed, but here it's probably just because
your drive is struggling to keep up with two VMs and all the write
streams (2x OSD journals, MDS journal via OSDs, client writes via
OSDs). Annoying but not important.
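
If you want to check whether the laggy messages cluster on particular
PGs or OSDs, a rough Python sketch like the one below could tally them
from the cfuse output. The regex is inferred only from the two sample
lines quoted above, count_laggy is a made-up helper name (not part of
Ceph), and the trailing number on each line is left uninterpreted.

```python
import re
from collections import Counter

# Pattern inferred from the two quoted sample lines only (an assumption);
# other cfuse versions may format this message differently.
LAGGY_RE = re.compile(r"pg (\S+) on \[([\d,]+)\] is laggy: (\d+)")


def count_laggy(lines):
    """Return (per-PG counts, per-OSD counts) for 'is laggy' log lines."""
    per_pg = Counter()
    per_osd = Counter()
    for line in lines:
        m = LAGGY_RE.search(line)
        if not m:
            continue  # not a laggy message; ignore it
        pg, osds, _value = m.groups()  # _value is kept opaque on purpose
        per_pg[pg] += 1
        for osd in osds.split(","):
            per_osd[int(osd)] += 1
    return per_pg, per_osd


# The two lines from the report above, used as sample input.
sample = [
    "10.07.17_23:13:12.117289 42f16940 client4213.objecter  pg 0.90ea on [1,0] is laggy: 51723",
    "10.07.17_23:13:12.117302 42f16940 client4213.objecter  pg 0.4194 on [1,0] is laggy: 51722",
]
per_pg, per_osd = count_laggy(sample)
print(per_pg)
print(per_osd)
```

If one OSD shows up far more often than the others, that would point at
that OSD rather than at general disk contention.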


> - In some cases, cfuse (I turned off daemonizing) crashes, again when writing on the filesystem, but not due to xlist:
>
> osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_ack(sobject_t, loff_t, uint64_t, tid_t)':
> osdc/ObjectCacher.cc:670: FAILED assert(ob->last_ack_tid < tid)
>  1: (ObjectCacher::bh_write_ack(sobject_t, long, unsigned long, unsigned long)+0x6a8) [0x6b2bc2]
>  2: (ObjectCacher::C_WriteAck::finish(int)+0x57) [0x6c955b]
>  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xbd1) [0x688f07]
>  4: (Client::ms_dispatch(Message*)+0xd8) [0x61f4ca]
>  5: (Messenger::ms_deliver_dispatch(Message*)+0x55) [0x5ff0b7]
>  6: (SimpleMessenger::dispatch_entry()+0x50f) [0x5ea2e7]
>  7: (SimpleMessenger::DispatchThread::entry()+0x29) [0x5e8a1f]
>  8: (Thread::_entry_func(void*)+0x20) [0x5f8c0a]
>  9: /lib64/libpthread.so.0 [0x3d7420673d]
>  10: (clone()+0x6d) [0x3d736d3d1d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion*'
> Aborted
Hmm, that's not an error I'm familiar with. What sort of workload is
producing this, and can you
1) Give us a copy of your executable and the core file?
2) Try to reproduce it with debugging on for the client:
     cfuse mnt --debug_client 20 --debug_objecter 20 --log-file="/dir/to/log"
   and give us that log?


> - Finally, I don't get what's wrong with deleting. Especially when the client crashes, but even when using rm, I don't see the filesystem freeing any space :(
Well, that definitely shouldn't happen. Are you getting error messages
anywhere? I think Sage may have dealt with a bug report on something
like this recently, but I don't recall the details, so he'll have to
weigh in himself. :)
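
To put numbers on the "no space freed" symptom, a small Python sketch
like this could record free space on the mount before and after an rm.
It only uses the standard os.statvfs call; free_bytes is a hypothetical
helper name, and "." stands in for wherever cfuse is mounted. Space may
well not be returned immediately after a delete, so a single reading
right after rm is not conclusive.

```python
import os


def free_bytes(path):
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


# Replace "." with the cfuse mount point (an assumption for illustration).
before = free_bytes(".")
# ... delete some files here, then wait a little ...
after = free_bytes(".")
print("free before:", before, "free after:", after, "delta:", after - before)
```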
-Greg