Getting rid of trim_object Snap .... not in clones

Andreas John <aj@xxxxxxxxxxx> · Sat, 1 Feb 2020 10:20:08 +0100

Hello,

for those stumbling upon a similar issue: I was able to mitigate the
issue by setting

=== 8< ===

[osd.14]
osd_pg_max_concurrent_snap_trims = 0

=========

in ceph.conf. You don't need to restart the OSD; the OSD crash plus
systemd will do it for you :)

Now the osd in question does no trimming anymore and thus stays up.

Now I let the deep-scrubber run and keep my fingers crossed that it will
clean up the mess.

In case I need to clean up manually, could anyone give a hint how to
find the rbd with that snap? The log says:

7faf8f716700 -1 log_channel(cluster) log [ERR] : trim_object Snap 29c44
not in clones
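
One route I could imagine for the manual search, as a rough Python sketch:
walk all images in a pool and compare each image's object name prefix with
the prefix seen in the object file name on the OSD. This assumes that
"rbd ls" and "rbd info" accept "--format json" and that the info output
carries a "block_name_prefix" field; the helper names rbd_json and
find_image_by_prefix are made up for illustration.

```python
import json
import subprocess


def rbd_json(*args):
    """Run `rbd <args> --format json` and return the decoded output."""
    out = subprocess.check_output(("rbd",) + args + ("--format", "json"))
    return json.loads(out)


def find_image_by_prefix(pool, prefix, run=rbd_json):
    """Return the first image in `pool` whose data objects use `prefix`.

    `prefix` is the hex string between "rbd_data." and the object number.
    Assumption: `rbd info` reports it as block_name_prefix.
    """
    wanted = "rbd_data." + prefix
    for image in run("ls", pool):
        info = run("info", pool + "/" + image)
        if info.get("block_name_prefix") == wanted:
            return image
    return None


# Example call (needs a working rbd CLI and keyring on the host):
# find_image_by_prefix("mypool", "59cb9c679e2a9e3")
```

Once the image is known, "rbd snap ls <pool>/<image>" should list its
snapshots; as far as I know the IDs there are printed in decimal, so the
hex value from the log would have to be converted first.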

1.) What is the 7faf8f716700 at the beginning of the log? Is it a daemon
id?

2.) About the Snap "ID" 29c44: In the filesystem I see

...ceph-14/current/7.374_head/DIR_4/DIR_7/DIR_B/DIR_A/rbd\udata.59cb9c679e2a9e3.0000000000003096__29c44_A29AAB74__7

Do I read it correctly that in PG 7.374 there is an object with rbd prefix
59cb9c679e2a9e3 whose name ends with ..3096 and which has snap ID
29c44? What does the part A29AAB74__7 mean?

I was not able to find in the docs how the directory / filename is structured.
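
To make my reading concrete, here is a minimal Python sketch that splits
such a file name. It rests on assumptions of mine, not on anything I found
documented: that the name consists of six underscore-separated fields
(object name, key, snap, hash, namespace, pool), that "_" inside a field
is escaped as "\u", and that the DIR_x levels are the hash nibbles in
reverse order. Very long names, which may get stored under a shortened
hashed file name, are not handled.

```python
import re

# Assumed escape sequences inside a field (backslash, "_", "/", NUL).
_ESCAPES = {"\\\\": "\\", "\\u": "_", "\\s": "/", "\\n": "\0"}


def unescape(field):
    """Undo the assumed backslash escapes within one field."""
    return re.sub(r"\\[\\usn]", lambda m: _ESCAPES[m.group(0)], field)


def parse_filestore_name(filename):
    """Split an on-disk object file name into its assumed six fields."""
    fields = filename.split("_")
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    name, key, snap, hash_hex, nspace, pool = fields
    return {
        "name": unescape(name),
        "key": unescape(key),
        "snap": snap,
        # "head" and "snapdir" are not numeric snapshot IDs.
        "snap_id": None if snap in ("head", "snapdir") else int(snap, 16),
        "hash": hash_hex,
        "namespace": unescape(nspace),
        "pool": None if pool == "none" else int(pool, 16),
        # Directory levels, assuming reversed hash nibbles.
        "dirs": ["DIR_" + c for c in reversed(hash_hex)],
    }


info = parse_filestore_name(
    r"rbd\udata.59cb9c679e2a9e3.0000000000003096__29c44_A29AAB74__7"
)
print(info["name"], info["snap_id"], info["pool"], info["dirs"][:4])
```

If that reading holds, 29c44 is snapshot ID 171076 in decimal, 7 is the
pool, and the first four reversed nibbles of A29AAB74 give DIR_4, DIR_7,
DIR_B, DIR_A, which matches the path above.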

Best regards,

j.

On 31.01.20 16:04, Andreas John wrote:
> Hello,
>
> in my cluster one OSD after the other died, until I realized that it
> was simply an "abort" in the daemon, probably caused by
>
> 2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log
> [ERR] : trim_object Snap 29c44 not in clones
>
>
> Close to this msg I get a stack trace:
>
>
>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>  1: /usr/bin/ceph-osd() [0xb35f7d]
>  2: (()+0x11390) [0x7f0fec74b390]
>  3: (gsignal()+0x38) [0x7f0feab43428]
>  4: (abort()+0x16a) [0x7f0feab4502a]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f0feb48684d]
>  6: (()+0x8d6b6) [0x7f0feb4846b6]
>  7: (()+0x8d701) [0x7f0feb484701]
>  8: (()+0x8d919) [0x7f0feb484919]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x27e) [0xc3776e]
>  10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x10dd) [0x868cfd]
>  11: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x80)
> [0x8690e0]
>  12: (Context::complete(int)+0x9) [0x6c8799]
>  13: (void ReplicatedBackend::sub_op_modify_reply<MOSDRepOpReply,
> 113>(std::tr1::shared_ptr<OpRequest>)+0x21b) [0xa5ae0b]
>  14:
> (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x15b)
> [0xa53edb]
>  15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,
> ThreadPool::TPHandle&)+0x1cb) [0x84c78b]
>  16: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ef) [0x6966ff]
>  17: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x4e4) [0x696e14]
>  18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x71e)
> [0xc264fe]
>  19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc29950]
>  20: (()+0x76ba) [0x7f0fec7416ba]
>  21: (clone()+0x6d) [0x7f0feac1541d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
>
> Yes, I know it's still hammer, I want to upgrade soon, but I want to
> resolve that issue first. If I lose that PG, I don't worry.
>
> So: What is the best approach? Can I use something like
> ceph-objectstore-tool ... <object> remove-clone-metadata <cloneid> ? I
> assume 29c44 is my Object, but what's the clone ID?
>
>
> Best regards,
>
> derjohn
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet