OSD write op out of order

gyfelectric <gyfelectric@xxxxxxxxx> · Mon, 27 Dec 2021 17:12:11 +0800

    Hi all, 

Recently, OSD write ops completing out of order have appeared frequently in my environment (14.2.5), and my FUSE client broke 
with "FAILED assert(ob->last_commit_tid < tid)". My application cannot work normally now. 

The sequence of events that triggers this problem is as follows.
Notes:
a. My data pool is EC 4+2.
b. One OSD (osd.x) of pg_1 is down.

Event sequence:
t1: op_1 (write) is sent to the OSD, which sends 5 shards to 5 OSDs. Only 4 shards are acknowledged (not counting the primary OSD) because osd.x is down.
t2: Many other operations occur in this PG and are recorded in pg_log.
t3: op_n (write) is sent to the OSD, which sends 5 shards to 5 OSDs. Again only 4 shards are acknowledged (not counting the primary OSD) because osd.x is down.
t4: A peer OSD reports the osd.x timeout to the monitor, and osd.x is marked down.
t5: pg_1 starts canceling op_1, op_2 … op_n and requeueing them to the OSD op_wq.
t6: pg_1 starts peering, and op_1 is trimmed from pg_log and the dup map during this process.
t7: pg_1 becomes active and starts reprocessing op_1, op_2 … op_n.
t8: op_1 is not found in pg_log or the dup map, so it is redone.
t9: op_n is found in pg_log or the dup map and is considered completed, so the OSD reply is returned to the client directly with tid_op_n.
t10: op_1 completes and its reply is returned to the client with tid_op_1. The client crashes on "assert(ob->last_commit_tid < tid)".
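
To make the ordering problem in t8 to t10 concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions, not Ceph code: ToyClient and replay_after_peering are hypothetical names, and the model assumes that every op still found in the pg_log/dup map is acknowledged before any redone op finishes.

```python
# Toy model only. ToyClient and replay_after_peering are hypothetical names
# and do not correspond to Ceph classes or functions.


class ToyClient:
    """Tracks the last committed tid, in the spirit of the client-side assert."""

    def __init__(self):
        self.last_commit_tid = 0

    def on_commit(self, tid):
        # Same condition as assert(ob->last_commit_tid < tid) in the report.
        if not self.last_commit_tid < tid:
            raise AssertionError(
                f"out-of-order commit: last_commit_tid={self.last_commit_tid}, tid={tid}"
            )
        self.last_commit_tid = tid


def replay_after_peering(requeued_tids, logged_tids):
    """Return the order in which replies reach the client after peering.

    Assumption of this model: ops still present in the pg_log/dup map are
    acknowledged immediately (t9), while ops missing from it are re-executed
    and acknowledged only after all immediate replies (t8, t10).
    """
    immediate = [tid for tid in requeued_tids if tid in logged_tids]
    redone = [tid for tid in requeued_tids if tid not in logged_tids]
    return immediate + redone


if __name__ == "__main__":
    requeued = [1, 2, 3]   # op_1 .. op_n with n = 3
    logged = {2, 3}        # op_1 was trimmed during peering (t6)
    order = replay_after_peering(requeued, logged)
    print(order)           # [2, 3, 1]

    client = ToyClient()
    try:
        for tid in order:
            client.on_commit(tid)
    except AssertionError as err:
        print("client assert fired:", err)
```

In this model the client sees tid 3 before tid 1, so the check fails on tid 1, which matches t10 above. If op_1 had still been in the log, the replies would arrive in the original order and the check would pass.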

I found a related issue at https://tracker.ceph.com/issues/23827, which has some discussion of this problem,
but I did not find an effective way to avoid it. 

I think the current mechanism for preventing non-idempotent ops from being repeated is flawed; maybe we should redesign it.
What do you think? And if I am wrong, what should I do to avoid this problem?

Any response would be greatly appreciated. Thank you!

                                gyfelectric

                                    gyfelectric@xxxxxxxxx


_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx