On Mon, Dec 27, 2021 at 9:12 AM gyfelectric <gyfelectric@xxxxxxxxx> wrote:
Hi all,

Recently, the problem of OSD ops completing out of order has often appeared in my environment (14.2.5), and my FUSE client crashed due to "FAILED assert(ob->last_commit_tid < tid)". My application cannot work normally now.

The sequence of events that triggers this problem is:

Notes:
a. My data pool is EC 4+2.
b. One OSD of pg_1 (osd.x) is down.

Event sequence:
t1: op_1 (write) is sent to the OSD, and the primary sends 5 shards to 5 OSDs. Apart from the primary, only 4 shard replies return, because osd.x is down.
t2: Many other operations occur in this PG and are recorded in pg_log.
t3: op_n (write) is sent to the OSD, and the primary sends 5 shards to 5 OSDs. Again, only 4 shard replies return, because osd.x is down.
t4: The peer OSDs report osd.x's timeout to the monitor, and osd.x is marked down.
t5: pg_1 starts canceling op_1, op_2 … op_n and requeueing them to the OSD op_wq.
t6: pg_1 starts peering, and op_1 is trimmed from pg_log and the dup map during this process.
Unless I’m misunderstanding, either you have more ops that haven’t been committed+acked than the length of the pg log dup tracking, or else there’s a bug here and it’s trimming farther than it should.
Can you clarify which case? Because if you’re sending more ops than the pg log length, this is an expected failure and not one that’s feasible to resolve. You just need to spend the money to have enough memory for longer logs and dup detection.
-Greg
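The first case Greg describes (more unacknowledged ops than the dup-tracking window) can be illustrated with a minimal sketch. It assumes, purely for illustration, that PG-log dup detection behaves like a fixed-capacity FIFO of applied request IDs; `BoundedDupTracker`, `capacity`, and the `(client, tid)` key are hypothetical names, not Ceph's actual pg_log or dup-map code.

```python
from collections import OrderedDict


class BoundedDupTracker:
    """Hypothetical, simplified model of PG-log dup detection.

    Remembers the results of at most `capacity` applied requests, keyed by
    a (client, tid) request ID; the oldest entries are trimmed first.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # reqid -> recorded result, oldest first
        self.executions = 0           # number of times an op was actually applied

    def _record(self, reqid, result):
        self.entries[reqid] = result
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # trim the oldest tracked request

    def submit(self, reqid):
        """Apply the op unless it is a tracked duplicate.

        Returns (result, was_duplicate).
        """
        if reqid in self.entries:
            return self.entries[reqid], True
        self.executions += 1
        result = f"applied-{reqid[1]}"
        self._record(reqid, result)
        return result, False


if __name__ == "__main__":
    osd = BoundedDupTracker(capacity=3)
    # Five writes from one client; none of them has been acked yet.
    for tid in range(1, 6):
        osd.submit(("client.a", tid))
    # After peering the ops are requeued and processed again.
    print(osd.submit(("client.a", 1)))  # ('applied-1', False): trimmed, re-executed
    print(osd.submit(("client.a", 5)))  # ('applied-5', True): still tracked
    print(osd.executions)               # 6
```

Once the window is smaller than the number of in-flight ops, the oldest request is no longer recognized as a duplicate and is applied a second time, while a newer one is still answered from the tracker. With a window larger than the in-flight count, every resend is caught as a duplicate.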
t7: pg_1 becomes active and starts reprocessing op_1, op_2 … op_n.
t8: op_1 is not found in pg_log or the dup map, so it is redone.
t9: op_n is found in pg_log or the dup map and is considered completed, so the OSD replies to the client directly with tid_op_n.
t10: op_1 completes and its reply is returned to the client with tid_op_1. The client crashes on "assert(ob->last_commit_tid < tid)".

I found some related discussion in https://tracker.ceph.com/issues/23827, but I didn't find an effective way to avoid this problem. I think the current mechanism for preventing non-idempotent ops from being repeated is flawed; maybe we should redesign it.

What do you think? And if my idea is wrong, what should I do to avoid this problem? Any response would be greatly appreciated, thank you!
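The client-side failure at t9/t10 can be sketched in the same simplified way. This assumes op_1 and op_n write the same object and that the client checks, per object, that commit replies arrive with strictly increasing tids, which is the condition the reported assert expresses; `ObjectCommitState` and `deliver` are hypothetical names, not the actual client code.

```python
class ObjectCommitState:
    """Hypothetical stand-in for a client's per-object commit bookkeeping.

    Enforces the invariant behind the reported assert: for a given object,
    commit replies must arrive with strictly increasing tids.
    """

    def __init__(self):
        self.last_commit_tid = 0

    def on_commit(self, tid):
        # Same condition as assert(ob->last_commit_tid < tid).
        if not self.last_commit_tid < tid:
            raise AssertionError(
                f"FAILED assert(last_commit_tid < tid): {self.last_commit_tid} >= {tid}"
            )
        self.last_commit_tid = tid


def deliver(reply_tids):
    """Feed commit replies for one object in arrival order.

    Returns the first tid that violates the ordering check, or None.
    """
    ob = ObjectCommitState()
    for tid in reply_tids:
        try:
            ob.on_commit(tid)
        except AssertionError:
            return tid
    return None


if __name__ == "__main__":
    print(deliver([1, 2, 5]))  # None: replies arrive in tid order
    # t9: op_n (tid 5) is answered from the dup map first;
    # t10: the re-executed op_1 (tid 1) replies afterwards.
    print(deliver([5, 1]))     # 1
```

Delivering op_n's early duplicate reply before the re-executed op_1's reply is enough to trip the check, regardless of how many ops lie in between.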
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx