OSD write op out of order

Hi all, 

Recently, OSD write ops have frequently completed out of order in my environment (14.2.5), and my FUSE client crashed
with "FAILED assert(ob->last_commit_tid < tid)". My application cannot work normally now.

The sequence of events that triggers this problem is as follows.
Notes:
a. My data pool is EC 4+2.
b. One OSD (osd.x) of pg_1 is down.

Event sequence:
t1: op_1 (write) is sent to the primary OSD, which sends 5 shards to 5 OSDs. Only 4 shards are acknowledged (not counting the primary OSD) because osd.x is down.
t2: Many other operations occur in this PG and are recorded in the pg_log.
t3: op_n (write) is sent to the primary OSD, which sends 5 shards to 5 OSDs. Only 4 shards are acknowledged (not counting the primary OSD) because osd.x is down.
t4: A peer OSD reports the osd.x timeout to the monitor, and osd.x is marked down.
t5: pg_1 starts canceling op_1, op_2 … op_n and requeueing them to the OSD op_wq.
t6: pg_1 starts peering, and op_1 is trimmed from the pg_log and the dup map during this process.
t7: pg_1 becomes active and starts reprocessing op_1, op_2 … op_n.
t8: op_1 is not found in the pg_log or the dup map, so it is redone.
t9: op_n is found in the pg_log or the dup map and is considered completed, so the OSD reply is returned to the client directly with tid_op_n.
t10: op_1 completes and its reply is returned to the client with tid_op_1. The client then crashes on "assert(ob->last_commit_tid < tid)".
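
The sequence t5 to t10 can be sketched with a small toy model. This is not Ceph code: replay_after_peering, ToyClientObject and the log_capacity parameter are hypothetical names made up for illustration, the pg_log and dup map are collapsed into one bounded list, and redone ops are simply assumed to finish after the ops that are answered directly from the log.

```python
from collections import deque


def replay_after_peering(requeued_tids, logged_tids, log_capacity):
    """Toy model of t6-t10 (hypothetical, not Ceph code).

    logged_tids: tids recorded before the interval change (t1-t3).
    log_capacity: how many of the newest entries survive trimming (t6).
    Returns the tids in the order their replies reach the client.
    """
    # A bounded deque drops the oldest entries, standing in for log trimming.
    log = deque(logged_tids, maxlen=log_capacity)
    answered_from_log = []  # t9: found in the log, replied to immediately
    redone = []             # t8: not found, executed again
    for tid in requeued_tids:
        if tid in log:
            answered_from_log.append(tid)
        else:
            redone.append(tid)
    # Assumption: a redone write commits later than an immediate reply.
    return answered_from_log + redone


class ToyClientObject:
    """Toy stand-in for the client-side per-object commit tracking."""

    def __init__(self):
        self.last_commit_tid = 0

    def on_commit(self, tid):
        # Mirrors the check quoted above: commits must arrive in tid order.
        assert self.last_commit_tid < tid, "FAILED assert(ob->last_commit_tid < tid)"
        self.last_commit_tid = tid


if __name__ == "__main__":
    order = replay_after_peering([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], log_capacity=3)
    print("reply order:", order)
    ob = ToyClientObject()
    try:
        for tid in order:
            ob.on_commit(tid)
    except AssertionError as err:
        print("client aborted:", err)
```

With five requeued ops and a log that keeps only the newest three entries, the replies for ops 3, 4 and 5 come back before those for ops 1 and 2, so the client's check fails as soon as the reply for op 1 arrives.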

I found a related issue at https://tracker.ceph.com/issues/23827, which has some discussion of this problem,
but I didn't find an effective way to avoid it.

I think the current mechanism for preventing non-idempotent ops from being repeated is flawed; maybe we should redesign it.
What do you think? And if I am wrong, what should I do to avoid this problem?

Any response would be greatly appreciated. Thank you!

