Re: Replication lag in block storage

Gregory Farnum <greg@xxxxxxxxxxx> · Thu, 13 Mar 2014 18:25:02 -0700



Right. So which is the interval that's taking all the time? Probably
it's waiting for the journal commit, but maybe there's something else
blocking progress. If it is the journal commit, check out how busy the
disk is (is it just saturated?) and what its normal performance
characteristics are (is it breaking?).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Mar 13, 2014 at 5:48 PM, Greg Poirier <greg.poirier@xxxxxxxxxx> wrote:
> Many of the sub ops look like this, with significant lag between received_at
> and commit_sent:
>
>         { "description": "osd_op(client.6869831.0:1192491
> rbd_data.67b14a2ae8944a.0000000000009105 [write 507904~3686400] 6.556a4db0
> e660)",
>           "received_at": "2014-03-13 20:42:05.811936",
>           "age": "46.088198",
>           "duration": "0.038328",
> <snip>
>                     { "time": "2014-03-13 20:42:05.850215",
>                       "event": "commit_sent"},
>                     { "time": "2014-03-13 20:42:05.850264",
>                       "event": "done"}]]},
>
> In this case almost 39ms between received_at and commit_sent.
>
> A particularly egregious example of 80+ms lag between received_at and
> commit_sent:
>
>        { "description": "osd_op(client.6869831.0:1190526
> rbd_data.67b14a2ae8944a.0000000000008fac [write 3325952~868352] 6.5255f5fd
> e660)",
>           "received_at": "2014-03-13 20:41:40.227813",
>           "age": "320.017087",
>           "duration": "0.086852",
> <snip>
>                     { "time": "2014-03-13 20:41:40.314633",
>                       "event": "commit_sent"},
>                     { "time": "2014-03-13 20:41:40.314665",
>                       "event": "done"}]]},
>
>
>
> On Thu, Mar 13, 2014 at 4:17 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Thu, Mar 13, 2014 at 3:56 PM, Greg Poirier <greg.poirier@xxxxxxxxxx>
>> wrote:
>> > We've been seeing this issue on all of our dumpling clusters, and I'm
>> > wondering what might be the cause of it.
>> >
>> > In dump_historic_ops, the time between op_applied and sub_op_commit_rec
>> > or
>> > the time between commit_sent and sub_op_applied is extremely high. Some
>> > of
>> > the osd_sub_ops are as long as 100 ms. A sample dump_historic_ops is
>> > included at the bottom.
>>
>> It's important to understand what each of those timestamps are reporting.
>>
>> op_applied: the point at which an OSD has applied an operation to its
>> readable backing filesystem in-memory (which for xfs or ext4 will be
>> after it's committed to the journal)
>> sub_op_commit_rec: the point at which an OSD has gotten commits from
>> the replica OSDs
>> commit_sent: the point at which a replica OSD has sent a commit back
>> to its primary
>> sub_op_applied: the point at which a replica OSD has applied a
>> particular operation to its backing filesystem in-memory (again, after
>> the journal if using xfs)
>>
>> Reads are never served from replicas, so a long time between
>> commit_sent and sub_op_applied should not in itself be an issue. A lag
>> time between op_applied and sub_op_commit_rec means that the OSD is
>> waiting on its replicas. A long time there indicates either that the
>> replica is processing slowly, or that there's some issue in the
>> communications stack (all the way from the raw ethernet up to the
>> message handling in the OSD itself).
>> So the first thing to look for are sub ops which have a lag time
>> between the received_at and commit_sent timestamps. If none of those
>> ever turn up, but unusually long waits for sub_op_commit_rec are still
>> present, then it'll take more effort to correlate particular subops on
>> replicas with the op on the primary they correspond to, and see where
>> the time lag is coming into it.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com