Re: 【cephfs】cephfs hung when scp/rsync large files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I tested the dirty_ratio and dirty_background_ratio setting on ceph storage server side.
It didn’t work.


The problem reproduced just now. 


Some process stay in D state for over 10 minitues.
These osd requests is about to take several  minutes.

Here are some logs:

[Thu Dec  6 17:59:18 2018] INFO: task java:11477 blocked for more than 120 seconds.
[Thu Dec  6 17:59:18 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Dec  6 17:59:18 2018] java            D ffffffff816a4180     0 11477   8391 0x00000180
[Thu Dec  6 17:59:18 2018]  ffff8807942bbbd0 0000000000000086 ffff8801f7e79fa0 ffff8807942bbfd8
[Thu Dec  6 17:59:18 2018]  ffff8807942bbfd8 ffff8807942bbfd8 ffff8801f7e79fa0 ffff88086de56cc0
[Thu Dec  6 17:59:18 2018]  0000000000000000 7fffffffffffffff ffff88047ffa04e8 ffffffff816a4180
[Thu Dec  6 17:59:18 2018] Call Trace:
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
[Thu Dec  6 17:59:18 2018]  [<ffffffff81190541>] ? pagevec_lookup_tag+0x21/0x30
[Thu Dec  6 17:59:18 2018]  [<ffffffffc081f598>] ? ceph_writepages_start+0x1298/0x1520 [ceph]
[Thu Dec  6 17:59:18 2018]  [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
[Thu Dec  6 17:59:18 2018]  [<ffffffff816a3cb5>] __wait_on_bit+0x65/0x90
[Thu Dec  6 17:59:18 2018]  [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
[Thu Dec  6 17:59:18 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
[Thu Dec  6 17:59:18 2018]  [<ffffffff81181de1>] __filemap_fdatawait_range+0x111/0x190
[Thu Dec  6 17:59:18 2018]  [<ffffffff8118f42e>] ? do_writepages+0x1e/0x40
[Thu Dec  6 17:59:18 2018]  [<ffffffff81181e74>] filemap_fdatawait_range+0x14/0x30
[Thu Dec  6 17:59:18 2018]  [<ffffffff81183db6>] filemap_write_and_wait_range+0x56/0x90
[Thu Dec  6 17:59:18 2018]  [<ffffffffc082726c>] ceph_fsync+0x6c/0x510 [ceph]
[Thu Dec  6 17:59:18 2018]  [<ffffffff81137273>] ? __secure_computing+0x73/0x240
[Thu Dec  6 17:59:18 2018]  [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
[Thu Dec  6 17:59:18 2018]  [<ffffffff81231d85>] do_fsync+0x65/0xa0
[Thu Dec  6 17:59:18 2018]  [<ffffffff81232083>] SyS_fdatasync+0x13/0x20
[Thu Dec  6 17:59:18 2018]  [<ffffffff816b1d4e>] tracesys+0xd9/0xde
[Thu Dec  6 18:03:38 2018] libceph: osd3 10.0.30.111:6808 socket closed (con state OPEN)
[Thu Dec  6 18:03:39 2018] libceph: osd7 10.0.30.112:6807 socket closed (con state OPEN)
[Thu Dec  6 18:03:39 2018] libceph: osd6 10.0.30.112:6805 socket closed (con state OPEN)


cephfs kernel client side osdc:

REQUESTS 1548 homeless 0
124188  osd5    1.7e6c3097      [5,2]/5 [5,2]/5 100004474e2.000006bf    0x400024        1       0'0     write
124191  osd5    1.37d4eeac      [5,2]/5 [5,2]/5 100004474e2.000006c2    0x400024        1       0'0     write
124192  osd5    1.2506d82d      [5,1]/5 [5,1]/5 100004474e2.000006c3    0x400024        1       0'0     write
124198  osd5    1.5326c49c      [5,3]/5 [5,3]/5 100004474e1.00000888    0x400024        1       0'0     write
124219  osd5    1.37203f43      [5,2]/5 [5,2]/5 100004474e2.000006cb    0x400024        1       0'0     write
124236  osd5    1.1318aeb5      [5,0]/5 [5,0]/5 100002adc43.00000000    0x400024        1       0'0     write
124268  osd5    1.b4368bad      [5,0]/5 [5,0]/5 100004474e1.000008a2    0x400024        1       0'0     write
124272  osd5    1.a31398ad      [5,0]/5 [5,0]/5 100004474e2.000006dc    0x400024        1       0'0     write
124279  osd5    1.f6363aff      [5,0]/5 [5,0]/5 100004474e2.000006e3    0x400024        1       0'0     write
124281  osd5    1.e3386fca      [5,2]/5 [5,2]/5 100004474e2.000006e5    0x400024        1       0'0     write
124288  osd5    1.a95ff0ac      [5,2]/5 [5,2]/5 100004474e1.000008ab    0x400024        1       0'0     write
124290  osd5    1.4e9c48ff      [5,0]/5 [5,0]/5 100004474e1.000008ad    0x400024        1       0'0     write
124293  osd5    1.e4721518      [5,2]/5 [5,2]/5 100004474e1.000008b0    0x400024        1       0'0     write
124299  osd5    1.fae2c501      [5,0]/5 [5,0]/5 100004474e2.000006ec    0x400024        1       0'0     write
124303  osd5    1.708289ad      [5,0]/5 [5,0]/5 100004474e2.000006f0    0x400024        1       0'0     write
124312  osd5    1.85e93586      [5,0]/5 [5,0]/5 100004474e1.000008b8    0x400024        1       0'0     write
124314  osd5    1.85146f18      [5,2]/5 [5,2]/5 100004474e1.000008ba    0x400024        1       0'0     write
124316  osd5    1.b4c5875a      [5,0]/5 [5,0]/5 100004474e2.000006f2    0x400024        1       0'0     write
124336  osd5    1.cc35e38e      [5,2]/5 [5,2]/5 100004474e1.000008c5    0x400024        1       0'0     write
124339  osd5    1.c99de197      [5,2]/5 [5,2]/5 100004474e2.000006fe    0x400024        1       0'0     write
124364  osd5    1.21ea6cb7      [5,2]/5 [5,2]/5 100004474e2.0000070c    0x400024        1       0'0     write
124368  osd5    1.411148d8      [5,1]/5 [5,1]/5 100004474e2.00000710    0x400024        1       0'0     write
124370  osd5    1.c56129da      [5,1]/5 [5,1]/5 100004474e2.00000712    0x400024        1       0'0     write
124380  osd5    1.dd7b1c12      [5,1]/5 [5,1]/5 100004474e1.000008db    0x400024        1       0'0     write
124384  osd5    1.1878e328      [5,2]/5 [5,2]/5 100004474e2.00000715    0x400024        1       0'0     write
124390  osd5    1.f992feda      [5,1]/5 [5,1]/5 100004474e2.0000071b    0x400024        1       0'0     write
124396  osd5    1.84f2e0ff      [5,0]/5 [5,0]/5 100004474e1.000008e0    0x400024        1       0'0     write
124398  osd5    1.76e9cbb5      [5,0]/5 [5,0]/5 100004474e1.000008e2    0x400024        1       0'0     write
124405  osd5    1.3a4c5001      [5,0]/5 [5,0]/5 100004474e2.0000071f    0x400024        1       0'0     write
124413  osd5    1.91f47ada      [5,1]/5 [5,1]/5 100004474e2.00000727    0x400024        1       0'0     write
124420  osd5    1.da08fa82      [5,1]/5 [5,1]/5 100004474e1.000008ed    0x400024        1       0'0     write
124422  osd5    1.f0dc5efd      [5,3]/5 [5,3]/5 100004474e1.000008ef    0x400024        1       0'0     write
124435  osd5    1.28ba590a      [5,2]/5 [5,2]/5 100004474e2.00000732    0x400024        1       0'0     write
124442  osd5    1.21439cf       [5,1]/5 [5,1]/5 100004474e1.000008f8    0x400024        1       0'0     write


ceph server side dumps:

# ceph daemon osd.5 dump_blocked_ops
{
    "ops": [],
    "complaint_time": 30.000000,
    "num_blocked_ops": 0
}
# ceph daemon osd.5 dump_ops_in_flight
{
    "ops": [],
    "num_ops": 0
}

# ceph daemon osd.5 dump_historic_slow_ops
{
    "num to keep": 20,
    "threshold to keep": 10,
    "Ops": [
        {
            "description": "osd_op(client.447197.1:71890 1.b7884c19 1:983211ed:::100004342f9.000015e6:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.573307",
            "age": 649.893033,
            "duration": 13.827240,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71890
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.573307",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.573307",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.573309",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.595225",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.595278",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.595291",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:22.970164",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:22.970227",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:22.970273",
                        "event": "waiting for subops from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323234",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.400524",
                        "event": "sub_op_commit_rec from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:24.400541",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.400547",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:71905 1.b5503364 1:26cc0aad:::100004342fb.00000c45:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.620624",
            "age": 649.845717,
            "duration": 12.702676,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71905
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.620624",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.620624",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.620626",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.635423",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.635476",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.635485",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.057810",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.057896",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.057945",
                        "event": "waiting for subops from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:23.237036",
                        "event": "sub_op_commit_rec from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323250",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323285",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323300",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:71913 1.9b59c75f 1:fae39ad9:::100004342fb.00000c4d:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.649445",
            "age": 649.816895,
            "duration": 12.762881,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71913
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.649445",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.649445",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.649447",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.667713",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.667783",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.667803",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.059423",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.059494",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.059540",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.147873",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412282",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412312",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412326",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:71924 1.43f2a896 1:69154fc2:::100004342f9.000015f9:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.680689",
            "age": 649.785651,
            "duration": 12.731666,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71924
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.680689",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.680689",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.680692",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.702583",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.702627",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.702634",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.147879",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.147979",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.148065",
                        "event": "waiting for subops from 4"
                    },
                    {
                        "time": "2018-12-06 18:04:23.237049",
                        "event": "sub_op_commit_rec from 4"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412333",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412349",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412356",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:71935 1.df579f5f 1:faf9eafb:::100004342fb.00000c54:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.751790",
            "age": 649.714550,
            "duration": 12.747687,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71935
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.751790",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.751790",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.751792",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.764998",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.765049",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.765059",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.149915",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.149990",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.150037",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.237059",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499434",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499466",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499477",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:71948 1.30315755 1:aaea8c0c:::100004342f9.00001602:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.765065",
            "age": 649.701276,
            "duration": 12.734734,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71948
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.765065",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.765065",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.765067",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.789073",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.789125",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.789136",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.237064",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.237142",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.237187",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323299",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499499",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499513",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499799",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:71952 1.e0451baf 1:f5d8a207:::100004342f9.00001606:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.789142",
            "age": 649.677198,
            "duration": 12.809745,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71952
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.789142",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.789142",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.789145",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.802180",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.802240",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.802249",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.238710",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.238776",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.238890",
                        "event": "waiting for subops from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323284",
                        "event": "sub_op_commit_rec from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598857",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598878",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598888",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:71987 1.87226b4b 1:d2d644e1:::100004342f9.0000161a:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.875449",
            "age": 649.590892,
            "duration": 12.723466,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 71987
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.875449",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.875449",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.875450",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.889074",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.889109",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.889113",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323306",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323402",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.323465",
                        "event": "waiting for subops from 4"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412387",
                        "event": "sub_op_commit_rec from 4"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598895",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598908",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598915",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72009 1.6e26a0cd 1:b3056476:::100004342f9.00001621:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.951671",
            "age": 649.514669,
            "duration": 13.559266,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72009
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.951671",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.951671",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.951673",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.971212",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.971265",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.971274",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.325006",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.325073",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.325120",
                        "event": "waiting for subops from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:23.689393",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.510912",
                        "event": "sub_op_commit_rec from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:24.510930",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.510937",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72027 1.2675011e 1:7880ae64:::100004342fb.00000c83:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.983790",
            "age": 649.482551,
            "duration": 13.631767,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72027
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.983790",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.983790",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.983791",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:10.997441",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.997486",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:10.997493",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412425",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412492",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.412539",
                        "event": "waiting for subops from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:23.782075",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.615518",
                        "event": "sub_op_commit_rec from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:24.615545",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.615556",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72047 1.20d53eaf 1:f57cab04:::100004342f9.00001638:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:10.997499",
            "age": 649.468842,
            "duration": 12.894479,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72047
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:10.997499",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:10.997499",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:10.997500",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.010879",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.010920",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.010925",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499523",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499599",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.499652",
                        "event": "waiting for subops from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598911",
                        "event": "sub_op_commit_rec from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:23.891906",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.891960",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.891977",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72114 1.5c7de55 1:aa7be3a0:::100004342fb.00000cad:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.111490",
            "age": 649.354851,
            "duration": 12.780517,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72114
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.111490",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.111490",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.111492",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.127195",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.127236",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.127242",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598920",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.598999",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.599048",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.688752",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.891986",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:23.892001",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:23.892007",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72124 1.59941c4b 1:d238299a:::100004342fb.00000cb7:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.127248",
            "age": 649.339093,
            "duration": 12.962263,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72124
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.127248",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.127248",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.127250",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.137801",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.137837",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.137842",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.600686",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.600750",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.600795",
                        "event": "waiting for subops from 4"
                    },
                    {
                        "time": "2018-12-06 18:04:23.688740",
                        "event": "sub_op_commit_rec from 4"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089450",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089493",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089511",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72136 1.4c3df65f 1:fa6fbc32:::100004342f9.00001664:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.161272",
            "age": 649.305069,
            "duration": 12.928269,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72136
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.161272",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.161272",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.161273",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.176621",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.176665",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.176672",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.688809",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.688878",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.688924",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:23.781472",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089520",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089534",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089541",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72202 1.de800dcd 1:b3b0017b:::100004342fb.00000cd8:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.235424",
            "age": 649.230917,
            "duration": 13.494299,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72202
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.235424",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.235424",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.235425",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.244364",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.244400",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.244406",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.690409",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.690473",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.690519",
                        "event": "waiting for subops from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:24.175400",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.729691",
                        "event": "sub_op_commit_rec from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:24.729714",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.729723",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72222 1.bc0cdac3 1:c35b303d:::100004342f9.00001694:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.283182",
            "age": 649.183159,
            "duration": 12.980151,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72222
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.283182",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.283182",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.283183",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.290043",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.290087",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.290095",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.781478",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.781545",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.781591",
                        "event": "waiting for subops from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:23.892055",
                        "event": "sub_op_commit_rec from 7"
                    },
                    {
                        "time": "2018-12-06 18:04:24.263304",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.263325",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.263333",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72237 1.567a0855 1:aa105e6a:::100004342f9.000016a3:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.305424",
            "age": 649.160916,
            "duration": 13.052176,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72237
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.305424",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.305424",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.305426",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.312298",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.312335",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.312340",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.892064",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:23.892156",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:23.892216",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089563",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:24.357583",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.357594",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.357600",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72249 1.960d9955 1:aa99b069:::100004342f9.000016af:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.319524",
            "age": 649.146817,
            "duration": 13.038095,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72249
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.319524",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.319524",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.319525",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.327928",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.327967",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.327973",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089569",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089666",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:24.089730",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:24.175469",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:04:24.357605",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.357614",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.357618",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:72256 1.38b4facd 1:b35f2d1c:::100004342f9.000016b6:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:04:11.334694",
            "age": 649.131646,
            "duration": 13.615766,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 72256
                },
                "events": [
                    {
                        "time": "2018-12-06 18:04:11.334694",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:04:11.334694",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.334696",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:04:11.340039",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:04:11.340076",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:04:11.340082",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:24.091372",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:04:24.091440",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:04:24.091490",
                        "event": "waiting for subops from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:24.359630",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:04:24.950426",
                        "event": "sub_op_commit_rec from 6"
                    },
                    {
                        "time": "2018-12-06 18:04:24.950451",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:04:24.950460",
                        "event": "done"
                    }
                ]
            }
        },
        {
            "description": "osd_op(client.447197.1:75635 1.751883c0 1:03c118ae:::100004342f9.00001a19:head [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e75)",
            "initiated_at": "2018-12-06 18:05:16.411107",
            "age": 584.055234,
            "duration": 10.290467,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.447197",
                    "client_addr": "10.0.30.110:0/3181790720",
                    "tid": 75635
                },
                "events": [
                    {
                        "time": "2018-12-06 18:05:16.411107",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-12-06 18:05:16.411107",
                        "event": "header_read"
                    },
                    {
                        "time": "2018-12-06 18:05:16.411108",
                        "event": "throttled"
                    },
                    {
                        "time": "2018-12-06 18:05:16.416864",
                        "event": "all_read"
                    },
                    {
                        "time": "2018-12-06 18:05:16.416903",
                        "event": "dispatched"
                    },
                    {
                        "time": "2018-12-06 18:05:16.416909",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-12-06 18:05:19.383296",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-12-06 18:05:19.383394",
                        "event": "started"
                    },
                    {
                        "time": "2018-12-06 18:05:19.383443",
                        "event": "waiting for subops from 5"
                    },
                    {
                        "time": "2018-12-06 18:05:19.957547",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-12-06 18:05:26.701524",
                        "event": "sub_op_commit_rec from 5"
                    },
                    {
                        "time": "2018-12-06 18:05:26.701557",
                        "event": "commit_sent"
                    },
                    {
                        "time": "2018-12-06 18:05:26.701574",
                        "event": "done"
                    }
                ]
            }
        }
    ]
}


—
Best Regards
Li, Ning



> On Dec 6, 2018, at 14:31, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> 
> On Thu, Dec 6, 2018 at 2:24 PM Li,Ning <lining916740672@xxxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> Overload on the storage nodes that run OSDs is normal  during the whole copy.
>> 
>> The memory pressure is on the client side.  I think it may not work to turn the dirty_ratio and dirty_background_ratio to small on the ceph storage server side.
>> 
>> 
>> Anyway,I will have a try.
>> 
>> —
>> Best Regards
>> Li, Ning
>> 
>> 
>> 
>>> On Dec 6, 2018, at 11:41, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> 
>>> On Wed, Dec 5, 2018 at 2:33 PM NingLi <lining916740672@xxxxxxxxxx> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> We found that some process writing cephfs  will hang for a long time (> 120s)  when uploading(scp/rsync) large files(totally 50G ~ 300G)to the app node's cephfs mountpoint.
>>>> 
>>>> This problem is not always reproduciable. But when this problem occurs, the web(nginx) or some other services can’t be access, it’s embarrassing.
>>>> 
>>>> 
>>>> Please give some tips.
>>>> 
>>>> 
>>>> We are using  ceph version 13.2.1  and  3.10.0-693.el7.x86_64 kernel.
>>>> 
>>>> 
>>>> When this problem occurs, the memory pressure is a bit high.
>>>> # free -h
>>>>             total        used        free         shared  buff/cache   available
>>>> Mem:        30G       11G        230M        3.3M         18G         18G
>>>> Swap:         0B        0B          0B
>>>> 
>>>> # mpstat -P ALL
>>>> Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>>>> Average:     all         3.80       0.04   1.46    55.12    0.00    0.08    0.00    0.00    0.00   39.49
>>>> Average:       0        1.52       0.00    0.51    97.98    0.00    0.00    0.00    0.00    0.00    0.00
>>>> Average:       1        13.07      0.00   1.51    0.00      0.00    0.00    0.00    0.00    0.00   85.43
>>>> Average:       2        3.00       0.00    3.00    0.50      0.00    0.00    0.00    0.00    0.00   93.50
>>>> Average:       3        2.50       0.00    2.00    0.00      0.00    0.00    0.00    0.00    0.00   95.50
>>>> Average:       4        1.50       0.50    1.50    96.00    0.00    0.50    0.00    0.00    0.00    0.00
>>>> Average:       5       0.50       0.00    1.01    32.16     0.00    0.00    0.00    0.00    0.00   66.33
>>>> Average:       6        2.49       0.00    2.49    0.00     0.00    0.50    0.00    0.00    0.00   94.53
>>>> Average:       7       1.52       0.00    0.51     97.98    0.00    0.00    0.00    0.00    0.00    0.00
>>>> Average:       8       1.50       0.00    1.50     97.00    0.00    0.00    0.00    0.00    0.00    0.00
>>>> Average:       9       14.07     0.00    2.01     46.23    0.00    0.00    0.00    0.00    0.00   37.69
>>>> Average:      10       2.00      0.00    0.50     97.50    0.00    0.00    0.00    0.00    0.00    0.00
>>>> Average:      11       2.01      0.00    1.01     96.98    0.00    0.00    0.00    0.00    0.00    0.00
>>>> 
>>>> We can see some process in state ‘D’ for a long time.  here are 2 fragments in dmesg.
>>>> 
>>> 
>>> try setting vm.dirty_ration and vm.dirty_background_ration of machines
>>> that run OSDs to smaller value.
>>> 
>>>> [Fri Nov 30 15:10:43 2018] INFO: task java:11662 blocked for more than 120 seconds.
>>>> [Fri Nov 30 15:10:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [Fri Nov 30 15:10:43 2018] java            D ffffffff816a4180     0 11662  10409 0x00000180
>>>> [Fri Nov 30 15:10:43 2018]  ffff8805f4febbd0 0000000000000082 ffff8807e2edcf10 ffff8805f4febfd8
>>>> [Fri Nov 30 15:10:43 2018]  ffff8805f4febfd8 ffff8805f4febfd8 ffff8807e2edcf10 ffff88086df56cc0
>>>> [Fri Nov 30 15:10:43 2018]  0000000000000000 7fffffffffffffff ffff88087ffc3be8 ffffffff816a4180
>>>> [Fri Nov 30 15:10:43 2018] Call Trace:
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81190541>] ? pagevec_lookup_tag+0x21/0x30
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffffc082b598>] ? ceph_writepages_start+0x1298/0x1520 [ceph]
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816a3cb5>] __wait_on_bit+0x65/0x90
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181de1>] __filemap_fdatawait_range+0x111/0x190
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff811b0d79>] ? do_numa_page+0x159/0x1e0
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff8118f42e>] ? do_writepages+0x1e/0x40
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81181e74>] filemap_fdatawait_range+0x14/0x30
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81183db6>] filemap_write_and_wait_range+0x56/0x90
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffffc083326c>] ceph_fsync+0x6c/0x510 [ceph]
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81137273>] ? __secure_computing+0x73/0x240
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81231d85>] do_fsync+0x65/0xa0
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff81232083>] SyS_fdatasync+0x13/0x20
>>>> [Fri Nov 30 15:10:43 2018]  [<ffffffff816b1d4e>] tracesys+0xd9/0xde
>>>> 
>>>> 
>>>> [Fri Nov 30 15:12:43 2018] INFO: task prometheus:9919 blocked for more than 120 seconds.
>>>> [Fri Nov 30 15:12:43 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [Fri Nov 30 15:12:43 2018] prometheus      D ffff8807e87dbad8     0  9919   9879 0x00000080
>>>> [Fri Nov 30 15:12:43 2018]  ffff8807e87db970 0000000000000082 ffff8803e273cf10 ffff8807e87dbfd8
>>>> [Fri Nov 30 15:12:43 2018]  ffff8807e87dbfd8 ffff8807e87dbfd8 ffff8803e273cf10 ffff88046e716cc0
>>>> [Fri Nov 30 15:12:43 2018]  0000000000000000 7fffffffffffffff ffffffff816a4180 ffff8807e87dbad8
>>>> [Fri Nov 30 15:12:43 2018] Call Trace:
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a6059>] schedule+0x29/0x70
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81090007>] ? local_bh_enable+0x17/0x20
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a5778>] io_schedule+0x18/0x20
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816a3eaf>] __wait_on_bit_lock+0x5f/0xc0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182094>] __lock_page+0x74/0x90
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff81182a34>] __find_lock_page+0x54/0x70
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811836e2>] grab_cache_page_write_begin+0x62/0xd0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffffc082a203>] ceph_write_begin+0x43/0xe0 [ceph]
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811824ce>] generic_file_buffered_write+0x11e/0x2a0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffffc08259c4>] ceph_aio_write+0xac4/0xc60 [ceph]
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff8160417a>] ? inet_recvmsg+0x7a/0xa0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811feb3d>] do_sync_write+0x8d/0xd0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff811ff5fd>] vfs_write+0xbd/0x1e0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff8120040f>] SyS_write+0x7f/0xe0
>>>> [Fri Nov 30 15:12:43 2018]  [<ffffffff816b1b49>] system_call_fastpath+0x16/0x1b
>>>> 
>>>> 
>>>> But the cephfs status shows healthy.
>>>> # ceph -s
>>>> cluster:
>>>>   id:     b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>>>   health: HEALTH_OK
>>>> 
>>>> services:
>>>>   mon: 2 daemons, quorum skyaxe-storage-0,skyaxe-storage-1
>>>>   mgr: skyaxe-storage-0(active), standbys: skyaxe-storage-1
>>>>   mds: cephfs-1/1/1 up  {0=skyaxe-storage-1=up:active}, 1 up:standby
>>>>   osd: 8 osds: 8 up, 8 in
>>>> 
>>>> data:
>>>>   pools:   2 pools, 512 pgs
>>>>   objects: 599.7 k objects, 876 GiB
>>>>   usage:   2.6 TiB used, 13 TiB / 15 TiB avail
>>>>   pgs:     512 active+clean
>>>> 
>>>> io:
>>>>   client:   3.7 KiB/s wr, 0 op/s rd, 1 op/s wr
>>>> 
>>>> 
>>>> We can see some unfinished requests in osdc.
>>>> 
>>>> #  cat /sys/kernel/debug/ceph/xxx/osdc
>>>> REQUESTS 98 homeless 0
>>>> 8529    osd1    1.1bfdee5       [1,7]/1 [1,7]/1 100000011be.00000000    0x400024        1       0'0     write
>>>> 8532    osd1    1.2d81c338      [1,6]/1 [1,6]/1 100000011c0.00000000    0x400024        1       0'0     write
>>>> 8536    osd1    1.49324f26      [1,4]/1 [1,4]/1 100000011c4.00000000    0x400024        1       0'0     write
>>>> 8542    osd1    1.1403106f      [1,7]/1 [1,7]/1 100000011c8.00000000    0x400024        1       0'0     write
>>>> 8549    osd1    1.ea273113      [1,4]/1 [1,4]/1 100000011ce.00000000    0x400024        1       0'0     write
>>>> 8562    osd1    1.f2d68e9a      [1,5]/1 [1,5]/1 100000011d8.00000000    0x400024        1       0'0     write
>>>> 8566    osd1    1.1f212283      [1,4]/1 [1,4]/1 100000011db.00000000    0x400024        1       0'0     write
>>>> 8568    osd1    1.2e1984e7      [1,5]/1 [1,5]/1 100000011de.00000000    0x400024        1       0'0     write
>>>> 8576    osd1    1.da7e63d0      [1,5]/1 [1,5]/1 100000011e1.00000000    0x400024        1       0'0     write
>>>> 8581    osd1    1.3b0d280       [1,4]/1 [1,4]/1 100000011e8.00000000    0x400024        1       0'0     write
>>>> 8582    osd1    1.80227f0b      [1,5]/1 [1,5]/1 100000011ea.00000000    0x400024        1       0'0     write
>>>> 8593    osd1    1.1efd9391      [1,6]/1 [1,6]/1 100000011ef.00000000    0x400024        1       0'0     write
>>>> 8594    osd1    1.3db6e880      [1,4]/1 [1,4]/1 100000011f0.00000000    0x400024        1       0'0     write
>>>> 8611    osd1    1.6d20c41d      [1,4]/1 [1,4]/1 1000000028c.00000000    0x400024        1       0'0     write
>>>> 8612    osd1    1.6d20c41d      [1,4]/1 [1,4]/1 1000000028c.00000000    0x400024        1       0'0     write
>>>> 8616    osd1    1.f4d90a4d      [1,7]/1 [1,7]/1 10000001206.00000000    0x400024        1       0'0     write
>>>> 8619    osd1    1.740514fe      [1,5]/1 [1,5]/1 10000001208.00000000    0x400024        1       0'0     write
>>>> 8630    osd1    1.d5fbabe0      [1,6]/1 [1,6]/1 10000001216.00000000    0x400024        1       0'0     write
>>>> 8631    osd1    1.daf28f83      [1,4]/1 [1,4]/1 10000001217.00000000    0x400024        1       0'0     write
>>>> 8635    osd1    1.d29e01a6      [1,5]/1 [1,5]/1 1000000121a.00000000    0x400024        1       0'0     write
>>>> 8640    osd1    1.2de0ff63      [1,7]/1 [1,7]/1 10000001220.00000000    0x400024        1       0'0     write
>>>> 8658    osd1    1.2f91206f      [1,7]/1 [1,7]/1 1000000122c.00000000    0x400064        1       0'0     write,startsync
>>>> ...
>>>> 
> 
> how long did these osd requests take? If some requests take too long,
> use 'ceph daemon osd.x dump dump_blocked_ops/dump_ops_in_flight' to
> check
> 
> 
>>> 
>>>> Here is our ceph config:
>>>> 
>>>> # cat /etc/ceph/ceph.conf
>>>> [global]
>>>> fsid = b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>>> mon_initial_members = skyaxe-storage-0, skyaxe-storage-1
>>>> mon_host = 10.0.30.111,10.0.30.112
>>>> auth_cluster_required = cephx
>>>> auth_service_required = cephx
>>>> auth_client_required = cephx
>>>> 
>>>> osd_pool_default_size = 2
>>>> public_network = 10.0.30.0/24
>>>> cluster_network = 10.0.40.0/24
>>>> osd_pool_default_min_size = 1
>>>> 
>>>> max_open_files = 131072
>>>> 
>>>> [osd]
>>>> osd_journal_size = 56320
>>>> 
>>>> journal_max_write_bytes = 1073741824
>>>> journal_max_write_entries = 10000
>>>> journal_queue_max_ops = 50000
>>>> journal_queue_max_bytes = 10737418240
>>>> 
>>>> osd_mkfs_type = xfs
>>>> osd_mkfs_options_xfs = -f
>>>> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
>>>> 
>>>> osd_max_write_size = 512
>>>> osd_client_message_size_cap = 2147483648
>>>> osd_deep_scrub_stride = 131072
>>>> osd_op_threads = 8
>>>> osd_disk_threads = 4
>>>> osd_map_cache_size = 1024
>>>> osd_map_cache_bl_size = 128
>>>> osd_recovery_op_priority = 4
>>>> osd_recovery_max_active = 10
>>>> osd_max_backfills = 4
>>>> 
>>>> —
>>>> Best Regards
>>>> Li, Ning
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux