Re: cephfs failed to rdlock, waiting

Yep, that seems more likely than anything else: there are no other
running external ops to hold up a read lock, and if restarting the MDS
isn't fixing it, then it's permanent state. So, RADOS.
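If you want to confirm the MDS really is blocked on RADOS rather than
on a client capability, the admin socket can show the OSD requests the
MDS itself has outstanding. A quick sketch, assuming the daemon names
from your fs dump below (adjust paths/names for your install):

# OSD ops the MDS has in flight; a rdlock waiter that is really
# stuck in RADOS will usually show a long-lived read here
ceph daemon mds.cephmon1 objecter_requests

# then cross-check any OSD named in that output for its own slow ops
ceph daemon osd.0 dump_ops_in_flight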

On Mon, Jul 25, 2016 at 7:53 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> Hi Greg,
>
>
> I can see that sometimes it's showing an evict (full):
>
>
>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>      health HEALTH_WARN
>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>      monmap e1: 3 mons at
> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>      osdmap e2168: 24 osds: 24 up, 24 in
>             flags noscrub,nodeep-scrub,sortbitwise
>       pgmap v3235879: 2240 pgs, 4 pools, 13308 GB data, 4615 kobjects
>             26646 GB used, 27279 GB / 53926 GB avail
>                 2238 active+clean
>                    2 active+clean+scrubbing+deep
>
>
>
>   client io 5413 kB/s rd, 384 kB/s wr, 233 op/s rd, 1547 op/s wr
>   cache io 498 MB/s evict, 563 op/s promote, 4 PG(s) evicting
>
>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>      health HEALTH_WARN
>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>      monmap e1: 3 mons at
> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>      osdmap e2168: 24 osds: 24 up, 24 in
>             flags noscrub,nodeep-scrub,sortbitwise
>       pgmap v3235917: 2240 pgs, 4 pools, 13309 GB data, 4601 kobjects
>             26649 GB used, 27277 GB / 53926 GB avail
>                 2239 active+clean
>                    1 active+clean+scrubbing+deep
>   client io 1247 kB/s rd, 439 kB/s wr, 213 op/s rd, 789 op/s wr
>   cache io 253 MB/s evict, 350 op/s promote, 1 PG(s) evicting
>
>
>
>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>      health HEALTH_WARN
>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>      monmap e1: 3 mons at
> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>      osdmap e2168: 24 osds: 24 up, 24 in
>             flags noscrub,nodeep-scrub,sortbitwise
>       pgmap v3235946: 2240 pgs, 4 pools, 13310 GB data, 4589 kobjects
>             26650 GB used, 27275 GB / 53926 GB avail
>                 2239 active+clean
>                    1 active+clean+scrubbing+deep
>   client io 0 B/s rd, 490 kB/s wr, 203 op/s rd, 1185 op/s wr
>   cache io 343 MB/s evict, 408 op/s promote, 1 PG(s) evicting, 1 PG(s)
> evicting (full)
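>
> If it matters, the cache pool usage and eviction thresholds can be
> checked like this (the pool name is a placeholder for our cache pool):
>
> # per-pool usage, including the cache pool
> ceph df detail
>
> # current eviction thresholds on the cache pool
> ceph osd pool get <cache-pool> target_max_bytes
> ceph osd pool get <cache-pool> cache_target_full_ratio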
>
> ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>  4 3.63699  1.00000  3724G  1760G  1964G 47.26 0.96 148
>  5 3.63699  1.00000  3724G  1830G  1894G 49.14 0.99 158
>  6 3.63699  1.00000  3724G  2056G  1667G 55.23 1.12 182
>  7 3.63699  1.00000  3724G  1856G  1867G 49.86 1.01 163
> 20 2.79199  1.00000  2793G  1134G  1659G 40.60 0.82  98
> 21 2.79199  1.00000  2793G   990G  1803G 35.45 0.72  89
> 22 2.79199  1.00000  2793G  1597G  1195G 57.20 1.16 134
> 23 2.79199  1.00000  2793G  1337G  1455G 47.87 0.97 116
> 12 3.63699  1.00000  3724G  1819G  1904G 48.86 0.99 154
> 13 3.63699  1.00000  3724G  1681G  2042G 45.16 0.91 144
> 14 3.63699  1.00000  3724G  1892G  1832G 50.80 1.03 165
> 15 3.63699  1.00000  3724G  1494G  2229G 40.14 0.81 132
> 16 2.79199  1.00000  2793G  1375G  1418G 49.23 1.00 121
> 17 2.79199  1.00000  2793G  1444G  1348G 51.71 1.05 127
> 18 2.79199  1.00000  2793G  1509G  1283G 54.04 1.09 129
> 19 2.79199  1.00000  2793G  1345G  1447G 48.19 0.97 116
>  0 0.21799  1.00000   223G   180G 44268M 80.65 1.63 269
>  1 0.21799  1.00000   223G   201G 22758M 90.05 1.82 303
>  2 0.21799  1.00000   223G   182G 42246M 81.54 1.65 284
>  3 0.21799  1.00000   223G   200G 23599M 89.69 1.81 296
>  8 0.21799  1.00000   223G   177G 46963M 79.48 1.61 272
>  9 0.21799  1.00000   223G   203G 20730M 90.94 1.84 307
> 10 0.21799  1.00000   223G   190G 34104M 85.10 1.72 288
> 11 0.21799  1.00000   223G   193G 31155M 86.38 1.75 285
>               TOTAL 53926G 26654G 27272G 49.43
> MIN/MAX VAR: 0.72/1.84  STDDEV: 21.46
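>
> The small 223G OSDs (0-3, 8-11), presumably the cache tier, are at
> 80-91% used. If that is what triggers the "evicting (full)" state, a
> sketch of how the tier could be drained (pool name again a
> placeholder, untested here):
>
> # make eviction start earlier
> ceph osd pool set <cache-pool> cache_target_full_ratio 0.7
>
> # or flush and evict the whole cache tier
> rados -p <cache-pool> cache-flush-evict-all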
>
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:info@xxxxxxxxxxxxxxxxx
>
> Address:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402, District Court (Amtsgericht) of Hanau
> Managing Director: Oliver Dzombic
>
> Tax No.: 35 236 3622 1
> VAT ID: DE274086107
>
>
> Am 26.07.2016 um 04:47 schrieb Gregory Farnum:
>> On Mon, Jul 25, 2016 at 7:38 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> currently some production services are down because they cannot be
>>> accessed through cephfs.
>>>
>>> Restarting the client server did not help.
>>> Restarting the cluster did not help.
>>>
>>> Only ONE directory inside cephfs has this issue.
>>>
>>> All other directories are working fine.
>>
>> What's the full output of "ceph -s"?
>>
>>>
>>>
>>> MDS Server: Kernel 4.5.4
>>> client server: Kernel 4.5.4
>>> ceph version 10.2.2
>>>
>>> # ceph fs dump
>>> dumped fsmap epoch 92
>>> e92
>>> enable_multiple, ever_enabled_multiple: 0,0
>>> compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>> layout v2}
>>>
>>> Filesystem 'ceph-gen2' (2)
>>> fs_name ceph-gen2
>>> epoch   92
>>> flags   0
>>> created 2016-06-11 21:53:02.142649
>>> modified        2016-06-14 11:09:16.783356
>>> tableserver     0
>>> root    0
>>> session_timeout 60
>>> session_autoclose       300
>>> max_file_size   1099511627776
>>> last_failure    0
>>> last_failure_osd_epoch  2164
>>> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>> layout v2}
>>> max_mds 1
>>> in      0
>>> up      {0=234109}
>>> failed
>>> damaged
>>> stopped
>>> data_pools      4
>>> metadata_pool   5
>>> inline_data     disabled
>>> 234109: 10.0.0.11:6801/22255 'cephmon1' mds.0.89 up:active seq 250
>>>
>>>
>>> Standby daemons:
>>>
>>> 204171: 10.0.0.13:6800/19434 'cephmon3' mds.-1.0 up:standby seq 1
>>>
>>>
>>> ceph --admin-daemon ceph-mds.cephmon1.asok dump_ops_in_flight
>>> {
>>>     "ops": [
>>>         {
>>>             "description": "client_request(client.204153:432 getattr
>>> pAsLsXsFs #10000001432 2016-07-25 21:57:30.697894 RETRY=2)",
>>>             "initiated_at": "2016-07-26 04:24:05.528832",
>>>             "age": 816.092461,
>>>             "duration": 816.092528,
>>>             "type_data": [
>>>                 "failed to rdlock, waiting",
>>>                 "client.204153:432",
>>>                 "client_request",
>>>                 {
>>>                     "client": "client.204153",
>>>                     "tid": 432
>>>                 },
>>>                 [
>>>                     {
>>>                         "time": "2016-07-26 04:24:05.528832",
>>>                         "event": "initiated"
>>>                     },
>>>                     {
>>>                         "time": "2016-07-26 04:24:07.613779",
>>>                         "event": "failed to rdlock, waiting"
>>>                     }
>>>                 ]
>>>             ]
>>>         }
>>>     ],
>>>     "num_ops": 1
>>> }
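>>>
>>> If more detail is useful, the admin socket can also show which
>>> sessions hold caps on that inode; a sketch using the same asok as
>>> above:
>>>
>>> # list client sessions (client.204153 is the requester above)
>>> ceph --admin-daemon ceph-mds.cephmon1.asok session ls
>>>
>>> # dump the MDS cache and grep for the blocked inode to see its
>>> # outstanding locks/caps
>>> ceph --admin-daemon ceph-mds.cephmon1.asok dump cache /tmp/mdscache
>>> grep 10000001432 /tmp/mdscache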
>>>
>>>
>>> 2016-07-26 04:32:09.355503 7ffb331ca700  0 log_channel(cluster) log
>>> [WRN] : 1 slow requests, 1 included below; oldest blocked for >
>>> 483.826590 secs
>>>
>>> 2016-07-26 04:32:09.355531 7ffb331ca700  0 log_channel(cluster) log
>>> [WRN] : slow request 483.826590 seconds old, received at 2016-07-26
>>> 04:24:05.528832: client_request(client.204153:432 getattr pAsLsXsFs
>>> #10000001432 2016-07-25 21:57:30.697894 RETRY=2) currently failed to
>>> rdlock, waiting
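>>>
>>> The inode in that request, 10000001432, should be resolvable to a
>>> path via the backtrace xattr on its first object in the data pool; a
>>> sketch, with the data pool name as a placeholder:
>>>
>>> # the "parent" xattr of <ino>.00000000 records the file's backtrace
>>> rados -p <data-pool> getxattr 10000001432.00000000 parent > /tmp/bt
>>> ceph-dencoder type inode_backtrace_t import /tmp/bt decode dump_json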
>>>
>>>
>>> Any idea? :(
>>>
>>> --
>>> Mit freundlichen Gruessen / Best regards
>>>
>>> Oliver Dzombic
>>> IP-Interactive
>>>
>>> mailto:info@xxxxxxxxxxxxxxxxx
>>>
>>> Address:
>>>
>>> IP Interactive UG ( haftungsbeschraenkt )
>>> Zum Sonnenberg 1-3
>>> 63571 Gelnhausen
>>>
>>> HRB 93402, District Court (Amtsgericht) of Hanau
>>> Managing Director: Oliver Dzombic
>>>
>>> Tax No.: 35 236 3622 1
>>> VAT ID: DE274086107
>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



