Re: cephfs failed to rdlock, waiting

Hi Greg,

I switched the cache tier to forward mode and began to evict everything.
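
For the record, the commands I used were along these lines (a sketch;
"cephfs-cache" stands in for the real cache pool name, which I haven't
listed here):

# ceph osd tier cache-mode cephfs-cache forward --yes-i-really-mean-it
# rados -p cephfs-cache cache-flush-evict-all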

I restarted the MDS; it failed over to the other node.
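
That restart was just a bounce of the systemd unit on the then-active host
(assuming the stock ceph-mds@ template unit shipped with the Jewel
packages), after which the standby took over:

# systemctl restart ceph-mds@cephmon1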

Still the same issue...

So how can it be a PG-full issue, given the state below?
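
To double-check the cache-pool-full theory, these are the knobs I would
look at (again a sketch; "cephfs-cache" is a placeholder for the cache
pool name):

# ceph df detail
# ceph osd pool get cephfs-cache target_max_bytes
# ceph osd pool get cephfs-cache target_max_objects
# ceph osd pool get cephfs-cache cache_target_full_ratio

As far as I understand, the cache agent evaluates fullness per PG, so a
single hot PG can hit its limit and report "evicting (full)" even while
the pool as a whole looks only half used.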


    cluster a8171427-141c-4766-9e0f-533d86dd4ef8
     health HEALTH_OK
     monmap e1: 3 mons at
{cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
            election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
      fsmap e98: 1/1/1 up {0=cephmon3=up:active}, 1 up:standby
     osdmap e2173: 24 osds: 24 up, 24 in
            flags sortbitwise
      pgmap v3238008: 2240 pgs, 4 pools, 13228 GB data, 3899 kobjects
            26487 GB used, 27439 GB / 53926 GB avail
                2233 active+clean
                   5 active+clean+scrubbing+deep
                   2 active+clean+scrubbing
  client io 0 B/s rd, 7997 kB/s wr, 24 op/s rd, 70 op/s wr
  cache io 1980 kB/s evict

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 4 3.63699  1.00000  3724G  1760G  1964G 47.26 0.96 148
 5 3.63699  1.00000  3724G  1830G  1894G 49.14 1.00 158
 6 3.63699  1.00000  3724G  2056G  1667G 55.23 1.12 182
 7 3.63699  1.00000  3724G  1856G  1867G 49.86 1.02 163
20 2.79199  1.00000  2793G  1134G  1659G 40.60 0.83  98
21 2.79199  1.00000  2793G   990G  1803G 35.45 0.72  89
22 2.79199  1.00000  2793G  1597G  1195G 57.20 1.16 134
23 2.79199  1.00000  2793G  1337G  1455G 47.87 0.97 116
12 3.63699  1.00000  3724G  1819G  1904G 48.86 0.99 154
13 3.63699  1.00000  3724G  1681G  2042G 45.16 0.92 144
14 3.63699  1.00000  3724G  1892G  1832G 50.80 1.03 165
15 3.63699  1.00000  3724G  1494G  2229G 40.14 0.82 132
16 2.79199  1.00000  2793G  1375G  1418G 49.23 1.00 121
17 2.79199  1.00000  2793G  1444G  1348G 51.71 1.05 127
18 2.79199  1.00000  2793G  1509G  1283G 54.04 1.10 129
19 2.79199  1.00000  2793G  1345G  1447G 48.19 0.98 116
 0 0.21799  1.00000   223G   158G 66268M 71.04 1.45 269
 1 0.21799  1.00000   223G   181G 43363M 81.05 1.65 303
 2 0.21799  1.00000   223G   166G 57845M 74.72 1.52 284
 3 0.21799  1.00000   223G   172G 52129M 77.22 1.57 296
 8 0.21799  1.00000   223G   159G 65453M 71.40 1.45 272
 9 0.21799  1.00000   223G   187G 37270M 83.71 1.70 307
10 0.21799  1.00000   223G   169G 55478M 75.75 1.54 288
11 0.21799  1.00000   223G   163G 61722M 73.03 1.49 285
              TOTAL 53926G 26484G 27442G 49.11
MIN/MAX VAR: 0.72/1.70  STDDEV: 16.36



-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 26.07.2016 um 04:56 schrieb Gregory Farnum:
> Yep, that seems more likely than anything else — there are no other
> running external ops to hold up a read lock, and if restarting the MDS
> isn't fixing it, then it's permanent state. So, RADOS.
> 
> On Mon, Jul 25, 2016 at 7:53 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>> Hi Greg,
>>
>>
>> I can see that sometimes it's showing an evict (full):
>>
>>
>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>      health HEALTH_WARN
>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>      monmap e1: 3 mons at
>> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>      osdmap e2168: 24 osds: 24 up, 24 in
>>             flags noscrub,nodeep-scrub,sortbitwise
>>       pgmap v3235879: 2240 pgs, 4 pools, 13308 GB data, 4615 kobjects
>>             26646 GB used, 27279 GB / 53926 GB avail
>>                 2238 active+clean
>>                    2 active+clean+scrubbing+deep
>>
>>
>>
>>   client io 5413 kB/s rd, 384 kB/s wr, 233 op/s rd, 1547 op/s wr
>>   cache io 498 MB/s evict, 563 op/s promote, 4 PG(s) evicting
>>
>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>      health HEALTH_WARN
>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>      monmap e1: 3 mons at
>> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>      osdmap e2168: 24 osds: 24 up, 24 in
>>             flags noscrub,nodeep-scrub,sortbitwise
>>       pgmap v3235917: 2240 pgs, 4 pools, 13309 GB data, 4601 kobjects
>>             26649 GB used, 27277 GB / 53926 GB avail
>>                 2239 active+clean
>>                    1 active+clean+scrubbing+deep
>>   client io 1247 kB/s rd, 439 kB/s wr, 213 op/s rd, 789 op/s wr
>>   cache io 253 MB/s evict, 350 op/s promote, 1 PG(s) evicting
>>
>>
>>
>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>      health HEALTH_WARN
>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>      monmap e1: 3 mons at
>> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>      osdmap e2168: 24 osds: 24 up, 24 in
>>             flags noscrub,nodeep-scrub,sortbitwise
>>       pgmap v3235946: 2240 pgs, 4 pools, 13310 GB data, 4589 kobjects
>>             26650 GB used, 27275 GB / 53926 GB avail
>>                 2239 active+clean
>>                    1 active+clean+scrubbing+deep
>>   client io 0 B/s rd, 490 kB/s wr, 203 op/s rd, 1185 op/s wr
>>   cache io 343 MB/s evict, 408 op/s promote, 1 PG(s) evicting, 1 PG(s)
>> evicting (full)
>>
>> ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>>  4 3.63699  1.00000  3724G  1760G  1964G 47.26 0.96 148
>>  5 3.63699  1.00000  3724G  1830G  1894G 49.14 0.99 158
>>  6 3.63699  1.00000  3724G  2056G  1667G 55.23 1.12 182
>>  7 3.63699  1.00000  3724G  1856G  1867G 49.86 1.01 163
>> 20 2.79199  1.00000  2793G  1134G  1659G 40.60 0.82  98
>> 21 2.79199  1.00000  2793G   990G  1803G 35.45 0.72  89
>> 22 2.79199  1.00000  2793G  1597G  1195G 57.20 1.16 134
>> 23 2.79199  1.00000  2793G  1337G  1455G 47.87 0.97 116
>> 12 3.63699  1.00000  3724G  1819G  1904G 48.86 0.99 154
>> 13 3.63699  1.00000  3724G  1681G  2042G 45.16 0.91 144
>> 14 3.63699  1.00000  3724G  1892G  1832G 50.80 1.03 165
>> 15 3.63699  1.00000  3724G  1494G  2229G 40.14 0.81 132
>> 16 2.79199  1.00000  2793G  1375G  1418G 49.23 1.00 121
>> 17 2.79199  1.00000  2793G  1444G  1348G 51.71 1.05 127
>> 18 2.79199  1.00000  2793G  1509G  1283G 54.04 1.09 129
>> 19 2.79199  1.00000  2793G  1345G  1447G 48.19 0.97 116
>>  0 0.21799  1.00000   223G   180G 44268M 80.65 1.63 269
>>  1 0.21799  1.00000   223G   201G 22758M 90.05 1.82 303
>>  2 0.21799  1.00000   223G   182G 42246M 81.54 1.65 284
>>  3 0.21799  1.00000   223G   200G 23599M 89.69 1.81 296
>>  8 0.21799  1.00000   223G   177G 46963M 79.48 1.61 272
>>  9 0.21799  1.00000   223G   203G 20730M 90.94 1.84 307
>> 10 0.21799  1.00000   223G   190G 34104M 85.10 1.72 288
>> 11 0.21799  1.00000   223G   193G 31155M 86.38 1.75 285
>>               TOTAL 53926G 26654G 27272G 49.43
>> MIN/MAX VAR: 0.72/1.84  STDDEV: 21.46
>>
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:info@xxxxxxxxxxxxxxxxx
>>
>> Anschrift:
>>
>> IP Interactive UG ( haftungsbeschraenkt )
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402 beim Amtsgericht Hanau
>> Geschäftsführung: Oliver Dzombic
>>
>> Steuer Nr.: 35 236 3622 1
>> UST ID: DE274086107
>>
>>
>> Am 26.07.2016 um 04:47 schrieb Gregory Farnum:
>>> On Mon, Jul 25, 2016 at 7:38 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> currently some production services are down because they cannot be
>>>> accessed through CephFS.
>>>>
>>>> Restarting the client server did not help.
>>>> Restarting the cluster did not help.
>>>>
>>>> Only ONE directory inside cephfs has this issue.
>>>>
>>>> All other directories are working fine.
>>>
>>> What's the full output of "ceph -s"?
>>>
>>>>
>>>>
>>>> MDS Server: Kernel 4.5.4
>>>> client server: Kernel 4.5.4
>>>> ceph version 10.2.2
>>>>
>>>> # ceph fs dump
>>>> dumped fsmap epoch 92
>>>> e92
>>>> enable_multiple, ever_enabled_multiple: 0,0
>>>> compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>>> layout v2}
>>>>
>>>> Filesystem 'ceph-gen2' (2)
>>>> fs_name ceph-gen2
>>>> epoch   92
>>>> flags   0
>>>> created 2016-06-11 21:53:02.142649
>>>> modified        2016-06-14 11:09:16.783356
>>>> tableserver     0
>>>> root    0
>>>> session_timeout 60
>>>> session_autoclose       300
>>>> max_file_size   1099511627776
>>>> last_failure    0
>>>> last_failure_osd_epoch  2164
>>>> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>>> layout v2}
>>>> max_mds 1
>>>> in      0
>>>> up      {0=234109}
>>>> failed
>>>> damaged
>>>> stopped
>>>> data_pools      4
>>>> metadata_pool   5
>>>> inline_data     disabled
>>>> 234109: 10.0.0.11:6801/22255 'cephmon1' mds.0.89 up:active seq 250
>>>>
>>>>
>>>> Standby daemons:
>>>>
>>>> 204171: 10.0.0.13:6800/19434 'cephmon3' mds.-1.0 up:standby seq 1
>>>>
>>>>
>>>> ceph --admin-daemon ceph-mds.cephmon1.asok dump_ops_in_flight
>>>> {
>>>>     "ops": [
>>>>         {
>>>>             "description": "client_request(client.204153:432 getattr
>>>> pAsLsXsFs #10000001432 2016-07-25 21:57:30.697894 RETRY=2)",
>>>>             "initiated_at": "2016-07-26 04:24:05.528832",
>>>>             "age": 816.092461,
>>>>             "duration": 816.092528,
>>>>             "type_data": [
>>>>                 "failed to rdlock, waiting",
>>>>                 "client.204153:432",
>>>>                 "client_request",
>>>>                 {
>>>>                     "client": "client.204153",
>>>>                     "tid": 432
>>>>                 },
>>>>                 [
>>>>                     {
>>>>                         "time": "2016-07-26 04:24:05.528832",
>>>>                         "event": "initiated"
>>>>                     },
>>>>                     {
>>>>                         "time": "2016-07-26 04:24:07.613779",
>>>>                         "event": "failed to rdlock, waiting"
>>>>                     }
>>>>                 ]
>>>>             ]
>>>>         }
>>>>     ],
>>>>     "num_ops": 1
>>>> }
>>>>
>>>>
>>>> 2016-07-26 04:32:09.355503 7ffb331ca700  0 log_channel(cluster) log
>>>> [WRN] : 1 slow requests, 1 included below; oldest blocked for >
>>>> 483.826590 secs
>>>>
>>>> 2016-07-26 04:32:09.355531 7ffb331ca700  0 log_channel(cluster) log
>>>> [WRN] : slow request 483.826590 seconds old, received at 2016-07-26
>>>> 04:24:05.528832: client_request(client.204153:432 getattr pAsLsXsFs
>>>> #10000001432 2016-07-25 21:57:30.697894 RETRY=2) currently failed to
>>>> rdlock, waiting
>>>>
>>>>
>>>> Any idea? :(
>>>>
>>>> --
>>>> Mit freundlichen Gruessen / Best regards
>>>>
>>>> Oliver Dzombic
>>>> IP-Interactive
>>>>
>>>> mailto:info@xxxxxxxxxxxxxxxxx
>>>>
>>>> Anschrift:
>>>>
>>>> IP Interactive UG ( haftungsbeschraenkt )
>>>> Zum Sonnenberg 1-3
>>>> 63571 Gelnhausen
>>>>
>>>> HRB 93402 beim Amtsgericht Hanau
>>>> Geschäftsführung: Oliver Dzombic
>>>>
>>>> Steuer Nr.: 35 236 3622 1
>>>> UST ID: DE274086107
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



