Hi Greg,

I switched the cache tier to forward mode and began to evict everything (commands sketched inline below). I restarted the MDS and it failed over to another node. Still the same issue... So how can it be a PG full issue in this state?

    cluster a8171427-141c-4766-9e0f-533d86dd4ef8
     health HEALTH_OK
     monmap e1: 3 mons at {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
            election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
      fsmap e98: 1/1/1 up {0=cephmon3=up:active}, 1 up:standby
     osdmap e2173: 24 osds: 24 up, 24 in
            flags sortbitwise
      pgmap v3238008: 2240 pgs, 4 pools, 13228 GB data, 3899 kobjects
            26487 GB used, 27439 GB / 53926 GB avail
                2233 active+clean
                   5 active+clean+scrubbing+deep
                   2 active+clean+scrubbing
  client io 0 B/s rd, 7997 kB/s wr, 24 op/s rd, 70 op/s wr
  cache io 1980 kB/s evict

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 4 3.63699  1.00000  3724G  1760G  1964G 47.26 0.96 148
 5 3.63699  1.00000  3724G  1830G  1894G 49.14 1.00 158
 6 3.63699  1.00000  3724G  2056G  1667G 55.23 1.12 182
 7 3.63699  1.00000  3724G  1856G  1867G 49.86 1.02 163
20 2.79199  1.00000  2793G  1134G  1659G 40.60 0.83  98
21 2.79199  1.00000  2793G   990G  1803G 35.45 0.72  89
22 2.79199  1.00000  2793G  1597G  1195G 57.20 1.16 134
23 2.79199  1.00000  2793G  1337G  1455G 47.87 0.97 116
12 3.63699  1.00000  3724G  1819G  1904G 48.86 0.99 154
13 3.63699  1.00000  3724G  1681G  2042G 45.16 0.92 144
14 3.63699  1.00000  3724G  1892G  1832G 50.80 1.03 165
15 3.63699  1.00000  3724G  1494G  2229G 40.14 0.82 132
16 2.79199  1.00000  2793G  1375G  1418G 49.23 1.00 121
17 2.79199  1.00000  2793G  1444G  1348G 51.71 1.05 127
18 2.79199  1.00000  2793G  1509G  1283G 54.04 1.10 129
19 2.79199  1.00000  2793G  1345G  1447G 48.19 0.98 116
 0 0.21799  1.00000   223G   158G 66268M 71.04 1.45 269
 1 0.21799  1.00000   223G   181G 43363M 81.05 1.65 303
 2 0.21799  1.00000   223G   166G 57845M 74.72 1.52 284
 3 0.21799  1.00000   223G   172G 52129M 77.22 1.57 296
 8 0.21799  1.00000   223G   159G 65453M 71.40 1.45 272
 9 0.21799  1.00000   223G   187G 37270M 83.71 1.70 307
10 0.21799  1.00000   223G   169G 55478M 75.75 1.54 288
11 0.21799  1.00000   223G   163G 61722M 73.03 1.49 285
             TOTAL   53926G 26484G 27442G 49.11
MIN/MAX VAR: 0.72/1.70  STDDEV: 16.36

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 26.07.2016 04:56, Gregory Farnum wrote:
> Yep, that seems more likely than anything else — there are no other
> running external ops to hold up a read lock, and if restarting the MDS
> isn't fixing it, then it's permanent state. So, RADOS.
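
For completeness, the tier switch and drain were essentially the following -- a minimal sketch, with "cache-pool" standing in for the real cache pool name, which is not shown in this thread:

ceph osd tier cache-mode cache-pool forward    # forward new requests to the backing pool instead of promoting (newer releases also ask for --yes-i-really-mean-it)
rados -p cache-pool cache-flush-evict-all      # flush dirty objects and evict everything currently held in the cache tier

The "cache io ... evict" line in the status above is that eviction still running.
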
>
> On Mon, Jul 25, 2016 at 7:53 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>> Hi Greg,
>>
>> I can see that sometimes it's showing an evict (full):
>>
>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>      health HEALTH_WARN
>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>      monmap e1: 3 mons at {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>      osdmap e2168: 24 osds: 24 up, 24 in
>>             flags noscrub,nodeep-scrub,sortbitwise
>>       pgmap v3235879: 2240 pgs, 4 pools, 13308 GB data, 4615 kobjects
>>             26646 GB used, 27279 GB / 53926 GB avail
>>                 2238 active+clean
>>                    2 active+clean+scrubbing+deep
>>   client io 5413 kB/s rd, 384 kB/s wr, 233 op/s rd, 1547 op/s wr
>>   cache io 498 MB/s evict, 563 op/s promote, 4 PG(s) evicting
>>
>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>      health HEALTH_WARN
>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>      monmap e1: 3 mons at {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>      osdmap e2168: 24 osds: 24 up, 24 in
>>             flags noscrub,nodeep-scrub,sortbitwise
>>       pgmap v3235917: 2240 pgs, 4 pools, 13309 GB data, 4601 kobjects
>>             26649 GB used, 27277 GB / 53926 GB avail
>>                 2239 active+clean
>>                    1 active+clean+scrubbing+deep
>>   client io 1247 kB/s rd, 439 kB/s wr, 213 op/s rd, 789 op/s wr
>>   cache io 253 MB/s evict, 350 op/s promote, 1 PG(s) evicting
>>
>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>      health HEALTH_WARN
>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>      monmap e1: 3 mons at {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>      osdmap e2168: 24 osds: 24 up, 24 in
>>             flags noscrub,nodeep-scrub,sortbitwise
>>       pgmap v3235946: 2240 pgs, 4 pools, 13310 GB data, 4589 kobjects
>>             26650 GB used, 27275 GB / 53926 GB avail
>>                 2239 active+clean
>>                    1 active+clean+scrubbing+deep
>>   client io 0 B/s rd, 490 kB/s wr, 203 op/s rd, 1185 op/s wr
>>   cache io 343 MB/s evict, 408 op/s promote, 1 PG(s) evicting, 1 PG(s) evicting (full)
>>
>> ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>>  4 3.63699  1.00000  3724G  1760G  1964G 47.26 0.96 148
>>  5 3.63699  1.00000  3724G  1830G  1894G 49.14 0.99 158
>>  6 3.63699  1.00000  3724G  2056G  1667G 55.23 1.12 182
>>  7 3.63699  1.00000  3724G  1856G  1867G 49.86 1.01 163
>> 20 2.79199  1.00000  2793G  1134G  1659G 40.60 0.82  98
>> 21 2.79199  1.00000  2793G   990G  1803G 35.45 0.72  89
>> 22 2.79199  1.00000  2793G  1597G  1195G 57.20 1.16 134
>> 23 2.79199  1.00000  2793G  1337G  1455G 47.87 0.97 116
>> 12 3.63699  1.00000  3724G  1819G  1904G 48.86 0.99 154
>> 13 3.63699  1.00000  3724G  1681G  2042G 45.16 0.91 144
>> 14 3.63699  1.00000  3724G  1892G  1832G 50.80 1.03 165
>> 15 3.63699  1.00000  3724G  1494G  2229G 40.14 0.81 132
>> 16 2.79199  1.00000  2793G  1375G  1418G 49.23 1.00 121
>> 17 2.79199  1.00000  2793G  1444G  1348G 51.71 1.05 127
>> 18 2.79199  1.00000  2793G  1509G  1283G 54.04 1.09 129
>> 19 2.79199  1.00000  2793G  1345G  1447G 48.19 0.97 116
>>  0 0.21799  1.00000   223G   180G 44268M 80.65 1.63 269
>>  1 0.21799  1.00000   223G   201G 22758M 90.05 1.82 303
>>  2 0.21799  1.00000   223G   182G 42246M 81.54 1.65 284
>>  3 0.21799  1.00000   223G   200G 23599M 89.69 1.81 296
>>  8 0.21799  1.00000   223G   177G 46963M 79.48 1.61 272
>>  9 0.21799  1.00000   223G   203G 20730M 90.94 1.84 307
>> 10 0.21799  1.00000   223G   190G 34104M 85.10 1.72 288
>> 11 0.21799  1.00000   223G   193G 31155M 86.38 1.75 285
>>              TOTAL   53926G 26654G 27272G 49.43
>> MIN/MAX VAR: 0.72/1.84  STDDEV: 21.46
>>
>> On 26.07.2016 04:47, Gregory Farnum wrote:
>>> On Mon, Jul 25, 2016 at 7:38 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> currently some production stuff is down, because it cannot be
>>>> accessed through CephFS.
>>>>
>>>> A client server restart did not help.
>>>> A cluster restart did not help.
>>>>
>>>> Only ONE directory inside CephFS has this issue.
>>>>
>>>> All other directories are working fine.
>>>
>>> What's the full output of "ceph -s"?
>>>
>>>> MDS server: kernel 4.5.4
>>>> Client server: kernel 4.5.4
>>>> ceph version 10.2.2
>>>>
>>>> # ceph fs dump
>>>> dumped fsmap epoch 92
>>>> e92
>>>> enable_multiple, ever_enabled_multiple: 0,0
>>>> compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>>> layout v2}
>>>>
>>>> Filesystem 'ceph-gen2' (2)
>>>> fs_name ceph-gen2
>>>> epoch   92
>>>> flags   0
>>>> created 2016-06-11 21:53:02.142649
>>>> modified        2016-06-14 11:09:16.783356
>>>> tableserver     0
>>>> root    0
>>>> session_timeout 60
>>>> session_autoclose       300
>>>> max_file_size   1099511627776
>>>> last_failure    0
>>>> last_failure_osd_epoch  2164
>>>> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>>> layout v2}
>>>> max_mds 1
>>>> in      0
>>>> up      {0=234109}
>>>> failed
>>>> damaged
>>>> stopped
>>>> data_pools      4
>>>> metadata_pool   5
>>>> inline_data     disabled
>>>> 234109: 10.0.0.11:6801/22255 'cephmon1' mds.0.89 up:active seq 250
>>>>
>>>> Standby daemons:
>>>>
>>>> 204171: 10.0.0.13:6800/19434 'cephmon3' mds.-1.0 up:standby seq 1
>>>>
>>>> ceph --admin-daemon ceph-mds.cephmon1.asok dump_ops_in_flight
>>>> {
>>>>     "ops": [
>>>>         {
>>>>             "description": "client_request(client.204153:432 getattr
>>>> pAsLsXsFs #10000001432 2016-07-25 21:57:30.697894 RETRY=2)",
>>>>             "initiated_at": "2016-07-26 04:24:05.528832",
>>>>             "age": 816.092461,
>>>>             "duration": 816.092528,
>>>>             "type_data": [
>>>>                 "failed to rdlock, waiting",
>>>>                 "client.204153:432",
>>>>                 "client_request",
>>>>                 {
>>>>                     "client": "client.204153",
>>>>                     "tid": 432
>>>>                 },
>>>>                 [
>>>>                     {
>>>>                         "time": "2016-07-26 04:24:05.528832",
>>>>                         "event": "initiated"
>>>>                     },
>>>>                     {
>>>>                         "time": "2016-07-26 04:24:07.613779",
>>>>                         "event": "failed to rdlock, waiting"
>>>>                     }
>>>>                 ]
>>>>             ]
>>>>         }
>>>>     ],
>>>>     "num_ops": 1
>>>> }
>>>>
>>>> 2016-07-26 04:32:09.355503 7ffb331ca700  0 log_channel(cluster) log
>>>> [WRN] : 1 slow requests, 1 included below; oldest blocked for >
>>>> 483.826590 secs
>>>>
>>>> 2016-07-26 04:32:09.355531 7ffb331ca700  0 log_channel(cluster) log
>>>> [WRN] : slow request 483.826590 seconds old, received at 2016-07-26
>>>> 04:24:05.528832: client_request(client.204153:432 getattr pAsLsXsFs
>>>> #10000001432 2016-07-25 21:57:30.697894 RETRY=2) currently failed to
>>>> rdlock, waiting
>>>>
>>>> Any idea? :(

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
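
For anyone landing on this thread with the same symptom, a minimal sketch of how to check whether the cache tier is hitting its full thresholds and whether the object behind the stuck getattr is blocked in RADOS. "cache-pool" and "data-pool" are placeholders (the real pool names are not shown in the thread), and the object name assumes the default CephFS naming of <inode hex>.<block hex> for inode 0x10000001432 from the blocked request:

ceph df detail                                          # per-pool usage and object counts
ceph osd pool get cache-pool target_max_bytes           # flush/evict targets configured on the cache pool
ceph osd pool get cache-pool target_max_objects
ceph osd pool get cache-pool cache_target_full_ratio    # fraction of the target at which the cache is treated as full
ceph osd map data-pool 10000001432.00000000             # PG and OSDs the inode's first data object maps to
rados -p data-pool stat 10000001432.00000000            # hangs if requests against that object's PG are blocked

If target_max_bytes and target_max_objects both come back 0, the tiering agent has no size target to flush against, which would be consistent with the cache OSDs climbing toward 80-90% as shown earlier in the thread.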