On Tue, Jul 26, 2016 at 4:30 AM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> Hi Greg,
>
> I switched the cache tier to forward and began to evict everything.
>
> I restarted the MDS, and it failed over to another node.
>
> Still the same issue...
>
> So how can it be a PG full issue this way?

Have a look at "ceph daemon mds.<id> objecter_requests" while it is
stuck to see if there are OSD requests that are stuck from the MDS.

John

>
>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>      health HEALTH_OK
>      monmap e1: 3 mons at
> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>       fsmap e98: 1/1/1 up {0=cephmon3=up:active}, 1 up:standby
>      osdmap e2173: 24 osds: 24 up, 24 in
>             flags sortbitwise
>       pgmap v3238008: 2240 pgs, 4 pools, 13228 GB data, 3899 kobjects
>             26487 GB used, 27439 GB / 53926 GB avail
>                 2233 active+clean
>                    5 active+clean+scrubbing+deep
>                    2 active+clean+scrubbing
>   client io 0 B/s rd, 7997 kB/s wr, 24 op/s rd, 70 op/s wr
>   cache io 1980 kB/s evict
>
> # ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>  4 3.63699  1.00000  3724G  1760G  1964G 47.26 0.96 148
>  5 3.63699  1.00000  3724G  1830G  1894G 49.14 1.00 158
>  6 3.63699  1.00000  3724G  2056G  1667G 55.23 1.12 182
>  7 3.63699  1.00000  3724G  1856G  1867G 49.86 1.02 163
> 20 2.79199  1.00000  2793G  1134G  1659G 40.60 0.83  98
> 21 2.79199  1.00000  2793G   990G  1803G 35.45 0.72  89
> 22 2.79199  1.00000  2793G  1597G  1195G 57.20 1.16 134
> 23 2.79199  1.00000  2793G  1337G  1455G 47.87 0.97 116
> 12 3.63699  1.00000  3724G  1819G  1904G 48.86 0.99 154
> 13 3.63699  1.00000  3724G  1681G  2042G 45.16 0.92 144
> 14 3.63699  1.00000  3724G  1892G  1832G 50.80 1.03 165
> 15 3.63699  1.00000  3724G  1494G  2229G 40.14 0.82 132
> 16 2.79199  1.00000  2793G  1375G  1418G 49.23 1.00 121
> 17 2.79199  1.00000  2793G  1444G  1348G 51.71 1.05 127
> 18 2.79199  1.00000  2793G  1509G  1283G 54.04 1.10 129
> 19 2.79199  1.00000  2793G  1345G  1447G 48.19 0.98 116
>  0 0.21799  1.00000   223G   158G 66268M 71.04 1.45 269
>  1 0.21799  1.00000   223G   181G 43363M 81.05 1.65 303
>  2 0.21799  1.00000   223G   166G 57845M 74.72 1.52 284
>  3 0.21799  1.00000   223G   172G 52129M 77.22 1.57 296
>  8 0.21799  1.00000   223G   159G 65453M 71.40 1.45 272
>  9 0.21799  1.00000   223G   187G 37270M 83.71 1.70 307
> 10 0.21799  1.00000   223G   169G 55478M 75.75 1.54 288
> 11 0.21799  1.00000   223G   163G 61722M 73.03 1.49 285
>              TOTAL  53926G 26484G 27442G 49.11
> MIN/MAX VAR: 0.72/1.70  STDDEV: 16.36
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:info@xxxxxxxxxxxxxxxxx
>
> Address:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 at Hanau Local Court
> Managing Director: Oliver Dzombic
>
> Tax No.: 35 236 3622 1
> VAT ID: DE274086107
>
>
> On 26.07.2016 at 04:56, Gregory Farnum wrote:
>> Yep, that seems more likely than anything else — there are no other
>> running external ops to hold up a read lock, and if restarting the MDS
>> isn't fixing it, then it's permanent state. So, RADOS.
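
(To make the suggestion above concrete: on the node carrying the
active MDS, which per the fsmap above is currently cephmon3, something
along these lines should work; the id here is an assumption and has to
match whatever "ceph fs dump" reports as active:

  ceph daemon mds.cephmon3 objecter_requests

If an entry under "ops" is still listed across repeated runs, note
which pg/osd it targets and compare that against the nearly full
cache-tier OSDs in the "ceph osd df" output above.)
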
>>
>> On Mon, Jul 25, 2016 at 7:53 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>>> Hi Greg,
>>>
>>> I can see that sometimes it's showing an evict (full):
>>>
>>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>>      health HEALTH_WARN
>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>      monmap e1: 3 mons at
>>> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>>      osdmap e2168: 24 osds: 24 up, 24 in
>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>       pgmap v3235879: 2240 pgs, 4 pools, 13308 GB data, 4615 kobjects
>>>             26646 GB used, 27279 GB / 53926 GB avail
>>>                 2238 active+clean
>>>                    2 active+clean+scrubbing+deep
>>>
>>>   client io 5413 kB/s rd, 384 kB/s wr, 233 op/s rd, 1547 op/s wr
>>>   cache io 498 MB/s evict, 563 op/s promote, 4 PG(s) evicting
>>>
>>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>>      health HEALTH_WARN
>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>      monmap e1: 3 mons at
>>> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>>      osdmap e2168: 24 osds: 24 up, 24 in
>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>       pgmap v3235917: 2240 pgs, 4 pools, 13309 GB data, 4601 kobjects
>>>             26649 GB used, 27277 GB / 53926 GB avail
>>>                 2239 active+clean
>>>                    1 active+clean+scrubbing+deep
>>>   client io 1247 kB/s rd, 439 kB/s wr, 213 op/s rd, 789 op/s wr
>>>   cache io 253 MB/s evict, 350 op/s promote, 1 PG(s) evicting
>>>
>>>     cluster a8171427-141c-4766-9e0f-533d86dd4ef8
>>>      health HEALTH_WARN
>>>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>>>      monmap e1: 3 mons at
>>> {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>>             election epoch 126, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>>       fsmap e92: 1/1/1 up {0=cephmon1=up:active}, 1 up:standby
>>>      osdmap e2168: 24 osds: 24 up, 24 in
>>>             flags noscrub,nodeep-scrub,sortbitwise
>>>       pgmap v3235946: 2240 pgs, 4 pools, 13310 GB data, 4589 kobjects
>>>             26650 GB used, 27275 GB / 53926 GB avail
>>>                 2239 active+clean
>>>                    1 active+clean+scrubbing+deep
>>>   client io 0 B/s rd, 490 kB/s wr, 203 op/s rd, 1185 op/s wr
>>>   cache io 343 MB/s evict, 408 op/s promote, 1 PG(s) evicting, 1 PG(s)
>>> evicting (full)
>>>
>>> ceph osd df
>>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>>>  4 3.63699  1.00000  3724G  1760G  1964G 47.26 0.96 148
>>>  5 3.63699  1.00000  3724G  1830G  1894G 49.14 0.99 158
>>>  6 3.63699  1.00000  3724G  2056G  1667G 55.23 1.12 182
>>>  7 3.63699  1.00000  3724G  1856G  1867G 49.86 1.01 163
>>> 20 2.79199  1.00000  2793G  1134G  1659G 40.60 0.82  98
>>> 21 2.79199  1.00000  2793G   990G  1803G 35.45 0.72  89
>>> 22 2.79199  1.00000  2793G  1597G  1195G 57.20 1.16 134
>>> 23 2.79199  1.00000  2793G  1337G  1455G 47.87 0.97 116
>>> 12 3.63699  1.00000  3724G  1819G  1904G 48.86 0.99 154
>>> 13 3.63699  1.00000  3724G  1681G  2042G 45.16 0.91 144
>>> 14 3.63699  1.00000  3724G  1892G  1832G 50.80 1.03 165
>>> 15 3.63699  1.00000  3724G  1494G  2229G 40.14 0.81 132
>>> 16 2.79199  1.00000  2793G  1375G  1418G 49.23 1.00 121
>>> 17 2.79199  1.00000  2793G  1444G  1348G 51.71 1.05 127
>>> 18 2.79199  1.00000  2793G  1509G  1283G 54.04 1.09 129
>>> 19 2.79199  1.00000  2793G  1345G  1447G 48.19 0.97 116
>>>  0 0.21799  1.00000   223G   180G 44268M 80.65 1.63 269
>>>  1 0.21799  1.00000   223G   201G 22758M 90.05 1.82 303
>>>  2 0.21799  1.00000   223G   182G 42246M 81.54 1.65 284
>>>  3 0.21799  1.00000   223G   200G 23599M 89.69 1.81 296
>>>  8 0.21799  1.00000   223G   177G 46963M 79.48 1.61 272
>>>  9 0.21799  1.00000   223G   203G 20730M 90.94 1.84 307
>>> 10 0.21799  1.00000   223G   190G 34104M 85.10 1.72 288
>>> 11 0.21799  1.00000   223G   193G 31155M 86.38 1.75 285
>>>              TOTAL  53926G 26654G 27272G 49.43
>>> MIN/MAX VAR: 0.72/1.84  STDDEV: 21.46
>>>
>>> --
>>> Mit freundlichen Gruessen / Best regards
>>>
>>> Oliver Dzombic
>>> IP-Interactive
>>>
>>> mailto:info@xxxxxxxxxxxxxxxxx
>>>
>>> Address:
>>>
>>> IP Interactive UG ( haftungsbeschraenkt )
>>> Zum Sonnenberg 1-3
>>> 63571 Gelnhausen
>>>
>>> HRB 93402 at Hanau Local Court
>>> Managing Director: Oliver Dzombic
>>>
>>> Tax No.: 35 236 3622 1
>>> VAT ID: DE274086107
>>>
>>>
>>> On 26.07.2016 at 04:47, Gregory Farnum wrote:
>>>> On Mon, Jul 25, 2016 at 7:38 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> currently some production stuff is down, because it cannot be
>>>>> accessed through cephfs.
>>>>>
>>>>> Restarting the client server did not help.
>>>>> Restarting the cluster did not help.
>>>>>
>>>>> Only ONE directory inside cephfs has this issue.
>>>>>
>>>>> All other directories are working fine.
>>>>
>>>> What's the full output of "ceph -s"?
>>>>
>>>>>
>>>>> MDS server: kernel 4.5.4
>>>>> Client server: kernel 4.5.4
>>>>> ceph version 10.2.2
>>>>>
>>>>> # ceph fs dump
>>>>> dumped fsmap epoch 92
>>>>> e92
>>>>> enable_multiple, ever_enabled_multiple: 0,0
>>>>> compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>>>> layout v2}
>>>>>
>>>>> Filesystem 'ceph-gen2' (2)
>>>>> fs_name ceph-gen2
>>>>> epoch   92
>>>>> flags   0
>>>>> created 2016-06-11 21:53:02.142649
>>>>> modified        2016-06-14 11:09:16.783356
>>>>> tableserver     0
>>>>> root    0
>>>>> session_timeout 60
>>>>> session_autoclose       300
>>>>> max_file_size   1099511627776
>>>>> last_failure    0
>>>>> last_failure_osd_epoch  2164
>>>>> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>>>> ranges,3=default file layouts on dirs,4=dir inode in separate
>>>>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
>>>>> layout v2}
>>>>> max_mds 1
>>>>> in      0
>>>>> up      {0=234109}
>>>>> failed
>>>>> damaged
>>>>> stopped
>>>>> data_pools      4
>>>>> metadata_pool   5
>>>>> inline_data     disabled
>>>>> 234109: 10.0.0.11:6801/22255 'cephmon1' mds.0.89 up:active seq 250
>>>>>
>>>>>
>>>>> Standby daemons:
>>>>>
>>>>> 204171: 10.0.0.13:6800/19434 'cephmon3' mds.-1.0 up:standby seq 1
>>>>>
>>>>>
>>>>> ceph --admin-daemon ceph-mds.cephmon1.asok dump_ops_in_flight
>>>>> {
>>>>>     "ops": [
>>>>>         {
>>>>>             "description": "client_request(client.204153:432 getattr
>>>>> pAsLsXsFs #10000001432 2016-07-25 21:57:30.697894 RETRY=2)",
>>>>>             "initiated_at": "2016-07-26 04:24:05.528832",
>>>>>             "age": 816.092461,
>>>>>             "duration": 816.092528,
>>>>>             "type_data": [
>>>>>                 "failed to rdlock, waiting",
>>>>>                 "client.204153:432",
>>>>>                 "client_request",
>>>>>                 {
>>>>>                     "client": "client.204153",
>>>>>                     "tid": 432
>>>>>                 },
>>>>>                 [
>>>>>                     {
>>>>>                         "time": "2016-07-26 04:24:05.528832",
>>>>>                         "event": "initiated"
>>>>>                     },
>>>>>                     {
>>>>>                         "time": "2016-07-26 04:24:07.613779",
>>>>>                         "event": "failed to rdlock, waiting"
>>>>>                     }
>>>>>                 ]
>>>>>             ]
>>>>>         }
>>>>>     ],
>>>>>     "num_ops": 1
>>>>> }
>>>>>
>>>>>
>>>>> 2016-07-26 04:32:09.355503 7ffb331ca700  0 log_channel(cluster) log
>>>>> [WRN] : 1 slow requests, 1 included below; oldest blocked for >
>>>>> 483.826590 secs
>>>>>
>>>>> 2016-07-26 04:32:09.355531 7ffb331ca700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 483.826590 seconds old, received at 2016-07-26
>>>>> 04:24:05.528832: client_request(client.204153:432 getattr pAsLsXsFs
>>>>> #10000001432 2016-07-25 21:57:30.697894 RETRY=2) currently failed to
>>>>> rdlock, waiting
>>>>>
>>>>>
>>>>> Any idea? :(
>>>>>
>>>>> --
>>>>> Mit freundlichen Gruessen / Best regards
>>>>>
>>>>> Oliver Dzombic
>>>>> IP-Interactive
>>>>>
>>>>> mailto:info@xxxxxxxxxxxxxxxxx
>>>>>
>>>>> Address:
>>>>>
>>>>> IP Interactive UG ( haftungsbeschraenkt )
>>>>> Zum Sonnenberg 1-3
>>>>> 63571 Gelnhausen
>>>>>
>>>>> HRB 93402 at Hanau Local Court
>>>>> Managing Director: Oliver Dzombic
>>>>>
>>>>> Tax No.: 35 236 3622 1
>>>>> VAT ID: DE274086107
>>>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com