John,

Updating to the latest mainline kernel from elrepo (4.8.2-1) on all 4 ceph servers, and on the ceph client that I am testing with, still didn't fix the issues. Still getting "Failing to respond to cache pressure", and blocked ops are currently hovering between 100-300 requests > 32 sec.

This is just from doing an rsync from one ceph client (reading data from an old SAN over NFS, and writing to the ceph cluster over InfiniBand).

I guess I'll try getting rid of the EC pool and the cache tier, just using replication with size 3, and see if it works better.

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

From: John Spray<mailto:jspray@xxxxxxxxxx>
Sent: Wednesday, October 19, 2016 12:16 PM
To: Jim Kilborn<mailto:jim@xxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
Subject: Re: New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

On Wed, Oct 19, 2016 at 5:17 PM, Jim Kilborn <jim@xxxxxxxxxxxx> wrote:
> John,
>
> Thanks for the tips….
>
> Unfortunately, I was looking at this page http://docs.ceph.com/docs/jewel/start/os-recommendations/

OK, thanks - I've pushed an update to clarify that
(https://github.com/ceph/ceph/pull/11564).

> I'll consider either upgrading the kernels or using the fuse client, but will likely go the kernel 4.4 route.
>
> As for moving to just a replicated pool, I take it that a replication size of 3 is the minimum recommended.
>
> If I move to no EC, I will have to have 9 4TB spinners in each of the 4 servers. Can I put the 9 journals on the one 128GB ssd with 10GB per journal, or is that too many OSD journals on one SSD, creating a hot spot for writes?

That sounds like a lot of journals on one SSD, but people other than
me have more empirical experience in hardware selection.

John

> Thanks!!
>
> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
> From: John Spray<mailto:jspray@xxxxxxxxxx>
> Sent: Wednesday, October 19, 2016 9:10 AM
> To: Jim Kilborn<mailto:jim@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size
>
> On Wed, Oct 19, 2016 at 1:28 PM, Jim Kilborn <jim@xxxxxxxxxxxx> wrote:
>> I have set up a new linux cluster to allow migration from our old SAN based cluster to a new cluster with ceph.
>> All systems are running centos 7.2 with the 3.10.0-327.36.1 kernel.
>> I am basically running stock ceph settings, just turning the write cache off via hdparm on the drives and temporarily turning off scrubbing.
>>
>> The 4 ceph servers are all Dell 730XD with 128GB memory and dual Xeons, so server performance should be good. Since I am running cephfs, I have tiering set up.
>> Each server has 4 x 4TB drives for the erasure code pool, with K=3 and M=1, so the idea is to tolerate a single host failure.
>> Each server also has a 1TB 850 Pro SSD for the cache drive, in a replicated set with size=2.
>> The cache tier also has a 128GB SM863 SSD that is being used as a journal for the cache SSD. It has power loss protection.
>> My crush map is set up to ensure the cache pool uses only the 4 850 Pros and the erasure code pool uses only the 16 spinning 4TB drives.
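>>
>> For reference, the setup amounts to roughly the following commands (the pool names match what shows up in "ceph df" below; the PG counts and the failure-domain choice here are illustrative rather than an exact transcript):
>>
>>   ceph osd erasure-code-profile set ec31 k=3 m=1 ruleset-failure-domain=host
>>   ceph osd pool create cephfs-data 512 512 erasure ec31         # EC pool on the 16 spinners
>>   ceph osd pool create cephfs-cache 256 256 replicated          # cache pool on the 4 SSDs
>>   ceph osd pool set cephfs-cache size 2
>>   ceph osd tier add cephfs-data cephfs-cache
>>   ceph osd tier cache-mode cephfs-cache writeback
>>   ceph osd tier set-overlay cephfs-data cephfs-cache
>>   ceph osd pool set cephfs-cache hit_set_type bloom
>>   ceph osd pool set cephfs-cache target_max_bytes 1400000000000  # ~1.4 TB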
>>
>> The problem that I am seeing is that I start copying data from our old SAN to the ceph volume, and once the cache tier gets to my target_max_bytes of 1.4 TB, I start seeing:
>>
>> HEALTH_WARN 63 requests are blocked > 32 sec; 1 osds have slow requests; noout,noscrub,nodeep-scrub,sortbitwise flag(s) set
>> 26 ops are blocked > 65.536 sec on osd.0
>> 37 ops are blocked > 32.768 sec on osd.0
>> 1 osds have slow requests
>> noout,noscrub,nodeep-scrub,sortbitwise flag(s) set
>>
>> osd.0 is the cache ssd.
>>
>> If I watch iostat on the cache ssd, I see that the queue lengths are high and the await is high.
>> Below is the iostat on the cache drive (osd.0) on the first host. The avgqu-sz is between 87 and 182 and the await is between 88ms and 1193ms.
>>
>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>> sdb
>> 0.00 0.33 9.00 84.33 0.96 20.11 462.40 75.92 397.56 125.67 426.58 10.70 99.90
>> 0.00 0.67 30.00 87.33 5.96 21.03 471.20 67.86 910.95 87.00 1193.99 8.27 97.07
>> 0.00 16.67 33.00 289.33 4.21 18.80 146.20 29.83 88.99 93.91 88.43 3.10 99.83
>> 0.00 7.33 7.67 261.67 1.92 19.63 163.81 117.42 331.97 182.04 336.36 3.71 100.00
>>
>> If I look at the iostat for all the drives, only the cache ssd drive is backed up.
>>
>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>> sdg (journal for cache drive)
>> 0.00 6.33 0.00 8.00 0.00 0.07 19.04 0.00 0.33 0.00 0.33 0.33 0.27
>> sdb (cache drive)
>> 0.00 0.33 3.33 82.00 0.83 20.07 501.68 106.75 1057.81 269.40 1089.86 11.72 100.00
>> sda (4TB EC)
>> 0.00 0.00 0.00 4.00 0.00 0.02 9.33 0.00 0.00 0.00 0.00 0.00 0.00
>> sdd (4TB EC)
>> 0.00 0.00 0.00 2.33 0.00 0.45 392.00 0.08 34.00 0.00 34.00 6.86 1.60
>> sdf (4TB EC)
>> 0.00 14.00 0.00 26.00 0.00 0.22 17.71 1.00 38.55 0.00 38.55 0.68 1.77
>> sdc (4TB EC)
>> 0.00 0.00 0.00 1.33 0.00 0.01 8.75 0.02 12.25 0.00 12.25 12.25 1.63
>>
>> While at this time it is just complaining about slow osd.0, sometimes the other cache tier ssds show some slow responses, but not as frequently.
>>
>> I occasionally see complaints about a client not responding to cache pressure, and yesterday while copying several terabytes, the client doing the copy was flagged for failing to respond to capability release, and I ended up rebooting it.
>>
>> It just seems the cluster isn't handling large data copies the way an NFS or SAN based volume would, and I am worried about moving our users to a cluster that is already showing signs of performance issues, even when I am just doing a copy with no other users. I am doing only one rsync at a time.
>>
>> Is the problem that I need to use a later kernel for the clients mounting the volume? I have read some posts about that, but the docs say centos 7 with 3.10 is ok.
>
> Which docs say to use the stock centos kernel? >4.4 is recommended
> here: http://docs.ceph.com/docs/master/cephfs/best-practices/
>
>> Do I need more drives in my cache pool? I only have 4 ssd drives in the cache pool (one on each host), with each having a separate journal drive.
>> But is that too much of a hot spot, since all i/o has to go through the cache layer?
>> It seems like my ssds should be able to keep up with a single rsync copy.
>> Is there something set wrong on my ssds such that they can't keep up?
>
> Cache tiering is pretty sensitive to tuning of various parameters, and
> does not necessarily work well out of the box. We do not test CephFS
> with cache tiering. CephFS also has some behaviours that don't mesh
> very nicely with cache tiering, like periodically doing small writes
> to xattrs that would cause an entire data object to be promoted.
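>
> (For a sense of what that tuning involves: the relevant knobs are mostly
> properties on the cache pool, set via "ceph osd pool set". The values
> below are only placeholders to show which settings are in play, not
> recommendations:
>
>   ceph osd pool set cephfs-cache cache_target_dirty_ratio 0.4
>   ceph osd pool set cephfs-cache cache_target_dirty_high_ratio 0.6
>   ceph osd pool set cephfs-cache cache_target_full_ratio 0.8
>   ceph osd pool set cephfs-cache hit_set_count 4
>   ceph osd pool set cephfs-cache hit_set_period 1200
>   ceph osd pool set cephfs-cache min_read_recency_for_promote 2
>   ceph osd pool set cephfs-cache min_write_recency_for_promote 2
>   ceph osd pool set cephfs-cache cache_min_flush_age 600
>   ceph osd pool set cephfs-cache cache_min_evict_age 1800
>
> How aggressively the tier flushes and evicts, and how easily objects get
> promoted, is usually what needs adjusting.)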
>
>> I put the metadata pool on the ssd cache tier drives as well.
>
> Hmm, seems like a dubious choice: all data operations are going
> through those SSDs, *plus* the operations to promote/flush data to the
> spinners -- we do not do any prioritisation internally so your
> metadata ops are potentially going to be quite impacted by that.
>
>> Any ideas where the problem is or what I need to change to make this stable?
>
> Mainly I would suggest that you try changing some stuff and see what
> effect it has:
> * Try just running a replicated pool on the spinners and using your SSDs
> as journals (or use them for metadata only) -- obviously this does not
> get you erasure coding, but it would narrow down where your issues are
> coming from (roughly as sketched below).
> * Try a more recent kernel (>4.4)
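>
> (For the first option, a rough sketch -- the pool name, PG count and rule
> here are hypothetical, and crush_ruleset wants the ruleset id reported by
> "ceph osd crush rule dump":
>
>   ceph osd crush rule create-simple replicated_spinning spinning disktype
>   ceph osd pool create cephfs-data-rep 512 512 replicated
>   ceph osd pool set cephfs-data-rep crush_ruleset <ruleset id>
>   ceph osd pool set cephfs-data-rep size 3
>   ceph osd pool set cephfs-data-rep min_size 2
>
> With "disktype" being the per-host bucket type under your "spinning"
> root, that gives you 3x replication across hosts on the spinners.)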
>
> If you want to take a longer view of this, it might make sense for you
> to use replicated storage (and no cache tier) for now, and by the time
> your data has grown the Luminous release could be out with support for
> using EC without a cache tier.
>
> John
>
>>
>> Thanks. Additional details below.
>>
>> The ceph osd tier drives are osd 0, 5, 10, 15.
>>
>> ceph df
>> GLOBAL:
>>     SIZE       AVAIL      RAW USED     %RAW USED
>>     63155G     50960G     12195G       19.31
>> POOLS:
>>     NAME                ID     USED       %USED     MAX AVAIL     OBJECTS
>>     cephfs-metadata     15     58943k     0         523G          28436
>>     cephfs-data         16     7382G      15.59     36499G        5013128
>>     cephfs-cache        17     1124G      3.56      523G          298348
>>
>> ceph osd df
>> ID WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS
>>  0 1.00000 1.00000  931G 669G  261G 71.88 3.72 280
>>  5 1.00000 1.00000  931G 526G  404G 56.51 2.93 259
>> 10 1.00000 1.00000  931G 589G  341G 63.35 3.28 266
>> 15 1.00000 1.00000  931G 468G  462G 50.33 2.61 219
>>  1 1.00000 1.00000 3714G 672G 3041G 18.11 0.94 139
>>  2 1.00000 1.00000 3714G 593G 3120G 15.98 0.83 122
>>  3 1.00000 1.00000 3714G 585G 3128G 15.77 0.82 121
>>  4 1.00000 1.00000 3714G 632G 3081G 17.04 0.88 130
>>  6 1.00000 1.00000 3714G 589G 3124G 15.87 0.82 120
>>  7 1.00000 1.00000 3714G 627G 3087G 16.89 0.87 130
>>  8 1.00000 1.00000 3714G 638G 3076G 17.18 0.89 132
>>  9 1.00000 1.00000 3714G 630G 3083G 16.98 0.88 130
>> 11 1.00000 1.00000 3714G 646G 3067G 17.41 0.90 132
>> 12 1.00000 1.00000 3714G 580G 3133G 15.63 0.81 120
>> 13 1.00000 1.00000 3714G 624G 3089G 16.82 0.87 129
>> 14 1.00000 1.00000 3714G 633G 3081G 17.05 0.88 131
>> 16 1.00000 1.00000 3714G 589G 3124G 15.88 0.82 122
>> 17 1.00000 1.00000 3714G 624G 3090G 16.80 0.87 129
>> 18 1.00000 1.00000 3714G 614G 3099G 16.55 0.86 127
>> 19 1.00000 1.00000 3714G 656G 3057G 17.68 0.92 134
>> 15 1.00000 1.00000  931G 468G  462G 50.33 2.61 219
>> 16 1.00000 1.00000 3714G 589G 3124G 15.88 0.82 122
>> 17 1.00000 1.00000 3714G 624G 3090G 16.80 0.87 129
>> 18 1.00000 1.00000 3714G 614G 3099G 16.55 0.86 127
>> 19 1.00000 1.00000 3714G 656G 3057G 17.68 0.92 134
>> 10 1.00000 1.00000  931G 589G  341G 63.35 3.28 266
>> 11 1.00000 1.00000 3714G 646G 3067G 17.41 0.90 132
>> 12 1.00000 1.00000 3714G 580G 3133G 15.63 0.81 120
>> 13 1.00000 1.00000 3714G 624G 3089G 16.82 0.87 129
>> 14 1.00000 1.00000 3714G 633G 3081G 17.05 0.88 131
>>  5 1.00000 1.00000  931G 526G  404G 56.51 2.93 259
>>  6 1.00000 1.00000 3714G 589G 3124G 15.87 0.82 120
>>  7 1.00000 1.00000 3714G 627G 3087G 16.89 0.87 130
>>  8 1.00000 1.00000 3714G 638G 3076G 17.18 0.89 132
>>  9 1.00000 1.00000 3714G 630G 3083G 16.98 0.88 130
>>  0 1.00000 1.00000  931G 669G  261G 71.88 3.72 280
>>  1 1.00000 1.00000 3714G 672G 3041G 18.11 0.94 139
>>  2 1.00000 1.00000 3714G 593G 3120G 15.98 0.83 122
>>  3 1.00000 1.00000 3714G 585G 3128G 15.77 0.82 121
>>  4 1.00000 1.00000 3714G 632G 3081G 17.04 0.88 130
>>    TOTAL 63155G 12195G 50959G 19.31
>>
>> ceph osd tree
>> ID  WEIGHT   TYPE NAME                              UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -14  4.00000 root ssd
>>  -1  1.00000     disktype darkjedi-ceph01_ssd
>>   0  1.00000         osd.0                               up  1.00000          1.00000
>>  -4  1.00000     disktype darkjedi-ceph02_ssd
>>   5  1.00000         osd.5                               up  1.00000          1.00000
>>  -7  1.00000     disktype darkjedi-ceph03_ssd
>>  10  1.00000         osd.10                              up  1.00000          1.00000
>> -10  1.00000     disktype darkjedi-ceph04_ssd
>>  15  1.00000         osd.15                              up  1.00000          1.00000
>> -13 16.00000 root spinning
>>  -2  4.00000     disktype darkjedi-ceph01_spinning
>>   1  1.00000         osd.1                               up  1.00000          1.00000
>>   2  1.00000         osd.2                               up  1.00000          1.00000
>>   3  1.00000         osd.3                               up  1.00000          1.00000
>>   4  1.00000         osd.4                               up  1.00000          1.00000
>>  -5  4.00000     disktype darkjedi-ceph02_spinning
>>   6  1.00000         osd.6                               up  1.00000          1.00000
>>   7  1.00000         osd.7                               up  1.00000          1.00000
>>   8  1.00000         osd.8                               up  1.00000          1.00000
>>   9  1.00000         osd.9                               up  1.00000          1.00000
>>  -8  4.00000     disktype darkjedi-ceph03_spinning
>>  11  1.00000         osd.11                              up  1.00000          1.00000
>>  12  1.00000         osd.12                              up  1.00000          1.00000
>>  13  1.00000         osd.13                              up  1.00000          1.00000
>>  14  1.00000         osd.14                              up  1.00000          1.00000
>> -11  4.00000     disktype darkjedi-ceph04_spinning
>>  16  1.00000         osd.16                              up  1.00000          1.00000
>>  17  1.00000         osd.17                              up  1.00000          1.00000
>>  18  1.00000         osd.18                              up  1.00000          1.00000
>>  19  1.00000         osd.19                              up  1.00000          1.00000
>> -12  5.00000 host darkjedi-ceph04
>> -10  1.00000     disktype darkjedi-ceph04_ssd
>>  15  1.00000         osd.15                              up  1.00000          1.00000
>> -11  4.00000     disktype darkjedi-ceph04_spinning
>>  16  1.00000         osd.16                              up  1.00000          1.00000
>>  17  1.00000         osd.17                              up  1.00000          1.00000
>>  18  1.00000         osd.18                              up  1.00000          1.00000
>>  19  1.00000         osd.19                              up  1.00000          1.00000
>>  -9  5.00000 host darkjedi-ceph03
>>  -7  1.00000     disktype darkjedi-ceph03_ssd
>>  10  1.00000         osd.10                              up  1.00000          1.00000
>>  -8  4.00000     disktype darkjedi-ceph03_spinning
>>  11  1.00000         osd.11                              up  1.00000          1.00000
>>  12  1.00000         osd.12                              up  1.00000          1.00000
>>  13  1.00000         osd.13                              up  1.00000          1.00000
>>  14  1.00000         osd.14                              up  1.00000          1.00000
>>  -6  6.00000 host darkjedi-ceph02
>>  -4  2.00000     disktype darkjedi-ceph02_ssd
>>   5  1.00000         osd.5                               up  1.00000          1.00000
>>  -5  4.00000     disktype darkjedi-ceph02_spinning
>>   6  1.00000         osd.6                               up  1.00000          1.00000
>>   7  1.00000         osd.7                               up  1.00000          1.00000
>>   8  1.00000         osd.8                               up  1.00000          1.00000
>>   9  1.00000         osd.9                               up  1.00000          1.00000
>>  -3  5.00000 host darkjedi-ceph01
>>  -1  1.00000     disktype darkjedi-ceph01_ssd
>>   0  1.00000         osd.0                               up  1.00000          1.00000
>>  -2  4.00000     disktype darkjedi-ceph01_spinning
>>   1  1.00000         osd.1                               up  1.00000          1.00000
>>   2  1.00000         osd.2                               up  1.00000          1.00000
>>   3  1.00000         osd.3                               up  1.00000          1.00000
>>   4  1.00000         osd.4                               up  1.00000          1.00000
>>
>> ceph -w
>>     cluster 62ed97d6-adf4-12e4-8fd5-3d9701b22b86
>>      health HEALTH_WARN
>>             72 requests are blocked > 32 sec
>>             noout,noscrub,nodeep-scrub,sortbitwise flag(s) set
>>      monmap e2: 3 mons at {darkjedi-ceph01=192.168.19.241:6789/0,darkjedi-ceph02=192.168.19.242:6789/0,darkjedi-ceph03=192.168.19.243:6789/0}
>>             election epoch 120, quorum 0,1,2 darkjedi-ceph01,darkjedi-ceph02,darkjedi-ceph03
>>       fsmap e1156: 1/1/1 up {0=darkjedi-ceph04=up:active}, 1 up:standby
>>      osdmap e4294: 20 osds: 20 up, 20 in
>>             flags noout,noscrub,nodeep-scrub,sortbitwise
>>       pgmap v674643: 1024 pgs, 3 pools, 8513 GB data, 5216 kobjects
>>             12204 GB used, 50950 GB / 63155 GB avail
>>                 1024 active+clean
>>   client io 4039 kB/s wr, 0 op/s rd, 4 op/s wr
>>   cache io 10080 kB/s flush, 8064 kB/s evict, 0 op/s promote, 1 PG(s) evicting
>>
>> 2016-10-19 07:27:42.063881 mon.0 [INF] pgmap v674642: 1024 pgs: 1024 active+clean; 8513 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 10093 kB/s wr, 4 op/s
>> 2016-10-19 07:27:38.539687 osd.0 [WRN] 47 slow requests, 1 included below; oldest blocked for > 87.453888 secs
>> 2016-10-19 07:27:38.539695 osd.0 [WRN] slow request 30.771983 seconds old, received at 2016-10-19 07:27:07.767636: osd_op(client.1234553.1:412446 17.6f7367c 10000323d95.00000041 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:39.539883 osd.0 [WRN] 48 slow requests, 2 included below; oldest blocked for > 88.454076 secs
>> 2016-10-19 07:27:39.539890 osd.0 [WRN] slow request 60.016825 seconds old, received at 2016-10-19 07:26:39.522982: osd_op(client.1234553.1:412252 17.1220dbd9 10000323d93.0000001b [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:39.539897 osd.0 [WRN] slow request 30.063037 seconds old, received at 2016-10-19 07:27:09.476770: osd_op(client.1234553.1:412458 17.dac4885c 10000323d95.0000004d [write 0~2260992 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:40.540136 osd.0 [WRN] 53 slow requests, 5 included below; oldest blocked for > 89.454282 secs
>> 2016-10-19 07:27:40.540148 osd.0 [WRN] slow request 30.426319 seconds old, received at 2016-10-19 07:27:10.113694: osd_op(client.1234553.1:412483 17.dd3448b3 10000323d96.00000018 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:40.540153 osd.0 [WRN] slow request 30.557129 seconds old, received at 2016-10-19 07:27:09.982884: osd_op(client.1234553.1:412471 17.6dae7fe6 10000323d96.0000000c [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:40.540160 osd.0 [WRN] slow request 60.947034 seconds old, received at 2016-10-19 07:26:39.592979: osd_op(client.1234553.1:412259 17.5fecfad9 10000323d93.00000022 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:40.540165 osd.0 [WRN] slow request 30.218226 seconds old, received at 2016-10-19 07:27:10.321787: osd_op(client.1234553.1:412487 17.7e783a0b 10000323d96.0000001c [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:40.540173 osd.0 [WRN] slow request 30.669424 seconds old, received at 2016-10-19 07:27:09.870589: osd_op(client.1234553.1:412460 17.242c68e6 10000323d96.00000001 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:41.540415 osd.0 [WRN] 56 slow requests, 4 included below; oldest blocked for > 90.454559 secs
>> 2016-10-19 07:27:41.540426 osd.0 [WRN] slow request 30.591025 seconds old, received at 2016-10-19 07:27:10.949264: osd_op(client.1234553.1:412490 17.f00571c7 10000323d96.0000001f [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:41.540431 osd.0 [WRN] slow request 30.584468 seconds old, received at 2016-10-19 07:27:10.955822: osd_op(client.1234553.1:412493 17.e5c93438 10000323d96.00000022 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:41.540438 osd.0 [WRN] slow request 30.570790 seconds old, received at 2016-10-19 07:27:10.969499: osd_op(client.1234553.1:412499 17.75f41c27 10000323d96.00000028 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:41.540445 osd.0 [WRN] slow request 31.588275 seconds old, received at 2016-10-19 07:27:09.952014: osd_op(client.1234553.1:412463 17.e66e96b6 10000323d96.00000004 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:42.540685 osd.0 [WRN] 56 slow requests, 4 included below; oldest blocked for > 91.454840 secs
>> 2016-10-19 07:27:42.540694 osd.0 [WRN] slow request 60.595040 seconds old, received at 2016-10-19 07:26:41.945531: osd_op(client.1234553.1:412276 17.cae2b8f3 10000323d93.00000033 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:42.540702 osd.0 [WRN] slow request 30.032041 seconds old, received at 2016-10-19 07:27:12.508530: osd_op(client.1234553.1:412521 17.ec0d7938 10000323d96.0000003e [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:42.540716 osd.0 [WRN] slow request 60.589088 seconds old, received at 2016-10-19 07:26:41.951483: osd_op(client.1234553.1:412277 17.10107d9a 10000323d93.00000034 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently waiting for blocked object
>> 2016-10-19 07:27:42.540726 osd.0 [WRN] slow request 60.573651 seconds old, received at 2016-10-19 07:26:41.966920: osd_op(client.1234553.1:412287 17.cac5d907 10000323d93.0000003e [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:43.073014 mon.0 [INF] pgmap v674643: 1024 pgs: 1024 active+clean; 8513 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 4039 kB/s wr, 4 op/s
>> 2016-10-19 07:27:43.147883 mon.0 [INF] HEALTH_WARN; 72 requests are blocked > 32 sec; noout,noscrub,nodeep-scrub,sortbitwise flag(s) set
>> 2016-10-19 07:27:43.540977 osd.0 [WRN] 60 slow requests, 5 included below; oldest blocked for > 92.455114 secs
>> 2016-10-19 07:27:43.540987 osd.0 [WRN] slow request 30.946896 seconds old, received at 2016-10-19 07:27:12.593949: osd_op(client.1234553.1:412553 17.780b5c32 10000323d97.00000010 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:43.540992 osd.0 [WRN] slow request 30.940430 seconds old, received at 2016-10-19 07:27:12.600414: osd_op(client.1234553.1:412554 17.62f5a288 10000323d97.00000011 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:43.541002 osd.0 [WRN] slow request 30.930237 seconds old, received at 2016-10-19 07:27:12.610608: osd_op(client.1234553.1:412557 17.e6fe4157 10000323d97.00000014 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:43.541009 osd.0 [WRN] slow request 30.669228 seconds old, received at 2016-10-19 07:27:12.871617: osd_op(client.1234553.1:412568 17.dd1ccfe0 10000323d97.0000001f [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:43.541018 osd.0 [WRN] slow request 60.284696 seconds old, received at 2016-10-19 07:26:43.256148: osd_op(client.1234553.1:412302 17.dcfdcaad 10000323d93.0000004d [write 0~2260992 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:44.083654 mon.0 [INF] pgmap v674644: 1024 pgs: 1024 active+clean; 8513 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 4059 kB/s wr, 2 op/s
>> 2016-10-19 07:27:45.105055 mon.0 [INF] pgmap v674645: 1024 pgs: 1024 active+clean; 8513 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 26360 kB/s wr, 6 op/s
>> 2016-10-19 07:27:46.116200 mon.0 [INF] pgmap v674646: 1024 pgs: 1024 active+clean; 8514 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 90603 kB/s wr, 23 op/s
>> 2016-10-19 07:27:47.126401 mon.0 [INF] pgmap v674647: 1024 pgs: 1024 active+clean; 8514 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 114 MB/s wr, 38 op/s
>> 2016-10-19 07:27:48.145239 mon.0 [INF] pgmap v674648: 1024 pgs: 1024 active+clean; 8514 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 100 MB/s wr, 49 op/s
>> 2016-10-19 07:27:49.153554 mon.0 [INF] pgmap v674649: 1024 pgs: 1024 active+clean; 8514 GB data, 12204 GB used, 50950 GB / 63155 GB avail; 64589 kB/s wr, 30 op/s
>> 2016-10-19 07:27:50.158547 mon.0 [INF] pgmap v674650: 1024 pgs: 1024 active+clean; 8514 GB data, 12205 GB used, 50950 GB / 63155 GB avail; 34499 kB/s wr, 8 op/s
>> 2016-10-19 07:27:51.175699 mon.0 [INF] pgmap v674651: 1024 pgs: 1024 active+clean; 8514 GB data, 12205 GB used, 50949 GB / 63155 GB avail; 79100 kB/s wr, 19 op/s
>> 2016-10-19 07:27:52.180504 mon.0 [INF] pgmap v674652: 1024 pgs: 1024 active+clean; 8514 GB data, 12205 GB used, 50950 GB / 63155 GB avail; 95241 kB/s wr, 71 op/s
>> 2016-10-19 07:27:44.541269 osd.0 [WRN] 59 slow requests, 3 included below; oldest blocked for > 93.455423 secs
>> 2016-10-19 07:27:44.541276 osd.0 [WRN] slow request 31.985270 seconds old, received at 2016-10-19 07:27:12.555884: osd_op(client.1234553.1:412536 17.30639c12 10000323d96.0000004d [write 0~2260992 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:44.541282 osd.0 [WRN] slow request 31.980479 seconds old, received at 2016-10-19 07:27:12.560674: osd_op(client.1234553.1:412537 17.5c9321a8 10000323d97.00000000 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:44.541287 osd.0 [WRN] slow request 31.954483 seconds old, received at 2016-10-19 07:27:12.586670: osd_op(client.1234553.1:412545 17.49eb1b07 10000323d97.00000008 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:50.542250 osd.0 [WRN] 53 slow requests, 1 included below; oldest blocked for > 99.456434 secs
>> 2016-10-19 07:27:50.542261 osd.0 [WRN] slow request 30.322487 seconds old, received at 2016-10-19 07:27:20.219678: osd_op(client.1234553.1:412574 17.6c703de 10000323d97.00000025 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:51.542498 osd.0 [WRN] 50 slow requests, 3 included below; oldest blocked for > 100.456658 secs
>> 2016-10-19 07:27:51.542505 osd.0 [WRN] slow request 60.740689 seconds old, received at 2016-10-19 07:26:50.801700: osd_op(client.1234553.1:412320 17.802362e0 10000323d94.00000011 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:51.542513 osd.0 [WRN] slow request 60.574101 seconds old, received at 2016-10-19 07:26:50.968288: osd_op(client.1234553.1:412321 17.c529e2e6 10000323d94.00000012 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:51.542518 osd.0 [WRN] slow request 30.452691 seconds old, received at 2016-10-19 07:27:21.089698: osd_op(client.1234553.1:412576 17.65178e57 10000323d97.00000027 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:52.542740 osd.0 [WRN] 46 slow requests, 4 included below; oldest blocked for > 101.456908 secs
>> 2016-10-19 07:27:52.542746 osd.0 [WRN] slow request 30.674005 seconds old, received at 2016-10-19 07:27:21.868634: osd_op(client.1234553.1:412582 17.722c7ade 10000323d97.0000002d [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:52.542757 osd.0 [WRN] slow request 60.439945 seconds old, received at 2016-10-19 07:26:52.102693: osd_op(client.1234553.1:412333 17.f8df72c7 10000323d94.0000001e [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:52.542763 osd.0 [WRN] slow request 30.875550 seconds old, received at 2016-10-19 07:27:21.667089: osd_op(client.1234553.1:412581 17.13ef9fd2 10000323d97.0000002c [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:52.542770 osd.0 [WRN] slow request 60.317892 seconds old, received at 2016-10-19 07:26:52.224747: osd_op(client.1234553.1:412338 17.7416d4bb 10000323d94.00000023 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently waiting for blocked object
>> 2016-10-19 07:27:53.209272 mon.0 [INF] pgmap v674653: 1024 pgs: 1024 active+clean; 8514 GB data, 12205 GB used, 50949 GB / 63155 GB avail; 73115 kB/s wr, 75 op/s
>> 2016-10-19 07:27:53.542922 osd.0 [WRN] 45 slow requests, 1 included below; oldest blocked for > 98.523012 secs
>> 2016-10-19 07:27:53.542934 osd.0 [WRN] slow request 30.691735 seconds old, received at 2016-10-19 07:27:22.851117: osd_op(client.1234553.1:412586 17.905a3e43 10000323d97.00000031 [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e4294) currently reached_pg
>> 2016-10-19 07:27:54.219455 mon.0 [INF] pgmap v674654: 1024 pgs: 1024 active+clean; 8514 GB data, 12205 GB used, 50949 GB / 63155 GB avail; 50465 kB/s wr, 21 op/s
>>
>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com