Hi Igor,

thanks for your answer. I was thinking about that, but as far as I understood, hitting this bug actually requires a partial rewrite to happen. However, these are disk images on storage servers holding basically static files, many of which are very large (15 GB). Therefore, I believe, the vast majority of objects are written only once and should not be affected by the amplification bug. Is there any way to confirm or rule this out, or to check how much amplification is actually happening? I'm wondering if I might be observing something else.

Since "ceph osd df tree" does report the actual utilization and I have only one pool on these OSDs, there is no problem with accounting allocated storage to a pool; I know it's all used by this one pool. I'm more wondering whether it's not the known amplification but something else (at least partly) that plays a role here.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Igor Fedotov <ifedotov@xxxxxxx>
Sent: 27 July 2020 12:54:02
To: Frank Schilder; ceph-users
Subject: Re: mimic: much more raw used than reported

Hi Frank,

you might be hit by https://tracker.ceph.com/issues/44213

In short, the root causes are significant space overhead due to the large bluestore allocation unit (64K) combined with the EC overwrite design. This is fixed for the upcoming Pacific release by using a 4K alloc unit, but it is unlikely to be backported to earlier releases due to its complexity, to say nothing of the need for OSD redeployment. Hence please expect no fix for mimic.

Your raw usage reports might also be off, since mimic lacks per-pool stats collection (https://github.com/ceph/ceph/pull/19454), i.e. your actual raw space usage is higher than reported.

To estimate the proper raw usage one can use the bluestore perf counters (namely bluestore_stored and bluestore_allocated). Summing bluestore_allocated over all involved OSDs gives the actual RAW usage. Summing bluestore_stored gives the actual data volume after EC processing, i.e. presumably it should be around 158 TiB.

Thanks,
Igor
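A minimal sketch of the summation Igor describes (untested; it assumes jq is installed, that it is run on each OSD host in turn because "ceph daemon" talks to the local admin socket, and that OSDS is a placeholder for that host's OSD ids backing the pool):

    #!/bin/bash
    # Sum bluestore_allocated (actual raw usage) and bluestore_stored
    # (data volume after EC processing) over the given OSDs.
    # OSDS is a placeholder; fill in this host's OSD ids backing the pool.
    OSDS="84 145 156"   # example ids, adjust per host

    total_alloc=0; total_stored=0
    for id in $OSDS; do
        # Needs root on the OSD host (local admin socket).
        dump=$(ceph daemon "osd.$id" perf dump)
        alloc=$(echo "$dump"  | jq '.bluestore.bluestore_allocated')
        stored=$(echo "$dump" | jq '.bluestore.bluestore_stored')
        total_alloc=$((total_alloc + alloc))
        total_stored=$((total_stored + stored))
    done
    # Counters are in bytes; print TiB (integer division, rounds down).
    echo "allocated: $((total_alloc / 2**40)) TiB"
    echo "stored:    $((total_stored / 2**40)) TiB"

Summed over all hosts, bluestore_allocated far above bluestore_stored would confirm the allocation-unit amplification; the two being close would point at something else.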
On 7/26/2020 8:43 PM, Frank Schilder wrote:
> Dear fellow cephers,
>
> I observe a weird problem on our mimic-13.2.8 cluster. We have an EC RBD pool backed by HDDs. These disks are not in any other pool. I noticed that the total capacity (=USED+MAX AVAIL) reported by "ceph df detail" has recently shrunk from 300 TiB to 200 TiB. Part, but by no means all, of this can be explained by imbalance of the data distribution.
>
> When I compare the output of "ceph df detail" and "ceph osd df tree", I find 69 TiB of raw capacity used but not accounted for; see the calculations below. These 69 TiB raw are equivalent to 20% usable capacity, and I really need it back. Together with the imbalance, we lose about 30% capacity.
>
> What is using these extra 69 TiB and how can I get it back?
>
> Some findings:
>
> These are the 5 largest images in the pool, accounting for a total of 97 TiB out of the 119 TiB usage:
>
> # rbd du:
> NAME         PROVISIONED  USED
> one-133      25 TiB       14 TiB
> NAME         PROVISIONED  USED
> one-153@222  40 TiB       14 TiB
> one-153@228  40 TiB       357 GiB
> one-153@235  40 TiB       797 GiB
> one-153@241  40 TiB       509 GiB
> one-153@242  40 TiB       43 GiB
> one-153@243  40 TiB       16 MiB
> one-153@244  40 TiB       16 MiB
> one-153@245  40 TiB       324 MiB
> one-153@246  40 TiB       276 MiB
> one-153@247  40 TiB       96 MiB
> one-153@248  40 TiB       138 GiB
> one-153@249  40 TiB       1.8 GiB
> one-153@250  40 TiB       0 B
> one-153      40 TiB       204 MiB
> <TOTAL>      40 TiB       16 TiB
> NAME         PROVISIONED  USED
> one-391@3    40 TiB       432 MiB
> one-391@9    40 TiB       26 GiB
> one-391@15   40 TiB       90 GiB
> one-391@16   40 TiB       0 B
> one-391@17   40 TiB       0 B
> one-391@18   40 TiB       0 B
> one-391@19   40 TiB       0 B
> one-391@20   40 TiB       3.5 TiB
> one-391@21   40 TiB       5.4 TiB
> one-391@22   40 TiB       5.8 TiB
> one-391@23   40 TiB       8.4 TiB
> one-391@24   40 TiB       1.4 TiB
> one-391      40 TiB       2.2 TiB
> <TOTAL>      40 TiB       27 TiB
> NAME         PROVISIONED  USED
> one-394@3    70 TiB       1.4 TiB
> one-394@9    70 TiB       2.5 TiB
> one-394@15   70 TiB       20 GiB
> one-394@16   70 TiB       0 B
> one-394@17   70 TiB       0 B
> one-394@18   70 TiB       0 B
> one-394@19   70 TiB       383 GiB
> one-394@20   70 TiB       3.3 TiB
> one-394@21   70 TiB       5.0 TiB
> one-394@22   70 TiB       5.0 TiB
> one-394@23   70 TiB       9.0 TiB
> one-394@24   70 TiB       1.6 TiB
> one-394      70 TiB       2.5 TiB
> <TOTAL>      70 TiB       31 TiB
> NAME         PROVISIONED  USED
> one-434      25 TiB       9.1 TiB
>
> The large 70 TiB images one-391 and one-394 are currently being copied to at ca. 5 TiB per day.
>
> Output of "ceph df detail" with some columns removed:
>
> NAME                 ID  USED     %USED  MAX AVAIL  OBJECTS   RAW USED
> sr-rbd-data-one-hdd  11  119 TiB  58.45  84 TiB     31286554  158 TiB
>
> The pool is EC 6+2.
> USED is correct: 31286554 * 4 MiB = 119 TiB.
> RAW USED is correct: 119 * 8/6 = 158 TiB.
> Most of this data was freshly copied onto large RBD images.
> Compression is enabled on this pool (aggressive, snappy).
>
> However, when looking at "ceph osd df tree", I get:
>
> The combined raw capacity of the OSDs backing this pool is 406.8 TiB (sum over SIZE).
> Summing column USE over all OSDs gives 227.5 TiB.
>
> This gives a difference of 69 TiB (= 227 - 158) that is not accounted for.
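A quick cross-check of the arithmetic above (a sketch, assuming bc and awk are available and that, as in the listing below, SIZE and USE are printed in TiB for every OSD row; mixed units would need normalizing first):

    # Pool USED from the object count (4 MiB RBD object size):
    echo "31286554 * 4 / 1024 / 1024" | bc        # -> 119 (TiB)
    # Expected RAW USED for EC 6+2, factor (k+m)/k = 8/6:
    echo "scale=1; 119 * 8 / 6" | bc              # -> 158.6 (TiB)
    # Sum the SIZE and USE columns over the pool's OSDs; the "hdd"
    # filter works here because these disks are in no other pool.
    ceph osd df tree | awk '
        $2 == "hdd" { size += $5; use += $7 }
        END         { printf "SIZE %.1f TiB  USE %.1f TiB\n", size, use }'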
> Here is the output of "ceph osd df tree", limited to the drives backing the pool:
>
>  ID  CLASS  WEIGHT    REWEIGHT  SIZE     USE      DATA     OMAP     META    AVAIL    %USE   VAR   PGS  TYPE NAME
>  84  hdd     8.90999   1.00000  8.9 TiB  5.0 TiB  5.0 TiB  180 MiB  16 GiB  3.9 TiB  56.43  1.72  103  osd.84
> 145  hdd     8.90999   1.00000  8.9 TiB  4.6 TiB  4.6 TiB  144 MiB  14 GiB  4.3 TiB  51.37  1.57   87  osd.145
> 156  hdd     8.90999   1.00000  8.9 TiB  5.2 TiB  5.1 TiB  173 MiB  16 GiB  3.8 TiB  57.91  1.77  100  osd.156
> 168  hdd     8.90999   1.00000  8.9 TiB  5.0 TiB  5.0 TiB  164 MiB  16 GiB  3.9 TiB  56.31  1.72   98  osd.168
> 181  hdd     8.90999   1.00000  8.9 TiB  5.5 TiB  5.4 TiB  121 MiB  17 GiB  3.5 TiB  61.26  1.87  105  osd.181
>  74  hdd     8.90999   1.00000  8.9 TiB  4.2 TiB  4.2 TiB  148 MiB  13 GiB  4.7 TiB  46.79  1.43   85  osd.74
> 144  hdd     8.90999   1.00000  8.9 TiB  4.7 TiB  4.7 TiB  106 MiB  15 GiB  4.2 TiB  53.17  1.62   94  osd.144
> 157  hdd     8.90999   1.00000  8.9 TiB  5.8 TiB  5.8 TiB  192 MiB  18 GiB  3.1 TiB  65.02  1.99  111  osd.157
> 169  hdd     8.90999   1.00000  8.9 TiB  5.1 TiB  5.1 TiB  172 MiB  16 GiB  3.8 TiB  56.99  1.74  102  osd.169
> 180  hdd     8.90999   1.00000  8.9 TiB  5.8 TiB  5.8 TiB  131 MiB  18 GiB  3.1 TiB  65.04  1.99  111  osd.180
>  60  hdd     8.90999   1.00000  8.9 TiB  4.5 TiB  4.5 TiB  155 MiB  14 GiB  4.4 TiB  50.40  1.54   93  osd.60
> 146  hdd     8.90999   1.00000  8.9 TiB  4.8 TiB  4.8 TiB  139 MiB  15 GiB  4.1 TiB  53.70  1.64   92  osd.146
> 158  hdd     8.90999   1.00000  8.9 TiB  5.6 TiB  5.5 TiB  183 MiB  17 GiB  3.4 TiB  62.30  1.90  109  osd.158
> 170  hdd     8.90999   1.00000  8.9 TiB  5.7 TiB  5.6 TiB  205 MiB  18 GiB  3.2 TiB  63.53  1.94  112  osd.170
> 182  hdd     8.90999   1.00000  8.9 TiB  4.7 TiB  4.6 TiB  105 MiB  14 GiB  4.3 TiB  52.27  1.60   92  osd.182
>  63  hdd     8.90999   1.00000  8.9 TiB  4.7 TiB  4.7 TiB  156 MiB  15 GiB  4.2 TiB  52.74  1.61   98  osd.63
> 148  hdd     8.90999   1.00000  8.9 TiB  5.2 TiB  5.1 TiB  119 MiB  16 GiB  3.8 TiB  57.82  1.77  100  osd.148
> 159  hdd     8.90999   1.00000  8.9 TiB  4.0 TiB  4.0 TiB   89 MiB  12 GiB  4.9 TiB  44.61  1.36   79  osd.159
> 172  hdd     8.90999   1.00000  8.9 TiB  5.1 TiB  5.1 TiB  173 MiB  16 GiB  3.8 TiB  57.22  1.75   98  osd.172
> 183  hdd     8.90999   1.00000  8.9 TiB  6.0 TiB  6.0 TiB  135 MiB  19 GiB  2.9 TiB  67.35  2.06  118  osd.183
> 229  hdd     8.90999   1.00000  8.9 TiB  4.6 TiB  4.6 TiB  127 MiB  15 GiB  4.3 TiB  52.05  1.59   93  osd.229
> 232  hdd     8.90999   1.00000  8.9 TiB  5.2 TiB  5.2 TiB  158 MiB  17 GiB  3.7 TiB  58.22  1.78  101  osd.232
> 235  hdd     8.90999   1.00000  8.9 TiB  4.1 TiB  4.1 TiB  103 MiB  13 GiB  4.8 TiB  45.96  1.40   79  osd.235
> 238  hdd     8.90999   1.00000  8.9 TiB  5.4 TiB  5.4 TiB  120 MiB  17 GiB  3.5 TiB  60.47  1.85  104  osd.238
> 259  hdd    10.91399   1.00000   11 TiB  6.2 TiB  6.2 TiB  140 MiB  19 GiB  4.7 TiB  56.54  1.73  120  osd.259
> 231  hdd     8.90999   1.00000  8.9 TiB  5.1 TiB  5.1 TiB  114 MiB  16 GiB  3.8 TiB  56.90  1.74  101  osd.231
> 233  hdd     8.90999   1.00000  8.9 TiB  5.5 TiB  5.5 TiB  123 MiB  17 GiB  3.4 TiB  61.78  1.89  106  osd.233
> 236  hdd     8.90999   1.00000  8.9 TiB  5.1 TiB  5.1 TiB  114 MiB  16 GiB  3.8 TiB  57.53  1.76  101  osd.236
> 239  hdd     8.90999   1.00000  8.9 TiB  4.2 TiB  4.2 TiB   95 MiB  13 GiB  4.7 TiB  47.41  1.45   86  osd.239
> 263  hdd    10.91399   1.00000   11 TiB  5.3 TiB  5.3 TiB  178 MiB  17 GiB  5.6 TiB  48.73  1.49  102  osd.263
> 228  hdd     8.90999   1.00000  8.9 TiB  5.1 TiB  5.1 TiB  113 MiB  16 GiB  3.8 TiB  57.10  1.74   96  osd.228
> 230  hdd     8.90999   1.00000  8.9 TiB  4.9 TiB  4.9 TiB  144 MiB  16 GiB  4.0 TiB  55.20  1.69   99  osd.230
> 234  hdd     8.90999   1.00000  8.9 TiB  5.6 TiB  5.6 TiB  164 MiB  18 GiB  3.3 TiB  63.29  1.93  109  osd.234
> 237  hdd     8.90999   1.00000  8.9 TiB  4.8 TiB  4.8 TiB  110 MiB  15 GiB  4.1 TiB  54.33  1.66   97  osd.237
> 260  hdd    10.91399   1.00000   11 TiB  5.4 TiB  5.4 TiB  152 MiB  17 GiB  5.5 TiB  49.35  1.51  104  osd.260
>   0  hdd     8.90999   1.00000  8.9 TiB  5.2 TiB  5.2 TiB  157 MiB  16 GiB  3.7 TiB  58.28  1.78  102  osd.0
>   2  hdd     8.90999   1.00000  8.9 TiB  5.3 TiB  5.2 TiB  122 MiB  16 GiB  3.6 TiB  59.05  1.80  106  osd.2
>  72  hdd     8.90999   1.00000  8.9 TiB  4.4 TiB  4.4 TiB  145 MiB  14 GiB  4.5 TiB  49.89  1.52   89  osd.72
>  76  hdd     8.90999   1.00000  8.9 TiB  5.1 TiB  5.1 TiB  178 MiB  16 GiB  3.8 TiB  56.89  1.74  102  osd.76
>  86  hdd     8.90999   1.00000  8.9 TiB  4.6 TiB  4.5 TiB  155 MiB  14 GiB  4.3 TiB  51.18  1.56   94  osd.86
>   1  hdd     8.90999   1.00000  8.9 TiB  4.9 TiB  4.9 TiB  141 MiB  15 GiB  4.0 TiB  54.73  1.67   95  osd.1
>   3  hdd     8.90999   1.00000  8.9 TiB  4.7 TiB  4.7 TiB  156 MiB  15 GiB  4.2 TiB  52.40  1.60   94  osd.3
>  73  hdd     8.90999   1.00000  8.9 TiB  5.0 TiB  4.9 TiB  146 MiB  16 GiB  3.9 TiB  55.68  1.70  102  osd.73
>  85  hdd     8.90999   1.00000  8.9 TiB  5.6 TiB  5.5 TiB  192 MiB  18 GiB  3.3 TiB  62.46  1.91  109  osd.85
>  87  hdd     8.90999   1.00000  8.9 TiB  5.0 TiB  5.0 TiB  189 MiB  16 GiB  3.9 TiB  55.91  1.71  102  osd.87
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx