On Tue, Aug 30, 2016 at 9:18 PM, Goncalo Borges <goncalo.borges@xxxxxxxxxxxxx> wrote:
> Just a small typo correction to my previous email. Without it the meaning
> was completely different:
>
> "At this point I just need a way to recover the pg safely and I do NOT see
> how I can do that since it is impossible to understand what is the
> problematic osd with the incoherent object."

Hi Goncalo,

A couple of things. In my last post I got the code wrong: I posted the code
for the data digest, not the omap digest, but they are essentially the same.
Here's the code I meant to post.

    if (auth.omap_digest_present && candidate.omap_digest_present) {
      if (auth.omap_digest != candidate.omap_digest) {
        if (error != CLEAN)
          errorstream << ", ";
        error = DEEP_ERROR;
        bool known = auth_oi.is_omap_digest() &&
          auth.omap_digest == auth_oi.omap_digest;
        errorstream << "omap_digest 0x" << std::hex << candidate.omap_digest
                    << " != "
                    << (known ? "known" : "best guess")
                    << " omap_digest 0x" << auth.omap_digest << std::dec
                    << " from auth shard " << auth_shard;
        result.set_omap_digest_mismatch();
      }
    }

Here is the message from the commit for this code, which gives a little more
insight.

    osd: be precise about "known" vs "best guess"

    We cannot assume that the auth info has the "known" digest; all
    replicas may have a matching digest that does not match the oi, or
    we may choose a different auth info for some other reason. Verify
    that the digest matches the oi before calling it "known".

So we can only say that a digest is "known" if it matches the digest stored
in the object_info_t; otherwise it is a "best guess".

Looking at the pg query that you posted, it seems to give some clues, as for
both peer 49 and 59 it gives the following statistics.

    "num_objects_degraded": 2
    "last_undegraded": "2016-08-25 06:41:22.446581"

So it appears to indicate that the two replicas are considered divergent from
the primary.
Note the last_undegraded date is the same day as "head" was created on the
primary. So it looks like the difference in the digest may have happened at
the time this OSD became primary and was populated with this pg's data.

This *might* (speculation) happen if the primary were running a different
version from the replicas when it was introduced into the cluster (inspection
of the logs looking for SHA1 version signatures from around this period would
confirm or refute this. Look for the "ceph version" line logged at startup or
run "ceph daemon /path/to/osd.49.asok version" for each OSD if they have not
been restarted).

Another possibility is that this is a bug in the way we populate the omap
data when a new primary comes online, but I would consider this less likely
(although certainly not impossible). Another possibility is a difference in
the way the CRC for the omap digest is being calculated on the peers.

> ________________________________________
> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Goncalo Borges [goncalo.borges@xxxxxxxxxxxxx]
> Sent: 30 August 2016 18:53
> To: Brad Hubbard
> Cc: ceph-users@xxxxxxxx
> Subject: Re: cephfs metadata pool: deep-scrub error "omap_digest != best guess omap_digest"
>
>
> I can run a deep scrub with the log levels mentioned if it is safe to do it
> in an inconsistent pg. I have read somewhere that it shouldn't be done but I
> do not remember where and how precise is that info. Is it safe to do so?

AFAIK the only adverse effects are that it could affect performance and could
fill up the disk where the logs are stored, so I would suggest doing it during
a quiet period and monitoring disk usage closely. It should only need to be
enabled for a relatively short amount of time and should not require a
restart, so I would think it is fairly safe, although I would like to read
what you read to understand better what you are referring to.
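
As an aside, the "known" versus "best guess" wording from the commit message
quoted earlier in this reply can be sketched in isolation. This is only an
illustration under stated assumptions, not the Ceph code: ShardInfo,
ObjectInfo and compare_omap_digest are hypothetical stand-ins, and the real
scrub structures carry far more state than this.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-ins for the scrub structures.
struct ShardInfo {            // what a deep scrub computed for one shard
  bool omap_digest_present;
  uint32_t omap_digest;
};

struct ObjectInfo {           // digest recorded in the object info, if any
  std::optional<uint32_t> omap_digest;
};

// Returns an empty string when the digests agree or cannot be compared,
// otherwise a message shaped like the one seen in the cluster log.
std::string compare_omap_digest(const ShardInfo& auth,
                                const ObjectInfo& auth_oi,
                                const ShardInfo& candidate,
                                int auth_shard) {
  if (!auth.omap_digest_present || !candidate.omap_digest_present)
    return "";
  if (auth.omap_digest == candidate.omap_digest)
    return "";
  // "known" only if the authoritative shard's digest matches the digest
  // recorded in the object info; otherwise it is just the best guess.
  const bool known = auth_oi.omap_digest.has_value() &&
                     auth.omap_digest == *auth_oi.omap_digest;
  std::ostringstream out;
  out << "omap_digest 0x" << std::hex << candidate.omap_digest
      << " != " << (known ? "known" : "best guess")
      << " omap_digest 0x" << auth.omap_digest << std::dec
      << " from auth shard " << auth_shard;
  return out.str();
}
```

With the digests from the log (candidate 0xf3fdfd0c, auth 0x23b2eae0, auth
shard 49) and no digest recorded in the object info, this sketch produces the
"best guess" form of the message; if the object info recorded 0x23b2eae0 it
would say "known" instead.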
> At this point I just need a way to recover the pg safely and I do see how I
> can do that since it is impossible to understand what is the problematic osd
> with the incoherent object.
>
> I also think I am not the only one seeing it. I participated in a discussion
> last time of exactly the same issue experienced by someone else in jewel

I went back to that thread and it appears that was also a metadata pool so,
whatever this is, it seems to require a metadata pool to happen, which is odd
but may be related to the volume or the way in which those pools use the omap
functionality.

I think at this stage we really need to get a tracker opened for this and
start looking at debug logging and/or ceph-objectstore-tool output. It would
also be interesting to get additional information from others affected by
this issue, and a tracker would be the best place to gather that.

--
Cheers,
Brad

>
> Cheers
> Goncalo
>
>
> ________________________________________
> From: Brad Hubbard [bhubbard@xxxxxxxxxx]
> Sent: 30 August 2016 17:13
> To: Goncalo Borges
> Cc: ceph-users@xxxxxxxx
> Subject: Re: cephfs metadata pool: deep-scrub error "omap_digest != best guess omap_digest"
>
> On Tue, Aug 30, 2016 at 3:50 PM, Goncalo Borges <goncalo.borges@xxxxxxxxxxxxx> wrote:
>> Dear Ceph / CephFS supports.
>>
>> We are currently running Jewel 10.2.2.
>>
>> From time to time we experience deep-scrub errors in pgs inside our cephfs
>> metadata pool. It is important to note that we do not see any hardware
>> errors on the osds themselves so the error must have some other source.
>
> Can you verify that all your nodes are the same architecture and all running the
> same ceph version? This sort of thing has been reported before when running
> mismatched versions and/or architectures.
>
> http://tracker.ceph.com/issues/4743 looks similar but is very old and likely not
> relevant.
> In that case it was recommended to collect logs with debug filestore = 20,
> debug osd = 30 and debug ms = 1 from all three replicas of such a PG while
> running a deep scrub on it, so gathering those may be a good idea, as is
> opening a new tracker for this.
>
>>
>> The error itself is the following:
>>
>> # cat /var/log/ceph/ceph.log| grep 5.3d0
>> 2016-08-30 00:30:53.492626 osd.78 192.231.127.171:6828/6072 331 : cluster
>> [INF] 5.3d0 deep-scrub starts
>> 2016-08-30 00:30:54.276134 osd.78 192.231.127.171:6828/6072 332 : cluster
>> [ERR] 5.3d0 shard 78: soid 5:0bd6d154:::602.00000000:head omap_digest
>> 0xf3fdfd0c != best guess omap_digest 0x23b2eae0 from auth shard 49
>> 2016-08-30 00:30:54.747795 osd.78 192.231.127.171:6828/6072 333 : cluster
>> [ERR] 5.3d0 deep-scrub 0 missing, 1 inconsistent objects
>> 2016-08-30 00:30:54.747801 osd.78 192.231.127.171:6828/6072 334 : cluster
>> [ERR] 5.3d0 deep-scrub 1 errors
>
> AFAIU the omap_digest is a calculated CRC32 of omap header & key/values. These
> values are stored in the OSD leveldb and not the data directories but I think
> Greg already mentioned that last time?
>
>>
>> For us there are a few unknowns on how to recover from this error:
>>
>> 1) The first issue is that we really do not understand the nature of the
>> error. What does it mean "omap_digest != best guess omap_digest"? It seems
>> to point to some problem in the digest of omap contents between the two osds
>> but does not tell you exactly what.
>
> src/osd/PGBackend.cc:
>
> 423     if (auth.digest_present && candidate.digest_present) {
> 424       if (auth.digest != candidate.digest) {
> 425         if (error != CLEAN)
> 426           errorstream << ", ";
> 427         error = DEEP_ERROR;
> 428         bool known = auth_oi.is_data_digest() &&
> 429           auth.digest == auth_oi.data_digest;
> 430         errorstream << "data_digest 0x" << std::hex << candidate.digest
> 431                     << " != "
> 432                     << (known ? "known" : "best guess")
> 433                     << " data_digest 0x" << auth.digest << std::dec
> 434                     << " from auth shard " << auth_shard;
> 435         result.set_data_digest_mismatch();
> 436       }
> 437     }
>
> On line 428 "known" is either set or it isn't. With precedence it actually looks
> like this.
>
> bool known = (((auth_oi.is_omap_digest)()) && ((auth.omap_digest) == (auth_oi.omap_digest)));
>
> auth_oi is an object_info_t so it looks like we are comparing the digest to a
> copy we already have stored and if they match the result is considered "known",
> otherwise it's considered a "best guess". Hopefully someone can elaborate on
> what this means.
>
>>
>> 2) The second issue is that it is really difficult to try to explore
>> metadata objects and omap info. While in the data pool we do know how to
>> inspect pgs and object contents (and decide what is the problematic osd by
>> comparison in a 3 replica setup), in the metadata pool we have to access pg
>> contents using 'ceph-objectstore-tool'. For that, we have to stop the osd so
>> that the daemon releases the omap lock. Moreover, I have successfully
>> imported / exported / listed pgs contents but I was never able to query omap
>> contents of objects inside pgs. Maybe I am doing it wrong but I do not find
>> the tool helpful at the moment for this precise task.
>>
>> 3) Finally, I am unsure what is the consequence of running 'pg repair'. In
>> my specific case, the primary osd is on a host which was recently added to
>> production. Moreover, the dates of the problematic object match for the
>> secondary peers which gives me the feeling that the primary osd (78) might
>> be the problematic one. I know that in the past, the default behavior was
>> simply to copy the pg contents of the primary osd to the others. That can
>> lead to data corruption if the problematic osd is indeed the primary, and I
>> wonder if in Jewel there is some smarter way to do the pg repair.
>> >> [root@server9 ~]# ll >> /var/lib/ceph/osd/ceph-78/current/5.3d0_head/602.00000000__head_2A8B6BD0__5 >> -rw-r--r-- 1 ceph ceph 0 Aug 25 21:41 >> /var/lib/ceph/osd/ceph-78/current/5.3d0_head/602.00000000__head_2A8B6BD0__5 >> >> >> [root@server7 ~]# ll >> /var/lib/ceph/osd/ceph-49/current/5.3d0_head/602.00000000__head_2A8B6BD0__5 >> -rw-r--r-- 1 ceph ceph 0 Jul 27 02:30 >> /var/lib/ceph/osd/ceph-49/current/5.3d0_head/602.00000000__head_2A8B6BD0__5 >> >> >> [root@server8 ~]# ll >> /var/lib/ceph/osd/ceph-59/current/5.3d0_head/602.00000000__head_2A8B6BD0__5 >> -rw-r--r-- 1 ceph ceph 0 Jul 27 02:30 >> /var/lib/ceph/osd/ceph-59/current/5.3d0_head/602.00000000__head_2A8B6BD0__5 >> >> >> A pg query does not seem to give me any information about the problem. The >> query results follows this email. >> >> Any help is appreciated. >> >> Cheers >> Goncalo >> >> --- * --- >> >> >> >> # ceph pg 5.3d0 query >> { >> "state": "active+clean+inconsistent", >> "snap_trimq": "[]", >> "epoch": 23099, >> "up": [ >> 78, >> 59, >> 49 >> ], >> "acting": [ >> 78, >> 59, >> 49 >> ], >> "actingbackfill": [ >> "49", >> "59", >> "78" >> ], >> "info": { >> "pgid": "5.3d0", >> "last_update": "23099'104726", >> "last_complete": "23099'104726", >> "log_tail": "23099'101639", >> "last_user_version": 104726, >> "last_backfill": "MAX", >> "last_backfill_bitwise": 1, >> "purged_snaps": "[]", >> "history": { >> "epoch_created": 339, >> "last_epoch_started": 22440, >> "last_epoch_clean": 22440, >> "last_epoch_split": 0, >> "last_epoch_marked_full": 0, >> "same_up_since": 19928, >> "same_interval_since": 22439, >> "same_primary_since": 22439, >> "last_scrub": "23099'104724", >> "last_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_deep_scrub": "23099'104724", >> "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646" >> }, >> "stats": { >> "version": "23099'104726", >> "reported_seq": "84233", >> "reported_epoch": "23099", >> "state": 
"active+clean+inconsistent", >> "last_fresh": "2016-08-30 02:40:35.963747", >> "last_change": "2016-08-30 00:30:54.747882", >> "last_active": "2016-08-30 02:40:35.963747", >> "last_peered": "2016-08-30 02:40:35.963747", >> "last_clean": "2016-08-30 02:40:35.963747", >> "last_became_active": "2016-08-25 21:41:30.649369", >> "last_became_peered": "2016-08-25 21:41:30.649369", >> "last_unstale": "2016-08-30 02:40:35.963747", >> "last_undegraded": "2016-08-30 02:40:35.963747", >> "last_fullsized": "2016-08-30 02:40:35.963747", >> "mapping_epoch": 19928, >> "log_start": "23099'101639", >> "ondisk_log_start": "23099'101639", >> "created": 339, >> "last_epoch_clean": 22440, >> "parent": "0.0", >> "parent_split_bits": 0, >> "last_scrub": "23099'104724", >> "last_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_deep_scrub": "23099'104724", >> "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646", >> "log_size": 3087, >> "ondisk_log_size": 3087, >> "stats_invalid": false, >> "dirty_stats_invalid": false, >> "omap_stats_invalid": false, >> "hitset_stats_invalid": false, >> "hitset_bytes_stats_invalid": false, >> "pin_stats_invalid": true, >> "stat_sum": { >> "num_bytes": 0, >> "num_objects": 257, >> "num_object_clones": 0, >> "num_object_copies": 771, >> "num_objects_missing_on_primary": 0, >> "num_objects_missing": 0, >> "num_objects_degraded": 0, >> "num_objects_misplaced": 0, >> "num_objects_unfound": 0, >> "num_objects_dirty": 257, >> "num_whiteouts": 0, >> "num_read": 21865, >> "num_read_kb": 378449, >> "num_write": 106287, >> "num_write_kb": 402800, >> "num_scrub_errors": 1, >> "num_shallow_scrub_errors": 0, >> "num_deep_scrub_errors": 1, >> "num_objects_recovered": 2006, >> "num_bytes_recovered": 0, >> "num_keys_recovered": 124614, >> "num_objects_omap": 257, >> "num_objects_hit_set_archive": 0, >> "num_bytes_hit_set_archive": 0, >> "num_flush": 0, >> "num_flush_kb": 0, >> "num_evict": 0, >> "num_evict_kb": 
0, >> "num_promote": 0, >> "num_flush_mode_high": 0, >> "num_flush_mode_low": 0, >> "num_evict_mode_some": 0, >> "num_evict_mode_full": 0, >> "num_objects_pinned": 0 >> }, >> "up": [ >> 78, >> 59, >> 49 >> ], >> "acting": [ >> 78, >> 59, >> 49 >> ], >> "blocked_by": [], >> "up_primary": 78, >> "acting_primary": 78 >> }, >> "empty": 0, >> "dne": 0, >> "incomplete": 0, >> "last_epoch_started": 22440, >> "hit_set_history": { >> "current_last_update": "0'0", >> "history": [] >> } >> }, >> "peer_info": [ >> { >> "peer": "49", >> "pgid": "5.3d0", >> "last_update": "23099'104726", >> "last_complete": "23099'104726", >> "log_tail": "1963'93313", >> "last_user_version": 96444, >> "last_backfill": "MAX", >> "last_backfill_bitwise": 1, >> "purged_snaps": "[]", >> "history": { >> "epoch_created": 339, >> "last_epoch_started": 22440, >> "last_epoch_clean": 22440, >> "last_epoch_split": 0, >> "last_epoch_marked_full": 0, >> "same_up_since": 19928, >> "same_interval_since": 22439, >> "same_primary_since": 22439, >> "last_scrub": "23099'104724", >> "last_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_deep_scrub": "23099'104724", >> "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646" >> }, >> "stats": { >> "version": "20737'96443", >> "reported_seq": "77807", >> "reported_epoch": "22439", >> "state": "active+remapped+wait_backfill", >> "last_fresh": "2016-08-25 06:41:22.446581", >> "last_change": "2016-08-24 23:56:20.865302", >> "last_active": "2016-08-25 06:41:22.446581", >> "last_peered": "2016-08-25 06:41:22.446581", >> "last_clean": "2016-08-24 13:42:06.161197", >> "last_became_active": "2016-08-24 23:56:19.815284", >> "last_became_peered": "2016-08-24 23:56:19.815284", >> "last_unstale": "2016-08-25 06:41:22.446581", >> "last_undegraded": "2016-08-25 06:41:22.446581", >> "last_fullsized": "2016-08-25 06:41:22.446581", >> "mapping_epoch": 19928, >> "log_start": "1963'93313", >> "ondisk_log_start": "1963'93313", 
>> "created": 339, >> "last_epoch_clean": 17445, >> "parent": "0.0", >> "parent_split_bits": 0, >> "last_scrub": "19699'96439", >> "last_scrub_stamp": "2016-08-24 22:59:27.749260", >> "last_deep_scrub": "16645'96391", >> "last_deep_scrub_stamp": "2016-08-22 20:21:59.567449", >> "last_clean_scrub_stamp": "2016-08-24 22:59:27.749260", >> "log_size": 3130, >> "ondisk_log_size": 3130, >> "stats_invalid": false, >> "dirty_stats_invalid": false, >> "omap_stats_invalid": false, >> "hitset_stats_invalid": false, >> "hitset_bytes_stats_invalid": false, >> "pin_stats_invalid": true, >> "stat_sum": { >> "num_bytes": 0, >> "num_objects": 252, >> "num_object_clones": 0, >> "num_object_copies": 1008, >> "num_objects_missing_on_primary": 0, >> "num_objects_missing": 0, >> "num_objects_degraded": 2, >> "num_objects_misplaced": 504, >> "num_objects_unfound": 0, >> "num_objects_dirty": 252, >> "num_whiteouts": 0, >> "num_read": 21538, >> "num_read_kb": 323200, >> "num_write": 97965, >> "num_write_kb": 354745, >> "num_scrub_errors": 0, >> "num_shallow_scrub_errors": 0, >> "num_deep_scrub_errors": 0, >> "num_objects_recovered": 2006, >> "num_bytes_recovered": 0, >> "num_keys_recovered": 124614, >> "num_objects_omap": 252, >> "num_objects_hit_set_archive": 0, >> "num_bytes_hit_set_archive": 0, >> "num_flush": 0, >> "num_flush_kb": 0, >> "num_evict": 0, >> "num_evict_kb": 0, >> "num_promote": 0, >> "num_flush_mode_high": 0, >> "num_flush_mode_low": 0, >> "num_evict_mode_some": 0, >> "num_evict_mode_full": 0, >> "num_objects_pinned": 0 >> }, >> "up": [ >> 78, >> 59, >> 49 >> ], >> "acting": [ >> 78, >> 59, >> 49 >> ], >> "blocked_by": [], >> "up_primary": 78, >> "acting_primary": 78 >> }, >> "empty": 0, >> "dne": 0, >> "incomplete": 0, >> "last_epoch_started": 22440, >> "hit_set_history": { >> "current_last_update": "0'0", >> "history": [] >> } >> }, >> { >> "peer": "59", >> "pgid": "5.3d0", >> "last_update": "23099'104726", >> "last_complete": "23099'104726", >> "log_tail": 
"1963'93313", >> "last_user_version": 96444, >> "last_backfill": "MAX", >> "last_backfill_bitwise": 1, >> "purged_snaps": "[]", >> "history": { >> "epoch_created": 339, >> "last_epoch_started": 22440, >> "last_epoch_clean": 22440, >> "last_epoch_split": 0, >> "last_epoch_marked_full": 0, >> "same_up_since": 19928, >> "same_interval_since": 22439, >> "same_primary_since": 22439, >> "last_scrub": "23099'104724", >> "last_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_deep_scrub": "23099'104724", >> "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", >> "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646" >> }, >> "stats": { >> "version": "20737'96444", >> "reported_seq": "77806", >> "reported_epoch": "22437", >> "state": "active+remapped", >> "last_fresh": "2016-08-25 21:41:28.869909", >> "last_change": "2016-08-25 21:41:28.869350", >> "last_active": "2016-08-25 21:41:28.869909", >> "last_peered": "2016-08-25 21:41:28.869909", >> "last_clean": "2016-08-24 13:42:06.161197", >> "last_became_active": "2016-08-24 23:56:19.815284", >> "last_became_peered": "2016-08-24 23:56:19.815284", >> "last_unstale": "2016-08-25 21:41:28.869909", >> "last_undegraded": "2016-08-25 21:41:28.869909", >> "last_fullsized": "2016-08-25 21:41:28.869909", >> "mapping_epoch": 19928, >> "log_start": "1963'93313", >> "ondisk_log_start": "1963'93313", >> "created": 339, >> "last_epoch_clean": 22437, >> "parent": "0.0", >> "parent_split_bits": 0, >> "last_scrub": "19699'96439", >> "last_scrub_stamp": "2016-08-24 22:59:27.749260", >> "last_deep_scrub": "16645'96391", >> "last_deep_scrub_stamp": "2016-08-22 20:21:59.567449", >> "last_clean_scrub_stamp": "2016-08-24 22:59:27.749260", >> "log_size": 3131, >> "ondisk_log_size": 3131, >> "stats_invalid": false, >> "dirty_stats_invalid": false, >> "omap_stats_invalid": false, >> "hitset_stats_invalid": false, >> "hitset_bytes_stats_invalid": false, >> "pin_stats_invalid": true, >> "stat_sum": { >> "num_bytes": 0, >> "num_objects": 252, >> 
"num_object_clones": 0, >> "num_object_copies": 1008, >> "num_objects_missing_on_primary": 0, >> "num_objects_missing": 0, >> "num_objects_degraded": 2, >> "num_objects_misplaced": 252, >> "num_objects_unfound": 0, >> "num_objects_dirty": 252, >> "num_whiteouts": 0, >> "num_read": 21538, >> "num_read_kb": 323200, >> "num_write": 97965, >> "num_write_kb": 354745, >> "num_scrub_errors": 0, >> "num_shallow_scrub_errors": 0, >> "num_deep_scrub_errors": 0, >> "num_objects_recovered": 2510, >> "num_bytes_recovered": 0, >> "num_keys_recovered": 136360, >> "num_objects_omap": 252, >> "num_objects_hit_set_archive": 0, >> "num_bytes_hit_set_archive": 0, >> "num_flush": 0, >> "num_flush_kb": 0, >> "num_evict": 0, >> "num_evict_kb": 0, >> "num_promote": 0, >> "num_flush_mode_high": 0, >> "num_flush_mode_low": 0, >> "num_evict_mode_some": 0, >> "num_evict_mode_full": 0, >> "num_objects_pinned": 0 >> }, >> "up": [ >> 78, >> 59, >> 49 >> ], >> "acting": [ >> 78, >> 59, >> 49 >> ], >> "blocked_by": [], >> "up_primary": 78, >> "acting_primary": 78 >> }, >> "empty": 0, >> "dne": 0, >> "incomplete": 0, >> "last_epoch_started": 22440, >> "hit_set_history": { >> "current_last_update": "0'0", >> "history": [] >> } >> } >> ], >> "recovery_state": [ >> { >> "name": "Started\/Primary\/Active", >> "enter_time": "2016-08-25 21:41:30.400460", >> "might_have_unfound": [], >> "recovery_progress": { >> "backfill_targets": [], >> "waiting_on_backfill": [], >> "last_backfill_started": "MIN", >> "backfill_info": { >> "begin": "MIN", >> "end": "MIN", >> "objects": [] >> }, >> "peer_backfill_info": [], >> "backfills_in_flight": [], >> "recovering": [], >> "pg_backend": { >> "pull_from_peer": [], >> "pushing": [] >> } >> }, >> "scrub": { >> "scrubber.epoch_start": "22439", >> "scrubber.active": 0, >> "scrubber.state": "INACTIVE", >> "scrubber.start": "MIN", >> "scrubber.end": "MIN", >> "scrubber.subset_last_update": "0'0", >> "scrubber.deep": false, >> "scrubber.seed": 0, >> "scrubber.waiting_on": 0, 
>>                 "scrubber.waiting_on_whom": []
>>             }
>>         },
>>         {
>>             "name": "Started",
>>             "enter_time": "2016-08-25 21:41:29.291162"
>>         }
>>     ],
>>     "agent_state": {}
>> }
>>
>>
>> --
>> Goncalo Borges
>> Research Computing
>> ARC Centre of Excellence for Particle Physics at the Terascale
>> School of Physics A28 | University of Sydney, NSW 2006
>> T: +61 2 93511937
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Cheers,
> Brad