Just a small typo correction to my previous email. Without it the meaning was completely different:

"At this point I just need a way to recover the pg safely and I do NOT see how I can do that since it is impossible to understand what is the problematic osd with the incoherent object."

Sorry for the extra email.

Cheers
Goncalo

________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Goncalo Borges [goncalo.borges@xxxxxxxxxxxxx]
Sent: 30 August 2016 18:53
To: Brad Hubbard
Cc: ceph-users@xxxxxxxx
Subject: Re: cephfs metadata pool: deep-scrub error "omap_digest != best guess omap_digest"

Hi Brad.

Thanks for replying.

Regarding versions and architecture, everything is running 10.2.2 on the same architecture. We did have a single client using ceph-fuse 9.2.0 until yesterday.

I can run a deep scrub with the log levels you mention if it is safe to do that on an inconsistent pg. I have read somewhere that it shouldn't be done, but I do not remember where, or how accurate that information is. Is it safe to do so?

Your explanation made the logic a bit clearer, but not completely, since the error says that the digest is not the same as the best guess. According to your logic, this seems to indicate that there is no known value?

At this point I just need a way to recover the pg safely and I do see how I can do that since it is impossible to understand what is the problematic osd with the incoherent object.

I also think I am not the only one seeing this. I recently participated in a discussion of exactly the same issue experienced by someone else on jewel.

Cheers
Goncalo

________________________________________
From: Brad Hubbard [bhubbard@xxxxxxxxxx]
Sent: 30 August 2016 17:13
To: Goncalo Borges
Cc: ceph-users@xxxxxxxx
Subject: Re: cephfs metadata pool: deep-scrub error "omap_digest != best guess omap_digest"

On Tue, Aug 30, 2016 at 3:50 PM, Goncalo Borges <goncalo.borges@xxxxxxxxxxxxx> wrote:
> Dear Ceph / CephFS supporters.
>
> We are currently running Jewel 10.2.2.
>
> From time to time we experience deep-scrub errors in pgs inside our cephfs
> metadata pool. It is important to note that we do not see any hardware
> errors on the osds themselves, so the error must have some other source.

Can you verify that all your nodes are the same architecture and all running the
same ceph version? This sort of thing has been reported before when running
mismatched versions and/or architectures.

http://tracker.ceph.com/issues/4743 looks similar but is very old and likely not
relevant. In that case it was recommended to gather logs with

debug filestore = 20
debug osd = 30
debug ms = 1

from all three replicas of such a PG while running a deep scrub on it, so
collecting those may be a good idea, as is opening a new tracker for this issue.

>
> The error itself is the following:
>
> # cat /var/log/ceph/ceph.log | grep 5.3d0
> 2016-08-30 00:30:53.492626 osd.78 192.231.127.171:6828/6072 331 : cluster [INF] 5.3d0 deep-scrub starts
> 2016-08-30 00:30:54.276134 osd.78 192.231.127.171:6828/6072 332 : cluster [ERR] 5.3d0 shard 78: soid 5:0bd6d154:::602.00000000:head omap_digest 0xf3fdfd0c != best guess omap_digest 0x23b2eae0 from auth shard 49
> 2016-08-30 00:30:54.747795 osd.78 192.231.127.171:6828/6072 333 : cluster [ERR] 5.3d0 deep-scrub 0 missing, 1 inconsistent objects
> 2016-08-30 00:30:54.747801 osd.78 192.231.127.171:6828/6072 334 : cluster [ERR] 5.3d0 deep-scrub 1 errors

AFAIU the omap_digest is a calculated CRC32 of the omap header & key/values.
These values are stored in the OSD leveldb, not in the data directories, but I
think Greg already mentioned that last time?
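If you want to eyeball the raw omap of that object on each replica, something along
these lines might work with the osd stopped first (a rough sketch only -- I have not
verified the exact ceph-objectstore-tool invocation on 10.2.2, and the journal path
may differ on your hosts):

# stop the osd so ceph-objectstore-tool can open its store
systemctl stop ceph-osd@78
# dump the omap keys of 602.00000000 as stored on this replica of pg 5.3d0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-78 \
    --journal-path /var/lib/ceph/osd/ceph-78/journal \
    --pgid 5.3d0 602.00000000 list-omap > /tmp/omap_keys.osd78
systemctl start ceph-osd@78
# repeat for osd.49 and osd.59, then diff the three dumps to see which replica disagrees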
>
> For us there are a few unknowns on how to recover from this error:
>
> 1) The first issue is that we really do not understand the nature of the
> error. What does "omap_digest != best guess omap_digest" mean? It seems
> to point to some problem in the digest of omap contents between the two
> osds, but it does not tell you exactly what.

src/osd/PGBackend.cc:

 423   if (auth.digest_present && candidate.digest_present) {
 424     if (auth.digest != candidate.digest) {
 425       if (error != CLEAN)
 426         errorstream << ", ";
 427       error = DEEP_ERROR;
 428       bool known = auth_oi.is_data_digest() &&
 429         auth.digest == auth_oi.data_digest;
 430       errorstream << "data_digest 0x" << std::hex << candidate.digest
 431                   << " != "
 432                   << (known ? "known" : "best guess")
 433                   << " data_digest 0x" << auth.digest << std::dec
 434                   << " from auth shard " << auth_shard;
 435       result.set_data_digest_mismatch();
 436     }
 437   }

On line 428 "known" is either set or it isn't. With precedence it actually looks
like this:

bool known = (((auth_oi.is_omap_digest)()) && ((auth.omap_digest) == (auth_oi.omap_digest)));

auth_oi is an object_info_t, so it looks like we are comparing the digest to a copy
we already have stored; if they match the result is considered "known", otherwise it
is considered a "best guess". Hopefully someone can elaborate on what this means.

>
> 2) The second issue is that it is really difficult to explore metadata
> objects and omap info. While in the data pool we do know how to inspect
> pg and object contents (and decide which is the problematic osd by
> comparison in a 3 replica setup), in the metadata pool we have to access
> pg contents using 'ceph-objectstore-tool'. For that, we have to stop the
> osd so that the daemon releases the omap lock. Moreover, I have
> successfully imported / exported / listed pg contents, but I was never
> able to query the omap contents of objects inside pgs. Maybe I am doing
> it wrong, but I do not find the tool helpful at the moment for this
> precise task.
>
> 3) Finally, I am unsure what the consequence of running 'pg repair' is.
> In my specific case, the primary osd is on a host which was recently
> added to production. Moreover, the dates of the problematic object match
> on the secondary peers, which gives me the feeling that the primary osd
> (78) might be the problematic one. I know that in the past the default
> behavior was simply to copy the pg contents of the primary osd to the
> others. That can lead to data corruption if the problematic osd is indeed
> the primary, and I wonder if in Jewel there is a smarter way to do the
> pg repair.
>
> [root@server9 ~]# ll /var/lib/ceph/osd/ceph-78/current/5.3d0_head/602.00000000__head_2A8B6BD0__5
> -rw-r--r-- 1 ceph ceph 0 Aug 25 21:41 /var/lib/ceph/osd/ceph-78/current/5.3d0_head/602.00000000__head_2A8B6BD0__5
>
> [root@server7 ~]# ll /var/lib/ceph/osd/ceph-49/current/5.3d0_head/602.00000000__head_2A8B6BD0__5
> -rw-r--r-- 1 ceph ceph 0 Jul 27 02:30 /var/lib/ceph/osd/ceph-49/current/5.3d0_head/602.00000000__head_2A8B6BD0__5
>
> [root@server8 ~]# ll /var/lib/ceph/osd/ceph-59/current/5.3d0_head/602.00000000__head_2A8B6BD0__5
> -rw-r--r-- 1 ceph ceph 0 Jul 27 02:30 /var/lib/ceph/osd/ceph-59/current/5.3d0_head/602.00000000__head_2A8B6BD0__5
>
> A pg query does not seem to give me any information about the problem. The
> query results follow this email.
>
> Any help is appreciated.
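One thing that may help with 2) and 3): on jewel you should be able to ask the
cluster which object and shard the last scrub actually flagged, and re-run the deep
scrub with the higher debug levels suggested earlier. Roughly like this (a sketch
only -- please double-check the option names against the 10.2.2 man pages, and the
values used to reset logging at the end are examples, not your real defaults):

# which object/shard was flagged by the last scrub of this pg
rados list-inconsistent-obj 5.3d0 --format=json-pretty

# temporarily raise logging on the three replicas
for osd in 78 59 49; do
    ceph tell osd.$osd injectargs '--debug_filestore 20 --debug_osd 30 --debug_ms 1'
done

# re-run the deep scrub and watch the osd logs until it finishes
ceph pg deep-scrub 5.3d0

# afterwards, drop the levels back down (check 'ceph daemon osd.N config show'
# on each osd host for the values you normally run with)
for osd in 78 59 49; do
    ceph tell osd.$osd injectargs '--debug_filestore 1/3 --debug_osd 0/5 --debug_ms 0/5'
done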
> > Cheers > Goncalo > > --- * --- > > > > # ceph pg 5.3d0 query > { > "state": "active+clean+inconsistent", > "snap_trimq": "[]", > "epoch": 23099, > "up": [ > 78, > 59, > 49 > ], > "acting": [ > 78, > 59, > 49 > ], > "actingbackfill": [ > "49", > "59", > "78" > ], > "info": { > "pgid": "5.3d0", > "last_update": "23099'104726", > "last_complete": "23099'104726", > "log_tail": "23099'101639", > "last_user_version": 104726, > "last_backfill": "MAX", > "last_backfill_bitwise": 1, > "purged_snaps": "[]", > "history": { > "epoch_created": 339, > "last_epoch_started": 22440, > "last_epoch_clean": 22440, > "last_epoch_split": 0, > "last_epoch_marked_full": 0, > "same_up_since": 19928, > "same_interval_since": 22439, > "same_primary_since": 22439, > "last_scrub": "23099'104724", > "last_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_deep_scrub": "23099'104724", > "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646" > }, > "stats": { > "version": "23099'104726", > "reported_seq": "84233", > "reported_epoch": "23099", > "state": "active+clean+inconsistent", > "last_fresh": "2016-08-30 02:40:35.963747", > "last_change": "2016-08-30 00:30:54.747882", > "last_active": "2016-08-30 02:40:35.963747", > "last_peered": "2016-08-30 02:40:35.963747", > "last_clean": "2016-08-30 02:40:35.963747", > "last_became_active": "2016-08-25 21:41:30.649369", > "last_became_peered": "2016-08-25 21:41:30.649369", > "last_unstale": "2016-08-30 02:40:35.963747", > "last_undegraded": "2016-08-30 02:40:35.963747", > "last_fullsized": "2016-08-30 02:40:35.963747", > "mapping_epoch": 19928, > "log_start": "23099'101639", > "ondisk_log_start": "23099'101639", > "created": 339, > "last_epoch_clean": 22440, > "parent": "0.0", > "parent_split_bits": 0, > "last_scrub": "23099'104724", > "last_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_deep_scrub": "23099'104724", > "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646", > "log_size": 3087, > "ondisk_log_size": 3087, > "stats_invalid": false, > "dirty_stats_invalid": false, > "omap_stats_invalid": false, > "hitset_stats_invalid": false, > "hitset_bytes_stats_invalid": false, > "pin_stats_invalid": true, > "stat_sum": { > "num_bytes": 0, > "num_objects": 257, > "num_object_clones": 0, > "num_object_copies": 771, > "num_objects_missing_on_primary": 0, > "num_objects_missing": 0, > "num_objects_degraded": 0, > "num_objects_misplaced": 0, > "num_objects_unfound": 0, > "num_objects_dirty": 257, > "num_whiteouts": 0, > "num_read": 21865, > "num_read_kb": 378449, > "num_write": 106287, > "num_write_kb": 402800, > "num_scrub_errors": 1, > "num_shallow_scrub_errors": 0, > "num_deep_scrub_errors": 1, > "num_objects_recovered": 2006, > "num_bytes_recovered": 0, > "num_keys_recovered": 124614, > "num_objects_omap": 257, > "num_objects_hit_set_archive": 0, > "num_bytes_hit_set_archive": 0, > "num_flush": 0, > "num_flush_kb": 0, > "num_evict": 0, > "num_evict_kb": 0, > "num_promote": 0, > "num_flush_mode_high": 0, > "num_flush_mode_low": 0, > "num_evict_mode_some": 0, > "num_evict_mode_full": 0, > "num_objects_pinned": 0 > }, > "up": [ > 78, > 59, > 49 > ], > "acting": [ > 78, > 59, > 49 > ], > "blocked_by": [], > "up_primary": 78, > "acting_primary": 78 > }, > "empty": 0, > "dne": 0, > "incomplete": 0, > "last_epoch_started": 22440, > "hit_set_history": { > "current_last_update": "0'0", > "history": [] > } > }, > "peer_info": [ > { > "peer": "49", > "pgid": "5.3d0", 
> "last_update": "23099'104726", > "last_complete": "23099'104726", > "log_tail": "1963'93313", > "last_user_version": 96444, > "last_backfill": "MAX", > "last_backfill_bitwise": 1, > "purged_snaps": "[]", > "history": { > "epoch_created": 339, > "last_epoch_started": 22440, > "last_epoch_clean": 22440, > "last_epoch_split": 0, > "last_epoch_marked_full": 0, > "same_up_since": 19928, > "same_interval_since": 22439, > "same_primary_since": 22439, > "last_scrub": "23099'104724", > "last_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_deep_scrub": "23099'104724", > "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646" > }, > "stats": { > "version": "20737'96443", > "reported_seq": "77807", > "reported_epoch": "22439", > "state": "active+remapped+wait_backfill", > "last_fresh": "2016-08-25 06:41:22.446581", > "last_change": "2016-08-24 23:56:20.865302", > "last_active": "2016-08-25 06:41:22.446581", > "last_peered": "2016-08-25 06:41:22.446581", > "last_clean": "2016-08-24 13:42:06.161197", > "last_became_active": "2016-08-24 23:56:19.815284", > "last_became_peered": "2016-08-24 23:56:19.815284", > "last_unstale": "2016-08-25 06:41:22.446581", > "last_undegraded": "2016-08-25 06:41:22.446581", > "last_fullsized": "2016-08-25 06:41:22.446581", > "mapping_epoch": 19928, > "log_start": "1963'93313", > "ondisk_log_start": "1963'93313", > "created": 339, > "last_epoch_clean": 17445, > "parent": "0.0", > "parent_split_bits": 0, > "last_scrub": "19699'96439", > "last_scrub_stamp": "2016-08-24 22:59:27.749260", > "last_deep_scrub": "16645'96391", > "last_deep_scrub_stamp": "2016-08-22 20:21:59.567449", > "last_clean_scrub_stamp": "2016-08-24 22:59:27.749260", > "log_size": 3130, > "ondisk_log_size": 3130, > "stats_invalid": false, > "dirty_stats_invalid": false, > "omap_stats_invalid": false, > "hitset_stats_invalid": false, > "hitset_bytes_stats_invalid": false, > "pin_stats_invalid": true, > "stat_sum": { > "num_bytes": 0, > "num_objects": 252, > "num_object_clones": 0, > "num_object_copies": 1008, > "num_objects_missing_on_primary": 0, > "num_objects_missing": 0, > "num_objects_degraded": 2, > "num_objects_misplaced": 504, > "num_objects_unfound": 0, > "num_objects_dirty": 252, > "num_whiteouts": 0, > "num_read": 21538, > "num_read_kb": 323200, > "num_write": 97965, > "num_write_kb": 354745, > "num_scrub_errors": 0, > "num_shallow_scrub_errors": 0, > "num_deep_scrub_errors": 0, > "num_objects_recovered": 2006, > "num_bytes_recovered": 0, > "num_keys_recovered": 124614, > "num_objects_omap": 252, > "num_objects_hit_set_archive": 0, > "num_bytes_hit_set_archive": 0, > "num_flush": 0, > "num_flush_kb": 0, > "num_evict": 0, > "num_evict_kb": 0, > "num_promote": 0, > "num_flush_mode_high": 0, > "num_flush_mode_low": 0, > "num_evict_mode_some": 0, > "num_evict_mode_full": 0, > "num_objects_pinned": 0 > }, > "up": [ > 78, > 59, > 49 > ], > "acting": [ > 78, > 59, > 49 > ], > "blocked_by": [], > "up_primary": 78, > "acting_primary": 78 > }, > "empty": 0, > "dne": 0, > "incomplete": 0, > "last_epoch_started": 22440, > "hit_set_history": { > "current_last_update": "0'0", > "history": [] > } > }, > { > "peer": "59", > "pgid": "5.3d0", > "last_update": "23099'104726", > "last_complete": "23099'104726", > "log_tail": "1963'93313", > "last_user_version": 96444, > "last_backfill": "MAX", > "last_backfill_bitwise": 1, > "purged_snaps": "[]", > "history": { > "epoch_created": 339, > "last_epoch_started": 22440, > "last_epoch_clean": 22440, > 
"last_epoch_split": 0, > "last_epoch_marked_full": 0, > "same_up_since": 19928, > "same_interval_since": 22439, > "same_primary_since": 22439, > "last_scrub": "23099'104724", > "last_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_deep_scrub": "23099'104724", > "last_deep_scrub_stamp": "2016-08-30 00:30:54.747833", > "last_clean_scrub_stamp": "2016-08-29 00:30:33.716646" > }, > "stats": { > "version": "20737'96444", > "reported_seq": "77806", > "reported_epoch": "22437", > "state": "active+remapped", > "last_fresh": "2016-08-25 21:41:28.869909", > "last_change": "2016-08-25 21:41:28.869350", > "last_active": "2016-08-25 21:41:28.869909", > "last_peered": "2016-08-25 21:41:28.869909", > "last_clean": "2016-08-24 13:42:06.161197", > "last_became_active": "2016-08-24 23:56:19.815284", > "last_became_peered": "2016-08-24 23:56:19.815284", > "last_unstale": "2016-08-25 21:41:28.869909", > "last_undegraded": "2016-08-25 21:41:28.869909", > "last_fullsized": "2016-08-25 21:41:28.869909", > "mapping_epoch": 19928, > "log_start": "1963'93313", > "ondisk_log_start": "1963'93313", > "created": 339, > "last_epoch_clean": 22437, > "parent": "0.0", > "parent_split_bits": 0, > "last_scrub": "19699'96439", > "last_scrub_stamp": "2016-08-24 22:59:27.749260", > "last_deep_scrub": "16645'96391", > "last_deep_scrub_stamp": "2016-08-22 20:21:59.567449", > "last_clean_scrub_stamp": "2016-08-24 22:59:27.749260", > "log_size": 3131, > "ondisk_log_size": 3131, > "stats_invalid": false, > "dirty_stats_invalid": false, > "omap_stats_invalid": false, > "hitset_stats_invalid": false, > "hitset_bytes_stats_invalid": false, > "pin_stats_invalid": true, > "stat_sum": { > "num_bytes": 0, > "num_objects": 252, > "num_object_clones": 0, > "num_object_copies": 1008, > "num_objects_missing_on_primary": 0, > "num_objects_missing": 0, > "num_objects_degraded": 2, > "num_objects_misplaced": 252, > "num_objects_unfound": 0, > "num_objects_dirty": 252, > "num_whiteouts": 0, > "num_read": 21538, > "num_read_kb": 323200, > "num_write": 97965, > "num_write_kb": 354745, > "num_scrub_errors": 0, > "num_shallow_scrub_errors": 0, > "num_deep_scrub_errors": 0, > "num_objects_recovered": 2510, > "num_bytes_recovered": 0, > "num_keys_recovered": 136360, > "num_objects_omap": 252, > "num_objects_hit_set_archive": 0, > "num_bytes_hit_set_archive": 0, > "num_flush": 0, > "num_flush_kb": 0, > "num_evict": 0, > "num_evict_kb": 0, > "num_promote": 0, > "num_flush_mode_high": 0, > "num_flush_mode_low": 0, > "num_evict_mode_some": 0, > "num_evict_mode_full": 0, > "num_objects_pinned": 0 > }, > "up": [ > 78, > 59, > 49 > ], > "acting": [ > 78, > 59, > 49 > ], > "blocked_by": [], > "up_primary": 78, > "acting_primary": 78 > }, > "empty": 0, > "dne": 0, > "incomplete": 0, > "last_epoch_started": 22440, > "hit_set_history": { > "current_last_update": "0'0", > "history": [] > } > } > ], > "recovery_state": [ > { > "name": "Started\/Primary\/Active", > "enter_time": "2016-08-25 21:41:30.400460", > "might_have_unfound": [], > "recovery_progress": { > "backfill_targets": [], > "waiting_on_backfill": [], > "last_backfill_started": "MIN", > "backfill_info": { > "begin": "MIN", > "end": "MIN", > "objects": [] > }, > "peer_backfill_info": [], > "backfills_in_flight": [], > "recovering": [], > "pg_backend": { > "pull_from_peer": [], > "pushing": [] > } > }, > "scrub": { > "scrubber.epoch_start": "22439", > "scrubber.active": 0, > "scrubber.state": "INACTIVE", > "scrubber.start": "MIN", > "scrubber.end": "MIN", > "scrubber.subset_last_update": "0'0", > 
"scrubber.deep": false, > "scrubber.seed": 0, > "scrubber.waiting_on": 0, > "scrubber.waiting_on_whom": [] > } > }, > { > "name": "Started", > "enter_time": "2016-08-25 21:41:29.291162" > } > ], > "agent_state": {} > } > > > > > > > -- > Goncalo Borges > Research Computing > ARC Centre of Excellence for Particle Physics at the Terascale > School of Physics A28 | University of Sydney, NSW 2006 > T: +61 2 93511937 > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Cheers, Brad _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com