The message "_scan_snaps no head for" means that a snapshot (clone) of an object
was found, but not the corresponding head object. This is either a race
during scrub caused by a bug, or a stray snapshot for an object left behind in
the objectstore. In either case it is benign unless you identify missing
data in RBD for these 3 objects. If you don't need to recover the data,
you could manually remove these 3 objects with
ceph-objectstore-tool to get rid of the messages during scrub.
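A rough sketch of what that could look like for the objects in the logs quoted
below (osd.2, pg 1.2d8), assuming the default data path and that the OSD has
been stopped first; the exact syntax can vary by release, and with filestore
you may also need --journal-path. Double-check the JSON from the list step
before removing anything:

    # list the clones of the affected object on the stopped OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid 1.2d8 \
        --op list rbd_data.620652ae8944a.0000000000000126

    # remove one stray clone, using the JSON object spec printed by the list step
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid 1.2d8 \
        '<object json from the list output>' remove

Repeat the remove for each of the 3 stray clones, and check the other OSDs in
the PG (osd.45 and osd.31 here) in case the same stray objects exist there.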
David
On 5/2/18 11:37 AM, Sage Weil wrote:
[Moving to ceph-devel]
On Wed, 2 May 2018, Stefan Kooman wrote:
Hi,
Quoting Stefan Kooman (stefan@xxxxxx):
Hi,
We see the following in the logs after we start a scrub for some osds:
ceph-osd.2.log:2017-12-14 06:50:47.180344 7f0f47db2700 0 log_channel(cluster) log [DBG] : 1.2d8 scrub starts
ceph-osd.2.log:2017-12-14 06:50:47.180915 7f0f47db2700 -1 osd.2 pg_epoch: 11897 pg[1.2d8( v 11890'165209 (3221'163647,11890'165209] local-lis/les=11733/11734 n=67 ec=132/132 lis/c 11733/11733 les/c/f 11734/11734/0 11733/11733/11733) [2,45,31] r=0 lpr=11733 crt=11890'165209 lcod 11890'165208 mlcod 11890'165208 active+clean+scrubbing] _scan_snaps no head for 1:1b518155:::rbd_data.620652ae8944a.0000000000000126:29 (have MIN)
ceph-osd.2.log:2017-12-14 06:50:47.180929 7f0f47db2700 -1 osd.2 pg_epoch: 11897 pg[1.2d8( v 11890'165209 (3221'163647,11890'165209] local-lis/les=11733/11734 n=67 ec=132/132 lis/c 11733/11733 les/c/f 11734/11734/0 11733/11733/11733) [2,45,31] r=0 lpr=11733 crt=11890'165209 lcod 11890'165208 mlcod 11890'165208 active+clean+scrubbing] _scan_snaps no head for 1:1b518155:::rbd_data.620652ae8944a.0000000000000126:14 (have MIN)
ceph-osd.2.log:2017-12-14 06:50:47.180941 7f0f47db2700 -1 osd.2 pg_epoch: 11897 pg[1.2d8( v 11890'165209 (3221'163647,11890'165209] local-lis/les=11733/11734 n=67 ec=132/132 lis/c 11733/11733 les/c/f 11734/11734/0 11733/11733/11733) [2,45,31] r=0 lpr=11733 crt=11890'165209 lcod 11890'165208 mlcod 11890'165208 active+clean+scrubbing] _scan_snaps no head for 1:1b518155:::rbd_data.620652ae8944a.0000000000000126:a (have MIN)
ceph-osd.2.log:2017-12-14 06:50:47.214198 7f0f43daa700 0 log_channel(cluster) log [DBG] : 1.2d8 scrub ok
So it finally logs "scrub ok", but what does "_scan_snaps no head for ..." mean?
Does this indicate a problem?
We are still seeing this issue on a freshly installed Luminous cluster. I
*think* it has to do either with "cloned" RBDs that then get snapshots of
their own, or with RBDs that are cloned from a snapshot.
Is there any dev who wants to debug this behaviour if I'm able to reliably
reproduce it?
I'm interested!
The best thing would be if you could reproduce this starting from an empty
pool or a new image, with debug osd = 20 and debug ms = 1 on the OSDs... is
that feasible in your environment?
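For example, something like this would turn that on across the OSDs (one way
to do it; adjust to your setup):

    ceph tell osd.\* injectargs '--debug_osd 20 --debug_ms 1'

or set it in ceph.conf under [osd] and restart the OSDs:

    [osd]
        debug osd = 20
        debug ms = 1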
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html