On Tue, Dec 12, 2017 at 9:37 AM Nick Fisk <nick@xxxxxxxxxx> wrote:
Does anyone know what this object (0.ae78c1cf) might be? It's not your
normal run-of-the-mill RBD object, and I can't seem to find it in the pool
using rados --all ls. It seems to be leaving the 0.1cf PG stuck in an
activating+remapped state and blocking IO. Pool 0 is just a pure RBD pool
with a cache tier above it. There is currently no mention of unfound objects
or any other obvious issues.
Some backfilling is in progress on another OSD that was upgraded to
bluestore, which is when the issue started, but I can't see any link to the
upgraded OSD in the PG dump. My only thought so far is to wait for this
backfilling to finish, then deep-scrub this PG and see if that reveals
anything.
Thanks,
Nick
"description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
(undecoded)
ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
e105014)",
"initiated_at": "2017-12-12 17:10:50.030660",
"age": 335.948290,
"duration": 335.948383,
"type_data": {
"flag_point": "delayed",
"events": [
{
"time": "2017-12-12 17:10:50.030660",
"event": "initiated"
},
{
"time": "2017-12-12 17:10:50.030692",
"event": "queued_for_pg"
},
{
"time": "2017-12-12 17:10:50.030719",
"event": "reached_pg"
},
{
"time": "2017-12-12 17:10:50.030727",
"event": "waiting for peered"
},
{
"time": "2017-12-12 17:10:50.197353",
"event": "reached_pg"
},
{
"time": "2017-12-12 17:10:50.197355",
"event": "waiting for peered"
Is there any other evidence that this object is the one causing the PG to be stuck? This trace is just what you get when a PG isn't peering; it has nothing to do with the object involved. You'll need to figure out what is keeping the PG from peering.
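
One way to start looking for what is holding the PG up is to save the output of "ceph pg 0.1cf query" and read through its recovery_state section. The sketch below is only an illustration: it assumes the query output is JSON with top-level "state", "up", "acting" and "recovery_state" keys, where each recovery_state entry has "name" and "enter_time" plus optional detail fields. The sample document is made up for the example and is not taken from the cluster in this thread.

```python
import json


def summarize_pg_query(query_json):
    """Return the PG state plus any extra detail fields per recovery state.

    Assumes each recovery_state entry carries "name" and "enter_time";
    every other key is treated as a detail worth reading by hand.
    """
    data = json.loads(query_json)
    details = []
    for entry in data.get("recovery_state", []):
        extras = {k: v for k, v in entry.items()
                  if k not in ("name", "enter_time")}
        if extras:
            details.append((entry.get("name", "?"), extras))
    return {
        "state": data.get("state"),
        "up": data.get("up"),
        "acting": data.get("acting"),
        "details": details,
    }


# Hypothetical sample, NOT real output from the cluster discussed here.
SAMPLE = json.dumps({
    "state": "activating+remapped",
    "up": [1, 2, 3],
    "acting": [1, 2, 4],
    "recovery_state": [
        {"name": "Started/Primary/Active",
         "enter_time": "2017-12-12 17:00:00.000000",
         "example_detail": "placeholder value"},
        {"name": "Started",
         "enter_time": "2017-12-12 16:59:59.000000"},
    ],
})

summary = summarize_pg_query(SAMPLE)
print(summary["state"], "up:", summary["up"], "acting:", summary["acting"])
for name, extras in summary["details"]:
    print(name, extras)
```

To try it against a real PG, the saved file contents would be passed in place of SAMPLE, for example summarize_pg_query(open("pg.json").read()), where pg.json is a hypothetical file holding the query output.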
(PG listing operations also require an active PG, so I expect "rados ls" is just skipping that PG — though I'm surprised it doesn't throw a warning or error.)
-Greg
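
To illustrate the point that the trace reflects the PG rather than the object, here is a minimal sketch that groups blocked ops by PG and reports those whose last recorded event is "waiting for peered". The op below is rebuilt from the dump quoted in this thread, which is cut off after its last event, so nothing past that event is assumed. Treating the second whitespace-separated token of "description" as the PG id is an assumption based on how this one line looks, and the helper names are hypothetical.

```python
# Op rebuilt from the dump quoted above; only fields shown there are used.
OP = {
    "description": ("osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf "
                    "(undecoded) ondisk+retry+write+ignore_cache"
                    "+ignore_overlay+known_if_redirected e105014)"),
    "age": 335.948290,
    "type_data": {
        "flag_point": "delayed",
        "events": [
            {"time": "2017-12-12 17:10:50.030660", "event": "initiated"},
            {"time": "2017-12-12 17:10:50.030692", "event": "queued_for_pg"},
            {"time": "2017-12-12 17:10:50.030719", "event": "reached_pg"},
            {"time": "2017-12-12 17:10:50.030727", "event": "waiting for peered"},
            {"time": "2017-12-12 17:10:50.197353", "event": "reached_pg"},
            {"time": "2017-12-12 17:10:50.197355", "event": "waiting for peered"},
        ],
    },
}


def last_event(op):
    """Name of the most recent event recorded for an op."""
    return op["type_data"]["events"][-1]["event"]


def pg_of(op):
    """PG id, assumed to be the second token of the description string."""
    return op["description"].split()[1]


def ops_waiting_for_peered(ops):
    """Map PG id -> ages of ops whose last event is 'waiting for peered'."""
    by_pg = {}
    for op in ops:
        if last_event(op) == "waiting for peered":
            by_pg.setdefault(pg_of(op), []).append(op["age"])
    return by_pg


blocked = ops_waiting_for_peered([OP])
for pg, ages in blocked.items():
    print(f"PG {pg}: {len(ages)} op(s) waiting for peered, "
          f"oldest {max(ages):.0f}s")
```

If every op queued against 0.1cf shows the same last event regardless of which object it targets, that would be consistent with the PG state, not a particular object, being the thing to investigate.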