Hard disk mishandling: journal corruption and stale PGs

Hi everyone,

Due to two big beginner mistakes while handling and recovering a hard
disk, we have ended up in a situation where the system reports that the
journal of an OSD is corrupt.

2017-05-30 17:59:21.318644 7fa90757a8c0  1 journal _open
/dev/disk/by-id/ata-INTEL_SSDSC2BA200G4_BTHV5281013C200MGN-part3 fd 20:
20480000000
 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-05-30 17:59:21.322226 7fa90757a8c0 -1 journal Unable to read past
sequence 3219747309 but header indicates the journal has committed up
through 3219750285, journal is corrupt
2017-05-30 17:59:21.325946 7fa90757a8c0 -1 os/FileJournal.cc: In
function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&,
bool*)' thread 7fa90757a8c0 time 2017-05-30 17:59:21.322296
os/FileJournal.cc: 1853: FAILED assert(0)

We think the only way we can reuse the OSD is to wipe it and start it
again. But before doing that, we lowered its weight to 0 and waited for
the cluster to recover on its own. Several days have passed since then,
but some PGs are still in the "stale+active+clean" state.
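For reference, the drain step looked roughly like the sketch below
(osd.12 is a placeholder ID, not our real one, and the exact reweight
command we used may differ):

    ceph osd crush reweight osd.12 0   # push the OSD's CRUSH weight to 0
    ceph -w                            # watch recovery until it settles
    ceph health detail                 # list the PGs that remain unhealthy

The PGs that are still stuck: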

pg_stat    state    up    up_primary    acting    acting_primary
1.b5    stale+active+clean    [0]    0    [0]    0
1.22    stale+active+clean    [0]    0    [0]    0
1.53    stale+active+clean    [0]    0    [0]    0
1.198    stale+active+clean    [0]    0    [0]    0
1.199    stale+active+clean    [0]    0    [0]    0
1.4e    stale+active+clean    [0]    0    [0]    0
1.4f    stale+active+clean    [0]    0    [0]    0
1.a7    stale+active+clean    [0]    0    [0]    0
1.1ef    stale+active+clean    [0]    0    [0]    0
1.160    stale+active+clean    [0]    0    [0]    0
18.4    stale+active+clean    [0]    0    [0]    0
1.15e    stale+active+clean    [0]    0    [0]    0
1.a1    stale+active+clean    [0]    0    [0]    0
1.18a    stale+active+clean    [0]    0    [0]    0
1.156    stale+active+clean    [0]    0    [0]    0
1.6b    stale+active+clean    [0]    0    [0]    0
1.c6    stale+active+clean    [0]    0    [0]    0
1.1b1    stale+active+clean    [0]    0    [0]    0
1.123    stale+active+clean    [0]    0    [0]    0
1.17a    stale+active+clean    [0]    0    [0]    0
1.bc    stale+active+clean    [0]    0    [0]    0
1.179    stale+active+clean    [0]    0    [0]    0
1.177    stale+active+clean    [0]    0    [0]    0
1.b8    stale+active+clean    [0]    0    [0]    0
1.2a    stale+active+clean    [0]    0    [0]    0
1.117    stale+active+clean    [0]    0    [0]    0

When executing a "ceph pg query PGID" or "ceph pg PGID list_missing", we
get the error "Error ENOENT: I ​​do not have pgid PGID".
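For example, taking 1.b5 from the listing above (chosen arbitrarily),
both commands fail the same way:

    ceph pg 1.b5 query          # returns the ENOENT error quoted above
    ceph pg 1.b5 list_missing   # returns the same ENOENT error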

Given that we are using replication size 3, there should be no data
loss, right? How should we proceed to solve the problem? The options we
are considering:

- Running ceph osd lost OSDID, as recommended in some previous threads
  on this list (see the command sketch after this list).
- Recreating the PGs by hand via ceph pg force_create_pg PGID.
- Doing the wipe directly.
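If the first or second option is the right path, we assume it would
look roughly like this (osd.12 and pg 1.b5 are placeholders taken from
the examples above, not necessarily the right targets):

    ceph osd lost 12 --yes-i-really-mean-it   # declare the OSD's data permanently lost
    ceph pg force_create_pg 1.b5              # mark the PG to be recreated from scratch

Our understanding is that force_create_pg recreates the PG empty, so it
would only make sense if the data in those PGs is genuinely gone.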

Thanks in advance,

-- 
Zigor Ozamiz


