Re: Understanding/correcting sudden onslaught of unfound objects

On Tue, Feb 13, 2018 at 8:41 AM Graham Allan <gta@xxxxxxx> wrote:
I'm replying to myself here, but it's probably worth mentioning that
after this started, I did bring back the failed host, though with "ceph
osd weight 0" to avoid more data movement.
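
For concreteness, a sketch of what that phrase likely maps to on the CLI (osd.12
is just a placeholder id, and which of the two weight knobs was used is an
assumption):

  # zero the CRUSH weight so no PGs map to the OSD, but leave it up/in
  # so it can still be queried for the objects it holds
  ceph osd crush reweight osd.12 0
  # or zero the reweight (override) value instead, leaving the CRUSH weight intact
  ceph osd reweight 12 0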

For inconsistent pgs containing unfound objects, the output of "ceph pg
<n> query" does then show the original osd being queried for objects,
and indeed if I dig through the filesystem I find the same 0-byte files
dated from 2015-2016.
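
A sketch of that kind of check, reusing pg 70.467 and osd.233 from further down
the thread; the data path assumes a default filestore OSD layout, and the "s1"
shard suffix assumes osd.233 still holds shard 1 of that PG:

  # which OSDs does the PG think might hold the unfound objects?
  ceph pg 70.467 query | less      # see recovery_state -> might_have_unfound
  # on that OSD's host, look for zero-length object files in the PG's shard directory
  find /var/lib/ceph/osd/ceph-233/current/70.467s1_head -size 0 -ls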

This strongly implies to me that the data loss occurred a long time in the
past and is not related to the osd host going down - that event merely
exposed the problem.

I would assume that too, but unless you had scrubbing disabled, it should have been discovered long ago; I don't understand how it could have stayed hidden. Did you change any other settings recently?
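
A couple of ways to check that, reusing the pg and osd ids from the thread below
as examples:

  # when was this PG last scrubbed / deep-scrubbed?
  ceph pg 70.467 query | grep -E 'last_(deep_)?scrub_stamp'
  # effective scrub settings on one of the OSDs (run on that OSD's host)
  ceph daemon osd.233 config show | grep scrub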

Or, what is this EC pool being used for, and what are the EC settings? Having a bunch of empty files is not surprising if the objects are smaller than the chunk/stripe size — then just the primary and the parity locations would actually have data for them.
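
If it helps, the pool's profile can be dumped with something like the following,
where <poolname> stands in for whatever pool 70 is called:

  ceph osd pool get <poolname> erasure_code_profile
  ceph osd erasure-code-profile get <profile-from-above>   # shows k, m, plugin, etc.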




Graham

On 02/12/2018 06:26 PM, Graham Allan wrote:
> Hi,
>
> For the past few weeks I've been seeing a large number of pgs on our
> main erasure coded pool being flagged inconsistent, followed by them
> becoming active+recovery_wait+inconsistent with unfound objects. The
> cluster is currently running luminous 12.2.2 but has in the past also
> run its way through firefly, hammer and jewel.
>
> Here's a sample object from "ceph pg list_missing" (there are 150
> unfound objects in this particular pg):
>
> ceph health detail shows:
>>     pg 70.467 is stuck unclean for 1004525.715896, current state
>> active+recovery_wait+inconsistent, last acting [449,233,336,323,259,193]
>
> ceph pg 70.467 list_missing:
>>         {
>>             "oid": {
>> "default.323253.6_20150226/Downloads/linux-nvme-HEAD-5aa2ffa/include/config/via/fir.h",
>>                 "key": "",
>>                 "snapid": -2,
>>                 "hash": 628294759,
>>                 "max": 0,
>>                 "pool": 70,
>>                 "namespace": ""
>>             },
>>             "need": "73222'132227",
>>             "have": "0'0",
>>             "flags": "none",
>>             "locations": [
>>                 "193(5)",
>>                 "259(4)",
>>                 "449(0)"
>>             ]
>>         },
>
> When I trace through the filesystem on each OSD, I find the associated
> file present on each OSD but with size 0 bytes.
>
> Interestingly, for the 3 OSDs for which "list_missing" shows locations
> above (193,259,449), the timestamp of the 0-byte file is recent (within
> last few weeks). For the other 3 OSDs (233,336,323), it's in the distant
> past (08/2015 and 02/2016).
>
> All the unfound objects I've checked on this pg show the same pattern,
> along with the "have" epoch showing as "0'0".
>
> Other than the potential data loss being disturbing, I wonder why this
> showed up so suddenly?
>
> It seems to have been triggered by one OSD host failing over a long
> weekend. By the time we looked at it on Monday, the cluster had
> re-balanced enough data that I decided to simply leave it - we had long
> wanted to evacuate a first host to convert to a newer OS release, as
> well as Bluestore. Perhaps this was a bad choice, but the cluster
> recovery appeared to be proceeding normally, and was apparently complete
> a few days later. It was only around a week later that the unfound
> objects started.
>
> All the unfound object file fragments I've tracked down so far have
> their older members with timestamps in the same mid-2015 to mid-2016
> period. I could be wrong but this really seems like a long-standing
> problem has just been unearthed. I wonder if it could be connected to
> this thread from early 2016, concerning a problem on the same cluster:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008120.html
>
> It's a long thread, but the 0-byte files sound very much like the "orphaned
> files" in that thread - related to a directory split being performed while
> handling links on a filename that uses the special long-filename handling...
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008317.html
>
> However unlike that thread, I'm not finding any other files with
> duplicate names in the hierarchy.
>
> I'm not sure there's much else I can do besides record the names of any
> unfound objects before resorting to "mark_unfound_lost delete" - any
> suggestions for further research?
>
> Thanks,
>
> Graham
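
If it does come to deleting them, one way to keep a record first (a sketch only,
reusing pg 70.467 from above):

  # save the full unfound-object listing somewhere safe before giving up on the data
  ceph pg 70.467 list_missing > pg-70.467-unfound.json
  # then, strictly as a last resort
  ceph pg 70.467 mark_unfound_lost delete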

--
Graham Allan
Minnesota Supercomputing Institute - gta@xxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
