Re: PG inconsistent with error "size_too_large"

In my cluster I saw that the problematic objects were uploaded by a specific application (onedata), which I think used to upload files with something like:

rados --pool <pool> put <objectname> <filename>

Now (since Luminous?) the default maximum object size (osd_max_object_size) is 128 MB, but if I am not wrong it was 100 GB before.
Since rados put stores the whole input file as a single object, this would explain why I have such big objects around (which indeed have an old timestamp).
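
In case it helps, a rough sketch for spotting other oversized objects (untested; <pool> is a placeholder, and I am assuming the object size is the last field of the rados stat output):

rados --pool <pool> ls | while read obj; do
    rados --pool <pool> stat "$obj"         # prints "<pool>/<obj> mtime ..., size <bytes>"
done | awk '$NF+0 > 134217728'              # keep only objects larger than 128 MiB

Note that this stats every object one by one, so it can be slow on a large pool.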

Cheers, Massimo 

On Wed, Jan 15, 2020 at 7:06 PM Liam Monahan <liam@xxxxxxxxxxxxxx> wrote:
I just changed my max object size to 256 MB and scrubbed, and the errors went away.  I’m not sure what can be done to reduce the size of these objects, though, if it really is a problem.  Our cluster has dynamic bucket index resharding turned on, but that sharding process shouldn’t help if non-index objects are what is over the limit.

I don’t think a pg repair would do anything unless the config tunables are adjusted.
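
For reference, this is roughly what I ran (a sketch, assuming the centralized config store available since Mimic; 268435456 bytes is 256 MiB, and 9.20e is the affected PG from the output below):

ceph config set osd osd_max_object_size 268435456   # raise the per-object size limit to 256 MiB
ceph pg deep-scrub 9.20e                            # re-scrub the inconsistent PG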

On Jan 15, 2020, at 10:56 AM, Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx> wrote:

I never changed the default value for that attribute

What I am missing is why I have such big objects around 

I am also wondering what a pg repair would do in such a case

On Wed, Jan 15, 2020, 16:18 Liam Monahan <liam@xxxxxxxxxxxxxx> wrote:
Thanks for that link.

Do you have a default osd max object size of 128 MB?  I’m thinking about doubling that limit to 256 MB on our cluster.  Our largest object is only about 10% over that limit.
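
The numbers check out: the 146097278-byte object reported in the scrub output below is 146097278 / 134217728 ≈ 1.09 times the 128 MiB default, i.e. roughly 9% over. A couple of ways to confirm the value the OSDs are actually running (the second assumes you are on the host where osd.509, one of the acting OSDs below, lives):

ceph config get osd osd_max_object_size                # value from the central config store
ceph daemon osd.509 config get osd_max_object_size     # ask one OSD via its admin socket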

On Jan 15, 2020, at 3:51 AM, Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx> wrote:

I guess this is coming from the new scrub check on object size (size_too_large) that was introduced in Nautilus 14.2.5.

On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto <massimo.sgaravatto@xxxxxxxxx> wrote:
As I wrote in an earlier thread, I saw the same after an update from Luminous to Nautilus 14.2.6.

Cheers, Massimo

On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan <liam@xxxxxxxxxxxxxx> wrote:
Hi,

I am getting one inconsistent object on our cluster with an inconsistency error that I haven’t seen before.  This started happening during a rolling upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not sure that’s related.

I was hoping to know what the error means before trying a repair.

[root@objmon04 ~]# ceph health detail
HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1 pg inconsistent
OSDMAP_FLAGS noout flag(s) set
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 9.20e is active+clean+inconsistent, acting [509,674,659]

rados list-inconsistent-obj 9.20e --format=json-pretty
{
    "epoch": 759019,
    "inconsistents": [
        {
            "object": {
                "name": "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 692875
            },
            "errors": [
                "size_too_large"
            ],
            "union_shard_errors": [],
            "selected_object_info": {
                "oid": {
                    "oid": "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
                    "key": "",
                    "snapid": -2,
                    "hash": 3321413134,
                    "max": 0,
                    "pool": 9,
                    "namespace": ""
                },
                "version": "281183'692875",
                "prior_version": "281183'692874",
                "last_reqid": "client.34042469.0:206759091",
                "user_version": 692875,
                "size": 146097278,
                "mtime": "2017-07-03 12:43:35.569986",
                "local_mtime": "2017-07-03 12:43:35.571196",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest",
                    "omap_digest"
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0xf19c8035",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 509,
                    "primary": true,
                    "errors": [],
                    "size": 146097278
                },
                {
                    "osd": 659,
                    "primary": false,
                    "errors": [],
                    "size": 146097278
                },
                {
                    "osd": 674,
                    "primary": false,
                    "errors": [],
                    "size": 146097278
                }
            ]
        }
    ]
}

Thanks,
Liam

Senior Developer
Institute for Advanced Computer Studies
University of Maryland
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

