Re: mimic: 3/4 OSDs crashed on "bluefs enospc"

On Tue, Oct 2, 2018 at 10:23 AM Alex Litvak
<alexander.v.litvak@xxxxxxxxx> wrote:
>
> Igor,
>
> Thank you for your reply.  So what you are saying is that there are really no
> sensible space requirements for a collocated device? Even if I set up 30
> GB for the DB (which I really wouldn't like to do due to space waste
> considerations) there is a chance that if this space fills up I will be
> in the same trouble under some heavy load scenario?

We do have good sizing recommendations for a separate block.db
partition. Roughly, it shouldn't be less than 4% of the size of the data
device.

http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
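
As a rough worked example against the ~471 GB data devices that come up later
in this thread (the arithmetic is only illustrative):

  # 4% of a 471305551872-byte (~471 GB) data device:
  echo $((471305551872 * 4 / 100))    # => 18852222074 bytes, i.e. ~18.9 GB
  # so a block.db of roughly 19-20 GB per such OSD would meet the guideline
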

>
> On 10/2/2018 9:15 AM, Igor Fedotov wrote:
> > Even with a single device, bluestore has a sort of implicit "BlueFS
> > partition" where the DB is stored, and it dynamically adjusts (rebalances)
> > the space for that partition in the background. Unfortunately it might
> > perform that rebalancing too lazily, and hence under some heavy load it
> > might end up lacking space for that partition while the main device still
> > has plenty of free space.
> >
> > I'm planning to refactor this re-balancing procedure in the future to
> > eliminate the root cause.
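
For readers who want to watch this rebalancing on their own cluster, it is
driven by a few bluestore_bluefs_* options; a quick way to inspect them on a
running OSD (osd.1 is just a placeholder) is:

  # option names as of luminous/mimic - verify against your own build
  ceph daemon osd.1 config show | grep bluefs
  # look for bluestore_bluefs_min, bluestore_bluefs_min_ratio,
  # bluestore_bluefs_gift_ratio and bluestore_bluefs_balance_interval
  # (the interval referenced in the patch quoted further down this thread)
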
> >
> >
> > Thanks,
> >
> > Igor
> >
> >
> > On 10/2/2018 5:04 PM, Alex Litvak wrote:
> >> I am sorry for interrupting the thread, but my understanding has always
> >> been that bluestore on a single device should not care about the DB
> >> size, i.e. it would use the data part for all operations if the DB is
> >> full.  And if that is not true, what would be sensible defaults for an
> >> 800 GB SSD?  I used ceph-ansible to build my cluster with system
> >> defaults, and what I am reading in this thread doesn't give me a good
> >> feeling at all. Documentation on the topic is very sketchy and online
> >> posts sometimes contradict each other.
> >>
> >> Thank you in advance,
> >>
> >> On 10/2/2018 8:52 AM, Igor Fedotov wrote:
> >>> May I have a repair log for that "already expanded" OSD?
> >>>
> >>>
> >>> On 10/2/2018 4:32 PM, Sergey Malinin wrote:
> >>>> Repair goes through only when the LVM volume has been expanded;
> >>>> otherwise it fails with enospc, as does any other operation.
> >>>> However, expanding the volume immediately renders bluefs unmountable
> >>>> with an IO error.
> >>>> 2 of 3 OSDs got their bluefs log corrupted (the bluestore tool
> >>>> segfaults at the very end of bluefs-log-dump); I'm not sure whether
> >>>> the corruption occurred before or after the volume expansion.
> >>>>
> >>>>
> >>>>> On 2.10.2018, at 16:07, Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>>>>
> >>>>> You mentioned repair had worked before, is that correct? What's the
> >>>>> difference now, other than the applied patch? A different OSD?
> >>>>> Anything else?
> >>>>>
> >>>>>
> >>>>> On 10/2/2018 3:52 PM, Sergey Malinin wrote:
> >>>>>
> >>>>>> It didn't work, emailed logs to you.
> >>>>>>
> >>>>>>
> >>>>>>> On 2.10.2018, at 14:43, Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>>>>>>
> >>>>>>> The major change is in the get_bluefs_rebalance_txn function; it
> >>>>>>> lacked the bluefs_rebalance_txn assignment.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
> >>>>>>>> PR doesn't seem to have changed since yesterday. Am I missing
> >>>>>>>> something?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 2.10.2018, at 14:15, Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>>>>>>>>
> >>>>>>>>> Please update the patch from the PR - it didn't update the
> >>>>>>>>> bluefs extents list before.
> >>>>>>>>>
> >>>>>>>>> Also please set debug bluestore to 20 when re-running repair and
> >>>>>>>>> collect the log.
> >>>>>>>>>
> >>>>>>>>> If repair doesn't help - would you send repair and startup logs
> >>>>>>>>> directly to me as I have some issues accessing ceph-post-file
> >>>>>>>>> uploads.
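
One way to capture a repair log like that with the offline tool - flag names
as found in recent ceph-bluestore-tool builds, paths are placeholders:

  # run repair with verbose logging written to a file
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1 \
      --log-file /tmp/ceph-1-repair.log --log-level 20
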
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Igor
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 10/2/2018 11:39 AM, Sergey Malinin wrote:
> >>>>>>>>>> Yes, I did repair all OSDs and it finished with 'repair
> >>>>>>>>>> success'. I backed up OSDs so now I have more room to play.
> >>>>>>>>>> I posted log files using ceph-post-file with the following IDs:
> >>>>>>>>>> 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
> >>>>>>>>>> 20df7df5-f0c9-4186-aa21-4e5c0172cd93
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 2.10.2018, at 11:26, Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> You did run repair for these OSDs, didn't you? For all of
> >>>>>>>>>>> them?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Would you please provide logs for both types of failing OSDs
> >>>>>>>>>>> (failed on mount and failed with enospc)? Before collecting,
> >>>>>>>>>>> please remove the existing logs and set debug bluestore to 20.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
> >>>>>>>>>>>> I was able to apply the patches to mimic, but nothing changed.
> >>>>>>>>>>>> The one OSD whose space I had expanded fails with a bluefs
> >>>>>>>>>>>> mount IO error; the others keep failing with enospc.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 1.10.2018, at 19:26, Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So you should call repair, which rebalances (i.e. allocates
> >>>>>>>>>>>>> additional) BlueFS space, hence allowing the OSD to start.
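
For reference, this is the offline repair exposed by ceph-bluestore-tool; a
minimal invocation (the OSD id is a placeholder, and the OSD must be stopped):

  systemctl stop ceph-osd@1
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1
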
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Igor
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
> >>>>>>>>>>>>>> Not exactly. The rebalancing from this kv_sync_thread
> >>>>>>>>>>>>>> might still be deferred due to the nature of this thread
> >>>>>>>>>>>>>> (I'm not 100% sure though).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Here is my PR showing the idea (still untested and perhaps
> >>>>>>>>>>>>>> unfinished!!!)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://github.com/ceph/ceph/pull/24353
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Igor
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 10/1/2018 7:07 PM, Sergey Malinin wrote:
> >>>>>>>>>>>>>>> Can you please confirm whether I got this right:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --- BlueStore.cc.bak    2018-10-01 18:54:45.096836419 +0300
> >>>>>>>>>>>>>>> +++ BlueStore.cc    2018-10-01 19:01:35.937623861 +0300
> >>>>>>>>>>>>>>> @@ -9049,22 +9049,17 @@
> >>>>>>>>>>>>>>>          throttle_bytes.put(costs);
> >>>>>>>>>>>>>>>            PExtentVector bluefs_gift_extents;
> >>>>>>>>>>>>>>> -      if (bluefs &&
> >>>>>>>>>>>>>>> -      after_flush - bluefs_last_balance >
> >>>>>>>>>>>>>>> - cct->_conf->bluestore_bluefs_balance_interval) {
> >>>>>>>>>>>>>>> -    bluefs_last_balance = after_flush;
> >>>>>>>>>>>>>>> -    int r =
> >>>>>>>>>>>>>>> _balance_bluefs_freespace(&bluefs_gift_extents);
> >>>>>>>>>>>>>>> -    assert(r >= 0);
> >>>>>>>>>>>>>>> -    if (r > 0) {
> >>>>>>>>>>>>>>> -      for (auto& p : bluefs_gift_extents) {
> >>>>>>>>>>>>>>> -        bluefs_extents.insert(p.offset, p.length);
> >>>>>>>>>>>>>>> -      }
> >>>>>>>>>>>>>>> -      bufferlist bl;
> >>>>>>>>>>>>>>> -      encode(bluefs_extents, bl);
> >>>>>>>>>>>>>>> -      dout(10) << __func__ << " bluefs_extents now 0x"
> >>>>>>>>>>>>>>> << std::hex
> >>>>>>>>>>>>>>> -           << bluefs_extents << std::dec << dendl;
> >>>>>>>>>>>>>>> -      synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> >>>>>>>>>>>>>>> +      int r =
> >>>>>>>>>>>>>>> _balance_bluefs_freespace(&bluefs_gift_extents);
> >>>>>>>>>>>>>>> +      ceph_assert(r >= 0);
> >>>>>>>>>>>>>>> +      if (r > 0) {
> >>>>>>>>>>>>>>> +    for (auto& p : bluefs_gift_extents) {
> >>>>>>>>>>>>>>> +      bluefs_extents.insert(p.offset, p.length);
> >>>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>> +    bufferlist bl;
> >>>>>>>>>>>>>>> +    encode(bluefs_extents, bl);
> >>>>>>>>>>>>>>> +    dout(10) << __func__ << " bluefs_extents now 0x" <<
> >>>>>>>>>>>>>>> std::hex
> >>>>>>>>>>>>>>> +         << bluefs_extents << std::dec << dendl;
> >>>>>>>>>>>>>>> +    synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> >>>>>>>>>>>>>>>          }
> >>>>>>>>>>>>>>>            // cleanup sync deferred keys
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 1.10.2018, at 18:39, Igor Fedotov <ifedotov@xxxxxxx>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So you have just a single main device per OSD....
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Then bluestore-tool wouldn't help; it's unable to expand
> >>>>>>>>>>>>>>>> the BlueFS partition on the main device, only standalone
> >>>>>>>>>>>>>>>> devices are supported.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Given that you're able to rebuild the code, I can suggest
> >>>>>>>>>>>>>>>> making a patch that triggers BlueFS rebalance (see the code
> >>>>>>>>>>>>>>>> snippet below) during repair.
> >>>>>>>>>>>>>>>>       PExtentVector bluefs_gift_extents;
> >>>>>>>>>>>>>>>>       int r =
> >>>>>>>>>>>>>>>> _balance_bluefs_freespace(&bluefs_gift_extents);
> >>>>>>>>>>>>>>>>       ceph_assert(r >= 0);
> >>>>>>>>>>>>>>>>       if (r > 0) {
> >>>>>>>>>>>>>>>>         for (auto& p : bluefs_gift_extents) {
> >>>>>>>>>>>>>>>> bluefs_extents.insert(p.offset, p.length);
> >>>>>>>>>>>>>>>>         }
> >>>>>>>>>>>>>>>>         bufferlist bl;
> >>>>>>>>>>>>>>>>         encode(bluefs_extents, bl);
> >>>>>>>>>>>>>>>>         dout(10) << __func__ << " bluefs_extents now 0x"
> >>>>>>>>>>>>>>>> << std::hex
> >>>>>>>>>>>>>>>>              << bluefs_extents << std::dec << dendl;
> >>>>>>>>>>>>>>>>         synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> >>>>>>>>>>>>>>>>       }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If it can wait, I can probably make a corresponding PR
> >>>>>>>>>>>>>>>> tomorrow.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> Igor
> >>>>>>>>>>>>>>>> On 10/1/2018 6:16 PM, Sergey Malinin wrote:
> >>>>>>>>>>>>>>>>> I have rebuilt the tool, but none of my OSDs, whether
> >>>>>>>>>>>>>>>>> dead or alive, have any symlinks other than 'block'
> >>>>>>>>>>>>>>>>> pointing to LVM.
> >>>>>>>>>>>>>>>>> I adjusted the main device size, but it looks like it
> >>>>>>>>>>>>>>>>> needs even more space for db compaction. After executing
> >>>>>>>>>>>>>>>>> bluefs-bdev-expand the OSD fails to start; however, the
> >>>>>>>>>>>>>>>>> 'fsck' and 'repair' commands finished successfully.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:39.763 7fc9226c6240  1
> >>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc opening
> >>>>>>>>>>>>>>>>> allocation metadata
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.907 7fc9226c6240  1
> >>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded
> >>>>>>>>>>>>>>>>> 285 GiB in 2249899 extents
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.951 7fc9226c6240 -1
> >>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1)
> >>>>>>>>>>>>>>>>> _reconcile_bluefs_freespace bluefs extra
> >>>>>>>>>>>>>>>>> 0x[6d6f000000~50c800000]
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc
> >>>>>>>>>>>>>>>>> 0x0x55d053fb9180 shutdown
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb:
> >>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252]
> >>>>>>>>>>>>>>>>> Shutdown: canceling all background work
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb:
> >>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397]
> >>>>>>>>>>>>>>>>> Shutdown complete
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc
> >>>>>>>>>>>>>>>>> 0x0x55d053883800 shutdown
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.975 7fc9226c6240  1
> >>>>>>>>>>>>>>>>> bdev(0x55d053c32e00 /var/lib/ceph/osd/ceph-1/block) close
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:41.267 7fc9226c6240  1
> >>>>>>>>>>>>>>>>> bdev(0x55d053c32a80 /var/lib/ceph/osd/ceph-1/block) close
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0
> >>>>>>>>>>>>>>>>> OSD:init: unable to mount object store
> >>>>>>>>>>>>>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd
> >>>>>>>>>>>>>>>>> init failed: (5) Input/output error
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 1.10.2018, at 18:09, Igor Fedotov
> >>>>>>>>>>>>>>>>>> <ifedotov@xxxxxxx> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Well, actually you can avoid bluestore-tool rebuild.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> You'll need to edit the first chunk of block.db, where
> >>>>>>>>>>>>>>>>>> the labels are stored. (Please make a backup first!!!)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The size label is stored at offset 0x52 and is 8 bytes
> >>>>>>>>>>>>>>>>>> long - little-endian 64-bit integer encoding. (Please
> >>>>>>>>>>>>>>>>>> verify that the old value at this offset exactly
> >>>>>>>>>>>>>>>>>> corresponds to your original volume size and/or the
> >>>>>>>>>>>>>>>>>> 'size' label reported by ceph-bluestore-tool.)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So you have to put the new DB volume size there. Or you
> >>>>>>>>>>>>>>>>>> can send the first 4K chunk (e.g. extracted with dd)
> >>>>>>>>>>>>>>>>>> along with the new DB volume size (in bytes) to me and
> >>>>>>>>>>>>>>>>>> I'll do that for you.
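
A minimal command-line sketch of that edit, under the assumption of a
hypothetical device path and a 30 GiB target size; only the 0x52 offset and
the 8-byte little-endian encoding come from the description above, everything
else is an illustrative placeholder:

  DEV=/dev/ceph-vg/osd1-db        # placeholder for the DB volume
  NEW_SIZE=32212254720            # new DB volume size in bytes (30 GiB, example)

  # back up the label area before touching anything
  dd if="$DEV" of=/root/osd1-db-label.bak bs=4096 count=1

  # show the current 8-byte little-endian size at offset 0x52 (= 82)
  dd if="$DEV" bs=1 skip=82 count=8 2>/dev/null | od -An -tu8

  # write the new size, little-endian, without truncating the device
  printf '%016x' "$NEW_SIZE" | fold -w2 | tac | tr -d '\n' | xxd -r -p |
      dd of="$DEV" bs=1 seek=82 conv=notrunc
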
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Igor
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 10/1/2018 5:32 PM, Igor Fedotov wrote:
> >>>>>>>>>>>>>>>>>>> On 10/1/2018 5:03 PM, Sergey Malinin wrote:
> >>>>>>>>>>>>>>>>>>>> Before I received your response, I had already added
> >>>>>>>>>>>>>>>>>>>> 20GB to the OSD (by expanding the LV followed by
> >>>>>>>>>>>>>>>>>>>> bluefs-bdev-expand) and ran "ceph-kvstore-tool
> >>>>>>>>>>>>>>>>>>>> bluestore-kv <path> compact"; however, it still needs
> >>>>>>>>>>>>>>>>>>>> more space.
> >>>>>>>>>>>>>>>>>>>> Is that because I didn't update the DB size with
> >>>>>>>>>>>>>>>>>>>> set-label-key?
> >>>>>>>>>>>>>>>>>>> In mimic you need to run both the "bluefs-bdev-expand"
> >>>>>>>>>>>>>>>>>>> and "set-label-key" commands to commit the bluefs volume
> >>>>>>>>>>>>>>>>>>> expansion.
> >>>>>>>>>>>>>>>>>>> Unfortunately the latter command doesn't handle the
> >>>>>>>>>>>>>>>>>>> "size" label properly. That's why you might need to
> >>>>>>>>>>>>>>>>>>> backport and rebuild with the mentioned commits.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> What exactly is the label-key that needs to be
> >>>>>>>>>>>>>>>>>>>> updated, as I couldn't find which one is related to DB:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> # ceph-bluestore-tool show-label --path
> >>>>>>>>>>>>>>>>>>>> /var/lib/ceph/osd/ceph-1
> >>>>>>>>>>>>>>>>>>>> inferring bluefs devices from bluestore path
> >>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>> "/var/lib/ceph/osd/ceph-1/block": {
> >>>>>>>>>>>>>>>>>>>>            "osd_uuid":
> >>>>>>>>>>>>>>>>>>>> "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
> >>>>>>>>>>>>>>>>>>>>            "size": 471305551872,
> >>>>>>>>>>>>>>>>>>>>            "btime": "2018-07-31 03:06:43.751243",
> >>>>>>>>>>>>>>>>>>>>            "description": "main",
> >>>>>>>>>>>>>>>>>>>>            "bluefs": "1",
> >>>>>>>>>>>>>>>>>>>>            "ceph_fsid":
> >>>>>>>>>>>>>>>>>>>> "7d320499-5b3f-453e-831f-60d4db9a4533",
> >>>>>>>>>>>>>>>>>>>>            "kv_backend": "rocksdb",
> >>>>>>>>>>>>>>>>>>>>            "magic": "ceph osd volume v026",
> >>>>>>>>>>>>>>>>>>>>            "mkfs_done": "yes",
> >>>>>>>>>>>>>>>>>>>>            "osd_key": "XXX",
> >>>>>>>>>>>>>>>>>>>>            "ready": "ready",
> >>>>>>>>>>>>>>>>>>>>            "whoami": "1"
> >>>>>>>>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>> The 'size' label, but your output is for the block
> >>>>>>>>>>>>>>>>>>> (aka slow) device.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It should return labels for the db/wal devices as well
> >>>>>>>>>>>>>>>>>>> (the block.db and block.wal symlinks respectively). It
> >>>>>>>>>>>>>>>>>>> works for me in master; I can't verify with mimic at
> >>>>>>>>>>>>>>>>>>> the moment though.
> >>>>>>>>>>>>>>>>>>> Here is output for master:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> # bin/ceph-bluestore-tool show-label --path dev/osd0
> >>>>>>>>>>>>>>>>>>> inferring bluefs devices from bluestore path
> >>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>       "dev/osd0/block": {
> >>>>>>>>>>>>>>>>>>>           "osd_uuid":
> >>>>>>>>>>>>>>>>>>> "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
> >>>>>>>>>>>>>>>>>>>           "size": 21474836480,
> >>>>>>>>>>>>>>>>>>>           "btime": "2018-09-10 15:55:09.044039",
> >>>>>>>>>>>>>>>>>>>           "description": "main",
> >>>>>>>>>>>>>>>>>>>           "bluefs": "1",
> >>>>>>>>>>>>>>>>>>>           "ceph_fsid":
> >>>>>>>>>>>>>>>>>>> "56eddc15-11b9-4e0b-9192-e391fbae551c",
> >>>>>>>>>>>>>>>>>>>           "kv_backend": "rocksdb",
> >>>>>>>>>>>>>>>>>>>           "magic": "ceph osd volume v026",
> >>>>>>>>>>>>>>>>>>>           "mkfs_done": "yes",
> >>>>>>>>>>>>>>>>>>>           "osd_key":
> >>>>>>>>>>>>>>>>>>> "AQCsaZZbYTxXJBAAe3jJI4p6WbMjvA8CBBUJbA==",
> >>>>>>>>>>>>>>>>>>>           "ready": "ready",
> >>>>>>>>>>>>>>>>>>>           "whoami": "0"
> >>>>>>>>>>>>>>>>>>>       },
> >>>>>>>>>>>>>>>>>>>       "dev/osd0/block.wal": {
> >>>>>>>>>>>>>>>>>>>           "osd_uuid":
> >>>>>>>>>>>>>>>>>>> "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
> >>>>>>>>>>>>>>>>>>>           "size": 1048576000,
> >>>>>>>>>>>>>>>>>>>           "btime": "2018-09-10 15:55:09.044985",
> >>>>>>>>>>>>>>>>>>>           "description": "bluefs wal"
> >>>>>>>>>>>>>>>>>>>       },
> >>>>>>>>>>>>>>>>>>>       "dev/osd0/block.db": {
> >>>>>>>>>>>>>>>>>>>           "osd_uuid":
> >>>>>>>>>>>>>>>>>>> "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
> >>>>>>>>>>>>>>>>>>>           "size": 1048576000,
> >>>>>>>>>>>>>>>>>>>           "btime": "2018-09-10 15:55:09.044469",
> >>>>>>>>>>>>>>>>>>>           "description": "bluefs db"
> >>>>>>>>>>>>>>>>>>>       }
> >>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> You can try --dev option instead of --path, e.g.
> >>>>>>>>>>>>>>>>>>> ceph-bluestore-tool show-label --dev <path-to-block.db>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On 1.10.2018, at 16:48, Igor Fedotov
> >>>>>>>>>>>>>>>>>>>>> <ifedotov@xxxxxxx> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> This looks like a sort of deadlock: BlueFS needs some
> >>>>>>>>>>>>>>>>>>>>> additional space to replay the log left after the
> >>>>>>>>>>>>>>>>>>>>> crash, which happens during BlueFS open.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> But such space (on the slow device, as the DB is full)
> >>>>>>>>>>>>>>>>>>>>> is gifted in the background during the bluefs rebalance
> >>>>>>>>>>>>>>>>>>>>> procedure, which only occurs after the open.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hence the OSDs are stuck permanently crashing.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> The only way to recover that I can suggest for now is
> >>>>>>>>>>>>>>>>>>>>> to expand the DB volumes. You can do that with the lvm
> >>>>>>>>>>>>>>>>>>>>> tools if you have any spare space.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Once resized, you'll need ceph-bluestore-tool to
> >>>>>>>>>>>>>>>>>>>>> indicate the volume expansion to BlueFS (the
> >>>>>>>>>>>>>>>>>>>>> bluefs-bdev-expand command) and finally to update the
> >>>>>>>>>>>>>>>>>>>>> DB volume size label with the set-label-key command.
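
Put together, the recovery sequence sketched here would look roughly like the
following; the LV name, the amount added and the OSD id are placeholders, and
on mimic the last step needs the backported fix discussed just below:

  systemctl stop ceph-osd@1

  # 1. grow the LV backing the DB volume
  lvextend -L +20G /dev/ceph-vg/osd1-db

  # 2. tell BlueFS about the larger device
  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1

  # 3. update the 'size' label on the DB volume to the new size in bytes
  ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-1/block.db \
      -k size -v <new-size-in-bytes>
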
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> The latter is a bit tricky for mimic - you might
> >>>>>>>>>>>>>>>>>>>>> need to backport
> >>>>>>>>>>>>>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> and rebuild ceph-bluestore-tool. Alternatively you
> >>>>>>>>>>>>>>>>>>>>> can backport
> >>>>>>>>>>>>>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> then bluefs expansion and label updates will occur
> >>>>>>>>>>>>>>>>>>>>> in a single step.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I'll do these backports upstream, but it will take
> >>>>>>>>>>>>>>>>>>>>> some time to pass all the procedures and get into an
> >>>>>>>>>>>>>>>>>>>>> official mimic release.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I'll file a ticket to fix the original issue as well.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Igor
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On 10/1/2018 3:28 PM, Sergey Malinin wrote:
> >>>>>>>>>>>>>>>>>>>>>> These are LVM bluestore NVMe SSDs created with
> >>>>>>>>>>>>>>>>>>>>>> "ceph-volume --lvm prepare --bluestore
> >>>>>>>>>>>>>>>>>>>>>> /dev/nvme0n1p3" i.e. without specifying wal/db
> >>>>>>>>>>>>>>>>>>>>>> devices.
> >>>>>>>>>>>>>>>>>>>>>> OSDs were created with
> >>>>>>>>>>>>>>>>>>>>>> bluestore_min_alloc_size_ssd=4096, another
> >>>>>>>>>>>>>>>>>>>>>> modified setting is bluestore_cache_kv_max=1073741824
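
For reference, a sketch of how those two overrides would typically be applied
in ceph.conf before OSD creation (bluestore_min_alloc_size_ssd is baked in at
mkfs time, so changing it later has no effect on existing OSDs):

  [osd]
  bluestore_min_alloc_size_ssd = 4096
  bluestore_cache_kv_max = 1073741824    # 1 GiB
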
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> DB/block usage collected by prometheus module for
> >>>>>>>>>>>>>>>>>>>>>> 3 failed and 1 survived OSDs:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"}
> >>>>>>>>>>>>>>>>>>>>>> 65493008384.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"}
> >>>>>>>>>>>>>>>>>>>>>> 49013587968.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"}
> >>>>>>>>>>>>>>>>>>>>>> 76834406400.0 --> this one has survived
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"}
> >>>>>>>>>>>>>>>>>>>>>> 63726157824.0
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"}
> >>>>>>>>>>>>>>>>>>>>>> 65217232896.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"}
> >>>>>>>>>>>>>>>>>>>>>> 48944381952.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"}
> >>>>>>>>>>>>>>>>>>>>>> 68093476864.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"}
> >>>>>>>>>>>>>>>>>>>>>> 63632834560.0
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.0"}
> >>>>>>>>>>>>>>>>>>>>>> 471305551872.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.1"}
> >>>>>>>>>>>>>>>>>>>>>> 471305551872.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.2"}
> >>>>>>>>>>>>>>>>>>>>>> 471305551872.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.3"}
> >>>>>>>>>>>>>>>>>>>>>> 471305551872.0
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.0"}
> >>>>>>>>>>>>>>>>>>>>>> 222328213504.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.1"}
> >>>>>>>>>>>>>>>>>>>>>> 214472544256.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.2"}
> >>>>>>>>>>>>>>>>>>>>>> 163603996672.0
> >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.3"}
> >>>>>>>>>>>>>>>>>>>>>> 212806815744.0
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The first crashed OSD was doing DB compaction; the
> >>>>>>>>>>>>>>>>>>>>>> others crashed shortly afterwards during backfilling.
> >>>>>>>>>>>>>>>>>>>>>> The workload was "ceph-data-scan scan_inodes" filling
> >>>>>>>>>>>>>>>>>>>>>> the metadata pool located on these OSDs at a rate
> >>>>>>>>>>>>>>>>>>>>>> close to 10k objects/second.
> >>>>>>>>>>>>>>>>>>>>>> Here is the log excerpt of the first crash
> >>>>>>>>>>>>>>>>>>>>>> occurrence:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.762 7fbf16dd6700  0
> >>>>>>>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1)
> >>>>>>>>>>>>>>>>>>>>>> _balance_bluefs_freespace no allocate on
> >>>>>>>>>>>>>>>>>>>>>> 0x80000000 min_alloc_size 0x1000
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb:
> >>>>>>>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166]
> >>>>>>>>>>>>>>>>>>>>>> [default] [JOB 24] Generated table #89741: 106356
> >>>>>>>>>>>>>>>>>>>>>> keys, 68110589 bytes
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb:
> >>>>>>>>>>>>>>>>>>>>>> EVENT_LOG_v1 {"time_micros": 1538353632892744,
> >>>>>>>>>>>>>>>>>>>>>> "cf_name": "default", "job": 24, "event":
> >>>>>>>>>>>>>>>>>>>>>> "table_file_creation", "file_number": 89741,
> >>>>>>>>>>>>>>>>>>>>>> "file_size": 68110589, "table_properties":
> >>>>>>>>>>>>>>>>>>>>>> {"data_size": 67112903, "index_size": 579319,
> >>>>>>>>>>>>>>>>>>>>>> "filter_size": 417316, "raw_key_size": 6733561,
> >>>>>>>>>>>>>>>>>>>>>> "raw_average_key_size": 63, "raw_value_size":
> >>>>>>>>>>>>>>>>>>>>>> 60994583, "raw_average_value_size": 573,
> >>>>>>>>>>>>>>>>>>>>>> "num_data_blocks": 16336, "num_entries": 106356,
> >>>>>>>>>>>>>>>>>>>>>> "filter_policy_name":
> >>>>>>>>>>>>>>>>>>>>>> "rocksdb.BuiltinBloomFilter", "kDeletedKeys":
> >>>>>>>>>>>>>>>>>>>>>> "14444", "kMergeOperands": "0"}}
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb:
> >>>>>>>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166]
> >>>>>>>>>>>>>>>>>>>>>> [default] [JOB 24] Generated table #89742: 23214
> >>>>>>>>>>>>>>>>>>>>>> keys, 16352315 bytes
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb:
> >>>>>>>>>>>>>>>>>>>>>> EVENT_LOG_v1 {"time_micros": 1538353632938670,
> >>>>>>>>>>>>>>>>>>>>>> "cf_name": "default", "job": 24, "event":
> >>>>>>>>>>>>>>>>>>>>>> "table_file_creation", "file_number": 89742,
> >>>>>>>>>>>>>>>>>>>>>> "file_size": 16352315, "table_properties":
> >>>>>>>>>>>>>>>>>>>>>> {"data_size": 16116986, "index_size": 139894,
> >>>>>>>>>>>>>>>>>>>>>> "filter_size": 94386, "raw_key_size": 1470883,
> >>>>>>>>>>>>>>>>>>>>>> "raw_average_key_size": 63, "raw_value_size":
> >>>>>>>>>>>>>>>>>>>>>> 14775006, "raw_average_value_size": 636,
> >>>>>>>>>>>>>>>>>>>>>> "num_data_blocks": 3928, "num_entries": 23214,
> >>>>>>>>>>>>>>>>>>>>>> "filter_policy_name":
> >>>>>>>>>>>>>>>>>>>>>> "rocksdb.BuiltinBloomFilter", "kDeletedKeys":
> >>>>>>>>>>>>>>>>>>>>>> "90", "kMergeOperands": "0"}}
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700  1 bluefs
> >>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x4100000 on bdev 1,
> >>>>>>>>>>>>>>>>>>>>>> free 0x1a00000; fallback to bdev 2
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs
> >>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x4100000 on bdev 2, dne
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs
> >>>>>>>>>>>>>>>>>>>>>> _flush_range allocated: 0x0 offset: 0x0 length:
> >>>>>>>>>>>>>>>>>>>>>> 0x40ea9f1
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.046 7fbf1e5e5700 -1
> >>>>>>>>>>>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In
> >>>>>>>>>>>>>>>>>>>>>> function 'int
> >>>>>>>>>>>>>>>>>>>>>> BlueFS::_flush_range(BlueFS::FileWriter*,
> >>>>>>>>>>>>>>>>>>>>>> uint64_t, uint64_t)' thread 7fbf1e5e5700 time
> >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.048298
> >>>>>>>>>>>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc:
> >>>>>>>>>>>>>>>>>>>>>> 1663: FAILED assert(0 == "bluefs enospc")
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>     ceph version 13.2.2
> >>>>>>>>>>>>>>>>>>>>>> (02899bfda814146b021136e9d8e80eba494e1126) mimic
> >>>>>>>>>>>>>>>>>>>>>> (stable)
> >>>>>>>>>>>>>>>>>>>>>>     1: (ceph::__ceph_assert_fail(char const*, char
> >>>>>>>>>>>>>>>>>>>>>> const*, int, char const*)+0x102) [0x7fbf2d4fe5c2]
> >>>>>>>>>>>>>>>>>>>>>>     2: (()+0x26c787) [0x7fbf2d4fe787]
> >>>>>>>>>>>>>>>>>>>>>>     3: (BlueFS::_flush_range(BlueFS::FileWriter*,
> >>>>>>>>>>>>>>>>>>>>>> unsigned long, unsigned long)+0x1ab4)
> >>>>>>>>>>>>>>>>>>>>>> [0x5619325114b4]
> >>>>>>>>>>>>>>>>>>>>>>     4: (BlueRocksWritableFile::Flush()+0x3d)
> >>>>>>>>>>>>>>>>>>>>>> [0x561932527c1d]
> >>>>>>>>>>>>>>>>>>>>>>     5:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Flush()+0x1b9)
> >>>>>>>>>>>>>>>>>>>>>> [0x56193271c399]
> >>>>>>>>>>>>>>>>>>>>>>     6:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Sync(bool)+0x3b)
> >>>>>>>>>>>>>>>>>>>>>> [0x56193271d42b]
> >>>>>>>>>>>>>>>>>>>>>>     7:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status
> >>>>>>>>>>>>>>>>>>>>>> const&,
> >>>>>>>>>>>>>>>>>>>>>> rocksdb::CompactionJob::SubcompactionState*,
> >>>>>>>>>>>>>>>>>>>>>> rocksdb::RangeDelAggregator*,
> >>>>>>>>>>>>>>>>>>>>>> CompactionIterationStats*, rocksdb::Slice
> >>>>>>>>>>>>>>>>>>>>>> const*)+0x3db) [0x56193276098b]
> >>>>>>>>>>>>>>>>>>>>>>     8:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d9)
> >>>>>>>>>>>>>>>>>>>>>> [0x561932763da9]
> >>>>>>>>>>>>>>>>>>>>>>     9: (rocksdb::CompactionJob::Run()+0x314)
> >>>>>>>>>>>>>>>>>>>>>> [0x561932765504]
> >>>>>>>>>>>>>>>>>>>>>>     10:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::BackgroundCompaction(bool*,
> >>>>>>>>>>>>>>>>>>>>>> rocksdb::JobContext*, rocksdb::LogBuffer*,
> >>>>>>>>>>>>>>>>>>>>>> rocksdb::DBImpl::PrepickedCompaction*)+0xc54)
> >>>>>>>>>>>>>>>>>>>>>> [0x5619325b5c44]
> >>>>>>>>>>>>>>>>>>>>>>     11:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*,
> >>>>>>>>>>>>>>>>>>>>>> rocksdb::Env::Priority)+0x397) [0x5619325b8557]
> >>>>>>>>>>>>>>>>>>>>>>     12:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::BGWorkCompaction(void*)+0x97)
> >>>>>>>>>>>>>>>>>>>>>> [0x5619325b8cd7]
> >>>>>>>>>>>>>>>>>>>>>>     13:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned
> >>>>>>>>>>>>>>>>>>>>>> long)+0x266) [0x5619327a5e36]
> >>>>>>>>>>>>>>>>>>>>>>     14:
> >>>>>>>>>>>>>>>>>>>>>> (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x47)
> >>>>>>>>>>>>>>>>>>>>>> [0x5619327a5fb7]
> >>>>>>>>>>>>>>>>>>>>>>     15: (()+0xbe733) [0x7fbf2b500733]
> >>>>>>>>>>>>>>>>>>>>>>     16: (()+0x76db) [0x7fbf2bbf86db]
> >>>>>>>>>>>>>>>>>>>>>>     17: (clone()+0x3f) [0x7fbf2abbc88f]
> >>>>>>>>>>>>>>>>>>>>>>     NOTE: a copy of the executable, or `objdump
> >>>>>>>>>>>>>>>>>>>>>> -rdS <executable>` is needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On 1.10.2018, at 15:01, Igor Fedotov
> >>>>>>>>>>>>>>>>>>>>>>> <ifedotov@xxxxxxx> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Sergey,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> could you please provide more details on your OSDs ?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> What are sizes for DB/block devices?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Do you have any modifications in BlueStore config
> >>>>>>>>>>>>>>>>>>>>>>> settings?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Can you share stats you're referring to?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Igor
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On 10/1/2018 12:29 PM, Sergey Malinin wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>>>>>>> 3 of 4 NVMe OSDs crashed at the same time on
> >>>>>>>>>>>>>>>>>>>>>>>> assert(0 == "bluefs enospc") and no longer start.
> >>>>>>>>>>>>>>>>>>>>>>>> Stats collected just before the crash show that
> >>>>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes is at 100%. Although the
> >>>>>>>>>>>>>>>>>>>>>>>> OSDs have over 50% free space, it is not being
> >>>>>>>>>>>>>>>>>>>>>>>> reallocated for DB usage.
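
For anyone who wants to check the same counters on their own OSDs, the
underlying perf counters are exposed through the admin socket on the OSD host
(osd.0 is a placeholder):

  ceph daemon osd.0 perf dump | grep -E '"db_(total|used)_bytes"'
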
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240  1 bluefs
> >>>>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x100000 on bdev 1,
> >>>>>>>>>>>>>>>>>>>>>>>> free 0x0; fallback to bdev 2
> >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs
> >>>>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x100000 on bdev 2,
> >>>>>>>>>>>>>>>>>>>>>>>> dne
> >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs
> >>>>>>>>>>>>>>>>>>>>>>>> _flush_range allocated: 0x0 offset: 0x0 length:
> >>>>>>>>>>>>>>>>>>>>>>>> 0xa8700
> >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.748 7f1d6a04d240 -1
> >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc:
> >>>>>>>>>>>>>>>>>>>>>>>> In function 'int
> >>>>>>>>>>>>>>>>>>>>>>>> BlueFS::_flush_range(BlueFS::FileWriter*,
> >>>>>>>>>>>>>>>>>>>>>>>> uint64_t, uint64_t)' thread 7f1d6a04d240 time
> >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.746800
> >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc:
> >>>>>>>>>>>>>>>>>>>>>>>> 1663: FAILED assert(0 == "bluefs enospc")
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>     ceph version 13.2.2
> >>>>>>>>>>>>>>>>>>>>>>>> (02899bfda814146b021136e9d8e80eba494e1126) mimic
> >>>>>>>>>>>>>>>>>>>>>>>> (stable)
> >>>>>>>>>>>>>>>>>>>>>>>>     1: (ceph::__ceph_assert_fail(char const*,
> >>>>>>>>>>>>>>>>>>>>>>>> char const*, int, char const*)+0x102)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x7f1d6146f5c2]
> >>>>>>>>>>>>>>>>>>>>>>>>     2: (()+0x26c787) [0x7f1d6146f787]
> >>>>>>>>>>>>>>>>>>>>>>>>     3:
> >>>>>>>>>>>>>>>>>>>>>>>> (BlueFS::_flush_range(BlueFS::FileWriter*,
> >>>>>>>>>>>>>>>>>>>>>>>> unsigned long, unsigned long)+0x1ab4)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b22684b4]
> >>>>>>>>>>>>>>>>>>>>>>>>     4: (BlueRocksWritableFile::Flush()+0x3d)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b227ec1d]
> >>>>>>>>>>>>>>>>>>>>>>>>     5:
> >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Flush()+0x1b9)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b2473399]
> >>>>>>>>>>>>>>>>>>>>>>>>     6:
> >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Sync(bool)+0x3b)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b247442b]
> >>>>>>>>>>>>>>>>>>>>>>>>     7:
> >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::BuildTable(std::__cxx11::basic_string<char,
> >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >
> >>>>>>>>>>>>>>>>>>>>>>>> const&, rocksdb::Env*,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::ImmutableCFOptions const&,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::MutableCFOptions const&,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::EnvOptions const&, rock
> >>>>>>>>>>>>>>>>>>>>>>>> sdb::TableCache*, rocksdb::InternalIterator*,
> >>>>>>>>>>>>>>>>>>>>>>>> std::unique_ptr<rocksdb::InternalIterator,
> >>>>>>>>>>>>>>>>>>>>>>>> std::default_delete<rocksdb::InternalIterator>
> >>>>>>>>>>>>>>>>>>>>>>>> >, rocksdb::FileMetaData*,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::InternalKeyComparator const&,
> >>>>>>>>>>>>>>>>>>>>>>>> std::vector<std::unique_ptr<
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::IntTblPropCollectorFactory,
> >>>>>>>>>>>>>>>>>>>>>>>> std::default_delete<rocksdb::IntTblPropCollectorFactory>
> >>>>>>>>>>>>>>>>>>>>>>>> >,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory,
> >>>>>>>>>>>>>>>>>>>>>>>> std::default_delete<rocksdb::IntTblPropCollectorFactory>
> >>>>>>>>>>>>>>>>>>>>>>>> > > > co
> >>>>>>>>>>>>>>>>>>>>>>>> nst*, unsigned int,
> >>>>>>>>>>>>>>>>>>>>>>>> std::__cxx11::basic_string<char,
> >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >
> >>>>>>>>>>>>>>>>>>>>>>>> const&, std::vector<unsigned long,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<unsigned long> >, unsigned long,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::SnapshotChecker*, rocksdb::Compression
> >>>>>>>>>>>>>>>>>>>>>>>> Type, rocksdb::CompressionOptions const&, bool,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::InternalStats*,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::TableFileCreationReason,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::EventLogger*, int,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::Env::IOPriority,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::TableProperties*, int, unsigned long,
> >>>>>>>>>>>>>>>>>>>>>>>> unsigned long, rocksdb
> >>>>>>>>>>>>>>>>>>>>>>>> ::Env::WriteLifeTimeHint)+0x1e24) [0x5586b249ef94]
> >>>>>>>>>>>>>>>>>>>>>>>>     8:
> >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::WriteLevel0TableForRecovery(int,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::ColumnFamilyData*, rocksdb::MemTable*,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::VersionEdit*)+0xcb7) [0x5586b2321457]
> >>>>>>>>>>>>>>>>>>>>>>>>     9:
> >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned
> >>>>>>>>>>>>>>>>>>>>>>>> long, std::allocator<unsigned long> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>> unsigned long*, bool)+0x19de) [0x5586b232373e]
> >>>>>>>>>>>>>>>>>>>>>>>>     10:
> >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescriptor>
> >>>>>>>>>>>>>>>>>>>>>>>> > const&, bool, bool, bool)+0x5d4) [0x5586b23242f4]
> >>>>>>>>>>>>>>>>>>>>>>>>     11:
> >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::Open(rocksdb::DBOptions
> >>>>>>>>>>>>>>>>>>>>>>>> const&, std::__cxx11::basic_string<char,
> >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >
> >>>>>>>>>>>>>>>>>>>>>>>> const&,
> >>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyDescriptor,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescri
> >>>>>>>>>>>>>>>>>>>>>>>> ptor> > const&,
> >>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyHandle*,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyHandle*> >*,
> >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::DB**, bool)+0x68b) [0x5586b232559b]
> >>>>>>>>>>>>>>>>>>>>>>>>     12: (rocksdb::DB::Open(rocksdb::DBOptions
> >>>>>>>>>>>>>>>>>>>>>>>> const&, std::__cxx11::basic_string<char,
> >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >
> >>>>>>>>>>>>>>>>>>>>>>>> const&,
> >>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyDescriptor,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescriptor
> >>>>>>>>>>>>>>>>>>>>>>>>>> const&,
> >>>>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyHandle*,
> >>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyHandle*>
> >>>>>>>>>>>>>>>>>>>>>>>>>> >*, rocksdb::DB**)+0x22) [0x5586b2326e72]
> >>>>>>>>>>>>>>>>>>>>>>>>     13: (RocksDBStore::do_open(std::ostream&,
> >>>>>>>>>>>>>>>>>>>>>>>> bool, std::vector<KeyValueDB::ColumnFamily,
> >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<KeyValueDB::ColumnFamily> >
> >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x170c) [0x5586b220219c]
> >>>>>>>>>>>>>>>>>>>>>>>>     14: (BlueStore::_open_db(bool, bool)+0xd8e)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b218ee1e]
> >>>>>>>>>>>>>>>>>>>>>>>>     15: (BlueStore::_mount(bool, bool)+0x4b7)
> >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b21bf807]
> >>>>>>>>>>>>>>>>>>>>>>>>     16: (OSD::init()+0x295) [0x5586b1d673c5]
> >>>>>>>>>>>>>>>>>>>>>>>>     17: (main()+0x268d) [0x5586b1c554ed]
> >>>>>>>>>>>>>>>>>>>>>>>>     18: (__libc_start_main()+0xe7) [0x7f1d5ea2db97]
> >>>>>>>>>>>>>>>>>>>>>>>>     19: (_start()+0x2a) [0x5586b1d1d7fa]
> >>>>>>>>>>>>>>>>>>>>>>>>     NOTE: a copy of the executable, or `objdump
> >>>>>>>>>>>>>>>>>>>>>>>> -rdS <executable>` is needed to interpret this.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>
> >>
> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


