On Tue, Oct 2, 2018 at 10:23 AM Alex Litvak <alexander.v.litvak@xxxxxxxxx> wrote: > > Igor, > > Thank you for your reply. So what you are saying is that there are really no > sensible space requirements for a collocated device? Even if I set up 30 > GB for the DB (which I really wouldn't like to do due to space waste > considerations) there is a chance that if this space fills up I will be > in the same trouble under some heavy load scenario? We do have good sizing recommendations for a separate block.db partition. Roughly, it shouldn't be less than 4% of the size of the data device. http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing > > On 10/2/2018 9:15 AM, Igor Fedotov wrote: > > Even with a single device, bluestore has a sort of implicit "BlueFS > > partition" where the DB is stored. And it dynamically adjusts (rebalances) > > the space for that partition in the background. Unfortunately it might > > perform that too lazily, and hence under some heavy load it might end up > > lacking space for that partition while the main device still has > > plenty of free space. > > > > I'm planning to refactor this rebalancing procedure in the future to > > eliminate the root cause. > > > > > > Thanks, > > > > Igor > > > > > > On 10/2/2018 5:04 PM, Alex Litvak wrote: > >> I am sorry for interrupting the thread, but my understanding always > >> was that bluestore on a single device should not care about the DB > >> size, i.e. it would use the data part for all operations if the DB is > >> full. And if that is not true, what would be sensible defaults on an 800 > >> GB SSD? I used ceph-ansible to build my cluster with system defaults, > >> and what I am reading in this thread doesn't give me a good feeling at > >> all. Documentation on the topic is very sketchy and online posts > >> contradict each other sometimes. > >> > >> Thank you in advance, > >> > >> On 10/2/2018 8:52 AM, Igor Fedotov wrote: > >>> May I have a repair log for that "already expanded" OSD? > >>> > >>> > >>> On 10/2/2018 4:32 PM, Sergey Malinin wrote: > >>>> Repair goes through only when the LVM volume has been expanded; > >>>> otherwise it fails with enospc, as does any other operation. > >>>> However, expanding the volume immediately renders bluefs unmountable > >>>> with an IO error. > >>>> 2 of 3 OSDs got their bluefs log corrupted (the bluestore tool segfaults at > >>>> the very end of bluefs-log-dump); I'm not sure whether the corruption > >>>> occurred before or after the volume expansion. > >>>> > >>>> > >>>>> On 2.10.2018, at 16:07, Igor Fedotov <ifedotov@xxxxxxx> wrote: > >>>>> > >>>>> You mentioned repair had worked before, is that correct? What's the > >>>>> difference now, other than the applied patch? Different OSD? Anything else? > >>>>> > >>>>> > >>>>> On 10/2/2018 3:52 PM, Sergey Malinin wrote: > >>>>> > >>>>>> It didn't work, emailed logs to you. > >>>>>> > >>>>>> > >>>>>>> On 2.10.2018, at 14:43, Igor Fedotov <ifedotov@xxxxxxx> wrote: > >>>>>>> > >>>>>>> The major change is in the get_bluefs_rebalance_txn function; it > >>>>>>> lacked the bluefs_rebalance_txn assignment. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 10/2/2018 2:40 PM, Sergey Malinin wrote: > >>>>>>>> The PR doesn't seem to have changed since yesterday. Am I missing > >>>>>>>> something? > >>>>>>>> > >>>>>>>> > >>>>>>>>> On 2.10.2018, at 14:15, Igor Fedotov <ifedotov@xxxxxxx> wrote: > >>>>>>>>> > >>>>>>>>> Please update the patch from the PR - it didn't update the bluefs > >>>>>>>>> extents list before. 
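For the debug run requested below, the invocation I have in mind is roughly the following - a sketch only, with a placeholder OSD id and log path, and exact option handling may differ slightly between releases:

    # stop the OSD first, then run repair with verbose bluestore/bluefs logging
    CEPH_ARGS="--debug_bluestore 20 --debug_bluefs 20" \
        ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1 \
        --log-file /var/log/ceph/osd.1-repair.log --log-level 20

The resulting log file is what should be collected and sent over.
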
> >>>>>>>>> > >>>>>>>>> Also please set debug bluestore 20 when re-running repair and > >>>>>>>>> collect the log. > >>>>>>>>> > >>>>>>>>> If repair doesn't help - would you send repair and startup logs > >>>>>>>>> directly to me as I have some issues accessing ceph-post-file > >>>>>>>>> uploads. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> > >>>>>>>>> Igor > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 10/2/2018 11:39 AM, Sergey Malinin wrote: > >>>>>>>>>> Yes, I did repair all OSDs and it finished with 'repair > >>>>>>>>>> success'. I backed up OSDs so now I have more room to play. > >>>>>>>>>> I posted log files using ceph-post-file with the following IDs: > >>>>>>>>>> 4af9cc4d-9c73-41c9-9c38-eb6c551047a0 > >>>>>>>>>> 20df7df5-f0c9-4186-aa21-4e5c0172cd93 > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On 2.10.2018, at 11:26, Igor Fedotov <ifedotov@xxxxxxx> wrote: > >>>>>>>>>>> > >>>>>>>>>>> You did repair for any of this OSDs, didn't you? For all of > >>>>>>>>>>> them? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Would you please provide the log for both types (failed on > >>>>>>>>>>> mount and failed with enospc) of failing OSDs. Prior to > >>>>>>>>>>> collecting please remove existing ones prior and set debug > >>>>>>>>>>> bluestore to 20. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 10/2/2018 2:16 AM, Sergey Malinin wrote: > >>>>>>>>>>>> I was able to apply patches to mimic, but nothing changed. > >>>>>>>>>>>> One osd that I had space expanded on fails with bluefs mount > >>>>>>>>>>>> IO error, others keep failing with enospc. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> On 1.10.2018, at 19:26, Igor Fedotov <ifedotov@xxxxxxx> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> So you should call repair which rebalances (i.e. allocates > >>>>>>>>>>>>> additional space) BlueFS space. Hence allowing OSD to start. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Igor > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 10/1/2018 7:22 PM, Igor Fedotov wrote: > >>>>>>>>>>>>>> Not exactly. The rebalancing from this kv_sync_thread > >>>>>>>>>>>>>> still might be deferred due to the nature of this thread > >>>>>>>>>>>>>> (haven't 100% sure though). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Here is my PR showing the idea (still untested and perhaps > >>>>>>>>>>>>>> unfinished!!!) 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> https://github.com/ceph/ceph/pull/24353 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Igor > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 10/1/2018 7:07 PM, Sergey Malinin wrote: > >>>>>>>>>>>>>>> Can you please confirm whether I got this right: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --- BlueStore.cc.bak 2018-10-01 18:54:45.096836419 +0300 > >>>>>>>>>>>>>>> +++ BlueStore.cc 2018-10-01 19:01:35.937623861 +0300 > >>>>>>>>>>>>>>> @@ -9049,22 +9049,17 @@ > >>>>>>>>>>>>>>> throttle_bytes.put(costs); > >>>>>>>>>>>>>>> PExtentVector bluefs_gift_extents; > >>>>>>>>>>>>>>> - if (bluefs && > >>>>>>>>>>>>>>> - after_flush - bluefs_last_balance > > >>>>>>>>>>>>>>> - cct->_conf->bluestore_bluefs_balance_interval) { > >>>>>>>>>>>>>>> - bluefs_last_balance = after_flush; > >>>>>>>>>>>>>>> - int r = > >>>>>>>>>>>>>>> _balance_bluefs_freespace(&bluefs_gift_extents); > >>>>>>>>>>>>>>> - assert(r >= 0); > >>>>>>>>>>>>>>> - if (r > 0) { > >>>>>>>>>>>>>>> - for (auto& p : bluefs_gift_extents) { > >>>>>>>>>>>>>>> - bluefs_extents.insert(p.offset, p.length); > >>>>>>>>>>>>>>> - } > >>>>>>>>>>>>>>> - bufferlist bl; > >>>>>>>>>>>>>>> - encode(bluefs_extents, bl); > >>>>>>>>>>>>>>> - dout(10) << __func__ << " bluefs_extents now 0x" > >>>>>>>>>>>>>>> << std::hex > >>>>>>>>>>>>>>> - << bluefs_extents << std::dec << dendl; > >>>>>>>>>>>>>>> - synct->set(PREFIX_SUPER, "bluefs_extents", bl); > >>>>>>>>>>>>>>> + int r = > >>>>>>>>>>>>>>> _balance_bluefs_freespace(&bluefs_gift_extents); > >>>>>>>>>>>>>>> + ceph_assert(r >= 0); > >>>>>>>>>>>>>>> + if (r > 0) { > >>>>>>>>>>>>>>> + for (auto& p : bluefs_gift_extents) { > >>>>>>>>>>>>>>> + bluefs_extents.insert(p.offset, p.length); > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> + bufferlist bl; > >>>>>>>>>>>>>>> + encode(bluefs_extents, bl); > >>>>>>>>>>>>>>> + dout(10) << __func__ << " bluefs_extents now 0x" << > >>>>>>>>>>>>>>> std::hex > >>>>>>>>>>>>>>> + << bluefs_extents << std::dec << dendl; > >>>>>>>>>>>>>>> + synct->set(PREFIX_SUPER, "bluefs_extents", bl); > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> // cleanup sync deferred keys > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 1.10.2018, at 18:39, Igor Fedotov <ifedotov@xxxxxxx> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> So you have just a single main device per OSD.... > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Then bluestore-tool wouldn't help, it's unable to expand > >>>>>>>>>>>>>>>> BlueFS partition at main device, standalone devices are > >>>>>>>>>>>>>>>> supported only. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Given that you're able to rebuild the code I can suggest > >>>>>>>>>>>>>>>> to make a patch that triggers BlueFS rebalance (see code > >>>>>>>>>>>>>>>> snippet below) on repairing. > >>>>>>>>>>>>>>>> PExtentVector bluefs_gift_extents; > >>>>>>>>>>>>>>>> int r = > >>>>>>>>>>>>>>>> _balance_bluefs_freespace(&bluefs_gift_extents); > >>>>>>>>>>>>>>>> ceph_assert(r >= 0); > >>>>>>>>>>>>>>>> if (r > 0) { > >>>>>>>>>>>>>>>> for (auto& p : bluefs_gift_extents) { > >>>>>>>>>>>>>>>> bluefs_extents.insert(p.offset, p.length); > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> bufferlist bl; > >>>>>>>>>>>>>>>> encode(bluefs_extents, bl); > >>>>>>>>>>>>>>>> dout(10) << __func__ << " bluefs_extents now 0x" > >>>>>>>>>>>>>>>> << std::hex > >>>>>>>>>>>>>>>> << bluefs_extents << std::dec << dendl; > >>>>>>>>>>>>>>>> synct->set(PREFIX_SUPER, "bluefs_extents", bl); > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> If it waits I can probably make a corresponding PR > >>>>>>>>>>>>>>>> tomorrow. 
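As a side note, how much space BlueFS currently owns versus actually uses can be watched at runtime through the admin socket. A minimal sketch, assuming osd.1 and jq are available (these are the same counters the prometheus module exports as ceph_bluefs_db_total_bytes / ceph_bluefs_db_used_bytes):

    # per-OSD BlueFS space counters
    ceph daemon osd.1 perf dump bluefs | \
        jq '.bluefs | {db_total_bytes, db_used_bytes, slow_total_bytes, slow_used_bytes}'

db_used_bytes closing in on db_total_bytes is an early warning that rebalancing is not keeping up with DB growth.
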
> >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>> Igor > >>>>>>>>>>>>>>>> On 10/1/2018 6:16 PM, Sergey Malinin wrote: > >>>>>>>>>>>>>>>>> I have rebuilt the tool, but none of my OSDs no matter > >>>>>>>>>>>>>>>>> dead or alive have any symlinks other than 'block' > >>>>>>>>>>>>>>>>> pointing to LVM. > >>>>>>>>>>>>>>>>> I adjusted main device size but it looks like it needs > >>>>>>>>>>>>>>>>> even more space for db compaction. After executing > >>>>>>>>>>>>>>>>> bluefs-bdev-expand OSD fails to start, however 'fsck' > >>>>>>>>>>>>>>>>> and 'repair' commands finished successfully. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:39.755 7fc9226c6240 1 freelist init > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:39.763 7fc9226c6240 1 > >>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc opening > >>>>>>>>>>>>>>>>> allocation metadata > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.907 7fc9226c6240 1 > >>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded > >>>>>>>>>>>>>>>>> 285 GiB in 2249899 extents > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.951 7fc9226c6240 -1 > >>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1) > >>>>>>>>>>>>>>>>> _reconcile_bluefs_freespace bluefs extra > >>>>>>>>>>>>>>>>> 0x[6d6f000000~50c800000] > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.951 7fc9226c6240 1 stupidalloc > >>>>>>>>>>>>>>>>> 0x0x55d053fb9180 shutdown > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.963 7fc9226c6240 1 freelist shutdown > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.963 7fc9226c6240 4 rocksdb: > >>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] > >>>>>>>>>>>>>>>>> Shutdown: canceling all background work > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.967 7fc9226c6240 4 rocksdb: > >>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] > >>>>>>>>>>>>>>>>> Shutdown complete > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.971 7fc9226c6240 1 bluefs umount > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.975 7fc9226c6240 1 stupidalloc > >>>>>>>>>>>>>>>>> 0x0x55d053883800 shutdown > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:40.975 7fc9226c6240 1 > >>>>>>>>>>>>>>>>> bdev(0x55d053c32e00 /var/lib/ceph/osd/ceph-1/block) close > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:41.267 7fc9226c6240 1 > >>>>>>>>>>>>>>>>> bdev(0x55d053c32a80 /var/lib/ceph/osd/ceph-1/block) close > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 > >>>>>>>>>>>>>>>>> OSD:init: unable to mount object store > >>>>>>>>>>>>>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1 ** ERROR: osd > >>>>>>>>>>>>>>>>> init failed: (5) Input/output error > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On 1.10.2018, at 18:09, Igor Fedotov > >>>>>>>>>>>>>>>>>> <ifedotov@xxxxxxx> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Well, actually you can avoid bluestore-tool rebuild. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> You'll need to edit the first chunk of blocks.db where > >>>>>>>>>>>>>>>>>> labels are stored. (Please make a backup first!!!) > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Size label is stored at offset 0x52 and is 8 bytes > >>>>>>>>>>>>>>>>>> long - little-endian 64bit integer encoding. (Please > >>>>>>>>>>>>>>>>>> verify that old value at this offset exactly > >>>>>>>>>>>>>>>>>> corresponds to you original volume size and/or 'size' > >>>>>>>>>>>>>>>>>> label reported by ceph-bluestore-tool). > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> So you have to put new DB volume size there. Or you > >>>>>>>>>>>>>>>>>> can send the first 4K chunk (e.g. 
extracted with dd) > >>>>>>>>>>>>>>>>>> along with new DB volume size (in bytes) to me and > >>>>>>>>>>>>>>>>>> I'll do that for you. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Igor > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On 10/1/2018 5:32 PM, Igor Fedotov wrote: > >>>>>>>>>>>>>>>>>>> On 10/1/2018 5:03 PM, Sergey Malinin wrote: > >>>>>>>>>>>>>>>>>>>> Before I received your response, I had already added > >>>>>>>>>>>>>>>>>>>> 20GB to the OSD (by epanding LV followed by > >>>>>>>>>>>>>>>>>>>> bluefs-bdev-expand) and ran "ceph-kvstore-tool > >>>>>>>>>>>>>>>>>>>> bluestore-kv <path> compact", however it still needs > >>>>>>>>>>>>>>>>>>>> more space. > >>>>>>>>>>>>>>>>>>>> Is that because I didn't update DB size with > >>>>>>>>>>>>>>>>>>>> set-label-key? > >>>>>>>>>>>>>>>>>>> In mimic you need to run both "bluefs-bdev-expand" > >>>>>>>>>>>>>>>>>>> and "set-label-key" command to commit bluefs volume > >>>>>>>>>>>>>>>>>>> expansion. > >>>>>>>>>>>>>>>>>>> Unfortunately the last command doesn't handle "size" > >>>>>>>>>>>>>>>>>>> label properly. That's why you might need to backport > >>>>>>>>>>>>>>>>>>> and rebuild with the mentioned commits. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> What exactly is the label-key that needs to be > >>>>>>>>>>>>>>>>>>>> updated, as I couldn't find which one is related to DB: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> # ceph-bluestore-tool show-label --path > >>>>>>>>>>>>>>>>>>>> /var/lib/ceph/osd/ceph-1 > >>>>>>>>>>>>>>>>>>>> inferring bluefs devices from bluestore path > >>>>>>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>>>>>> "/var/lib/ceph/osd/ceph-1/block": { > >>>>>>>>>>>>>>>>>>>> "osd_uuid": > >>>>>>>>>>>>>>>>>>>> "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc", > >>>>>>>>>>>>>>>>>>>> "size": 471305551872, > >>>>>>>>>>>>>>>>>>>> "btime": "2018-07-31 03:06:43.751243", > >>>>>>>>>>>>>>>>>>>> "description": "main", > >>>>>>>>>>>>>>>>>>>> "bluefs": "1", > >>>>>>>>>>>>>>>>>>>> "ceph_fsid": > >>>>>>>>>>>>>>>>>>>> "7d320499-5b3f-453e-831f-60d4db9a4533", > >>>>>>>>>>>>>>>>>>>> "kv_backend": "rocksdb", > >>>>>>>>>>>>>>>>>>>> "magic": "ceph osd volume v026", > >>>>>>>>>>>>>>>>>>>> "mkfs_done": "yes", > >>>>>>>>>>>>>>>>>>>> "osd_key": "XXX", > >>>>>>>>>>>>>>>>>>>> "ready": "ready", > >>>>>>>>>>>>>>>>>>>> "whoami": "1" > >>>>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>>> 'size' label but your output is for block(aka slow) > >>>>>>>>>>>>>>>>>>> device. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> It should return labels for db/wal devices as well > >>>>>>>>>>>>>>>>>>> (block.db and block.wal symlinks respectively). It > >>>>>>>>>>>>>>>>>>> works for me in master, can't verify with mimic at > >>>>>>>>>>>>>>>>>>> the moment though.. 
> >>>>>>>>>>>>>>>>>>> Here is output for master: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> # bin/ceph-bluestore-tool show-label --path dev/osd0 > >>>>>>>>>>>>>>>>>>> inferring bluefs devices from bluestore path > >>>>>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>>>>> "dev/osd0/block": { > >>>>>>>>>>>>>>>>>>> "osd_uuid": > >>>>>>>>>>>>>>>>>>> "404dcbe9-3f8d-4ef5-ac59-2582454a9a75", > >>>>>>>>>>>>>>>>>>> "size": 21474836480, > >>>>>>>>>>>>>>>>>>> "btime": "2018-09-10 15:55:09.044039", > >>>>>>>>>>>>>>>>>>> "description": "main", > >>>>>>>>>>>>>>>>>>> "bluefs": "1", > >>>>>>>>>>>>>>>>>>> "ceph_fsid": > >>>>>>>>>>>>>>>>>>> "56eddc15-11b9-4e0b-9192-e391fbae551c", > >>>>>>>>>>>>>>>>>>> "kv_backend": "rocksdb", > >>>>>>>>>>>>>>>>>>> "magic": "ceph osd volume v026", > >>>>>>>>>>>>>>>>>>> "mkfs_done": "yes", > >>>>>>>>>>>>>>>>>>> "osd_key": > >>>>>>>>>>>>>>>>>>> "AQCsaZZbYTxXJBAAe3jJI4p6WbMjvA8CBBUJbA==", > >>>>>>>>>>>>>>>>>>> "ready": "ready", > >>>>>>>>>>>>>>>>>>> "whoami": "0" > >>>>>>>>>>>>>>>>>>> }, > >>>>>>>>>>>>>>>>>>> "dev/osd0/block.wal": { > >>>>>>>>>>>>>>>>>>> "osd_uuid": > >>>>>>>>>>>>>>>>>>> "404dcbe9-3f8d-4ef5-ac59-2582454a9a75", > >>>>>>>>>>>>>>>>>>> "size": 1048576000, > >>>>>>>>>>>>>>>>>>> "btime": "2018-09-10 15:55:09.044985", > >>>>>>>>>>>>>>>>>>> "description": "bluefs wal" > >>>>>>>>>>>>>>>>>>> }, > >>>>>>>>>>>>>>>>>>> "dev/osd0/block.db": { > >>>>>>>>>>>>>>>>>>> "osd_uuid": > >>>>>>>>>>>>>>>>>>> "404dcbe9-3f8d-4ef5-ac59-2582454a9a75", > >>>>>>>>>>>>>>>>>>> "size": 1048576000, > >>>>>>>>>>>>>>>>>>> "btime": "2018-09-10 15:55:09.044469", > >>>>>>>>>>>>>>>>>>> "description": "bluefs db" > >>>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> You can try --dev option instead of --path, e.g. > >>>>>>>>>>>>>>>>>>> ceph-bluestore-tool show-label --dev <path-to-block.db> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On 1.10.2018, at 16:48, Igor Fedotov > >>>>>>>>>>>>>>>>>>>>> <ifedotov@xxxxxxx> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> This looks like a sort of deadlock when BlueFS > >>>>>>>>>>>>>>>>>>>>> needs some additional space to replay the log left > >>>>>>>>>>>>>>>>>>>>> after the crash. Which happens during BlueFS open. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> But such a space (at slow device as DB is full) is > >>>>>>>>>>>>>>>>>>>>> gifted in background during bluefs rebalance > >>>>>>>>>>>>>>>>>>>>> procedure which will occur after the open. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Hence OSDs stuck in permanent crashing.. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> The only way to recover I can suggest for now is to > >>>>>>>>>>>>>>>>>>>>> expand DB volumes. You can do that with lvm tools > >>>>>>>>>>>>>>>>>>>>> if you have any spare space for that. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Once resized you'll need ceph-bluestore-tool to > >>>>>>>>>>>>>>>>>>>>> indicate volume expansion to BlueFS > >>>>>>>>>>>>>>>>>>>>> (bluefs-bdev-expand command ) and finally update DB > >>>>>>>>>>>>>>>>>>>>> volume size label with set-label-key command. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> The latter is a bit tricky for mimic - you might > >>>>>>>>>>>>>>>>>>>>> need to backport > >>>>>>>>>>>>>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> and rebuild ceph-bluestore-tool. 
Alternatively you > >>>>>>>>>>>>>>>>>>>>> can backport > >>>>>>>>>>>>>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> then bluefs expansion and label updates will occur > >>>>>>>>>>>>>>>>>>>>> in a single step. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I'll do these backports in upstream but this will > >>>>>>>>>>>>>>>>>>>>> take some time to pass all the procedures and get > >>>>>>>>>>>>>>>>>>>>> into official mimic release. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Will fire a ticket to fix the original issue as well. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Igor > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On 10/1/2018 3:28 PM, Sergey Malinin wrote: > >>>>>>>>>>>>>>>>>>>>>> These are LVM bluestore NVMe SSDs created with > >>>>>>>>>>>>>>>>>>>>>> "ceph-volume --lvm prepare --bluestore > >>>>>>>>>>>>>>>>>>>>>> /dev/nvme0n1p3" i.e. without specifying wal/db > >>>>>>>>>>>>>>>>>>>>>> devices. > >>>>>>>>>>>>>>>>>>>>>> OSDs were created with > >>>>>>>>>>>>>>>>>>>>>> bluestore_min_alloc_size_ssd=4096, another > >>>>>>>>>>>>>>>>>>>>>> modified setting is bluestore_cache_kv_max=1073741824 > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> DB/block usage collected by prometheus module for > >>>>>>>>>>>>>>>>>>>>>> 3 failed and 1 survived OSDs: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} > >>>>>>>>>>>>>>>>>>>>>> 65493008384.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} > >>>>>>>>>>>>>>>>>>>>>> 49013587968.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} > >>>>>>>>>>>>>>>>>>>>>> 76834406400.0 --> this one has survived > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} > >>>>>>>>>>>>>>>>>>>>>> 63726157824.0 > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} > >>>>>>>>>>>>>>>>>>>>>> 65217232896.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} > >>>>>>>>>>>>>>>>>>>>>> 48944381952.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} > >>>>>>>>>>>>>>>>>>>>>> 68093476864.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} > >>>>>>>>>>>>>>>>>>>>>> 63632834560.0 > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.0"} > >>>>>>>>>>>>>>>>>>>>>> 471305551872.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.1"} > >>>>>>>>>>>>>>>>>>>>>> 471305551872.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.2"} > >>>>>>>>>>>>>>>>>>>>>> 471305551872.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.3"} > >>>>>>>>>>>>>>>>>>>>>> 471305551872.0 > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.0"} > >>>>>>>>>>>>>>>>>>>>>> 222328213504.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.1"} > >>>>>>>>>>>>>>>>>>>>>> 214472544256.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} > >>>>>>>>>>>>>>>>>>>>>> 163603996672.0 > >>>>>>>>>>>>>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.3"} > >>>>>>>>>>>>>>>>>>>>>> 212806815744.0 > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> First crashed OSD was doing DB compaction, others > >>>>>>>>>>>>>>>>>>>>>> crashed shortly 
after during backfilling. Workload > >>>>>>>>>>>>>>>>>>>>>> was "ceph-data-scan scan_inodes" filling metadata > >>>>>>>>>>>>>>>>>>>>>> pool located on these OSDs at the rate close to > >>>>>>>>>>>>>>>>>>>>>> 10k objects/second. > >>>>>>>>>>>>>>>>>>>>>> Here is the log excerpt of the first crash > >>>>>>>>>>>>>>>>>>>>>> occurrence: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.762 7fbf16dd6700 0 > >>>>>>>>>>>>>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1) > >>>>>>>>>>>>>>>>>>>>>> _balance_bluefs_freespace no allocate on > >>>>>>>>>>>>>>>>>>>>>> 0x80000000 min_alloc_size 0x1000 > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700 4 rocksdb: > >>>>>>>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] > >>>>>>>>>>>>>>>>>>>>>> [default] [JOB 24] Generated table #89741: 106356 > >>>>>>>>>>>>>>>>>>>>>> keys, 68110589 bytes > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700 4 rocksdb: > >>>>>>>>>>>>>>>>>>>>>> EVENT_LOG_v1 {"time_micros": 1538353632892744, > >>>>>>>>>>>>>>>>>>>>>> "cf_name": "default", "job": 24, "event": > >>>>>>>>>>>>>>>>>>>>>> "table_file_creation", "file_number": 89741, > >>>>>>>>>>>>>>>>>>>>>> "file_size": 68110589, "table_properties": > >>>>>>>>>>>>>>>>>>>>>> {"data_size": 67112903, "index_size": 579319, > >>>>>>>>>>>>>>>>>>>>>> "filter_size": 417316, "raw_key_size": 6733561, > >>>>>>>>>>>>>>>>>>>>>> "raw_average_key_size": 63, "raw_value_size": > >>>>>>>>>>>>>>>>>>>>>> 60994583, "raw_average_value_size": 573, > >>>>>>>>>>>>>>>>>>>>>> "num_data_blocks": 16336, "num_entries": 106356, > >>>>>>>>>>>>>>>>>>>>>> "filter_policy_name": > >>>>>>>>>>>>>>>>>>>>>> "rocksdb.BuiltinBloomFilter", "kDeletedKeys": > >>>>>>>>>>>>>>>>>>>>>> "14444", "kMergeOperands": "0"}} > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700 4 rocksdb: > >>>>>>>>>>>>>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] > >>>>>>>>>>>>>>>>>>>>>> [default] [JOB 24] Generated table #89742: 23214 > >>>>>>>>>>>>>>>>>>>>>> keys, 16352315 bytes > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700 4 rocksdb: > >>>>>>>>>>>>>>>>>>>>>> EVENT_LOG_v1 {"time_micros": 1538353632938670, > >>>>>>>>>>>>>>>>>>>>>> "cf_name": "default", "job": 24, "event": > >>>>>>>>>>>>>>>>>>>>>> "table_file_creation", "file_number": 89742, > >>>>>>>>>>>>>>>>>>>>>> "file_size": 16352315, "table_properties": > >>>>>>>>>>>>>>>>>>>>>> {"data_size": 16116986, "index_size": 139894, > >>>>>>>>>>>>>>>>>>>>>> "filter_size": 94386, "raw_key_size": 1470883, > >>>>>>>>>>>>>>>>>>>>>> "raw_average_key_size": 63, "raw_value_size": > >>>>>>>>>>>>>>>>>>>>>> 14775006, "raw_average_value_size": 636, > >>>>>>>>>>>>>>>>>>>>>> "num_data_blocks": 3928, "num_entries": 23214, > >>>>>>>>>>>>>>>>>>>>>> "filter_policy_name": > >>>>>>>>>>>>>>>>>>>>>> "rocksdb.BuiltinBloomFilter", "kDeletedKeys": > >>>>>>>>>>>>>>>>>>>>>> "90", "kMergeOperands": "0"}} > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 1 bluefs > >>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x4100000 on bdev 1, > >>>>>>>>>>>>>>>>>>>>>> free 0x1a00000; fallback to bdev 2 > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs > >>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x4100000 on bdev 2, dne > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs > >>>>>>>>>>>>>>>>>>>>>> _flush_range allocated: 0x0 offset: 0x0 length: > >>>>>>>>>>>>>>>>>>>>>> 0x40ea9f1 > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.046 7fbf1e5e5700 -1 > >>>>>>>>>>>>>>>>>>>>>> 
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In > >>>>>>>>>>>>>>>>>>>>>> function 'int > >>>>>>>>>>>>>>>>>>>>>> BlueFS::_flush_range(BlueFS::FileWriter*, > >>>>>>>>>>>>>>>>>>>>>> uint64_t, uint64_t)' thread 7fbf1e5e5700 time > >>>>>>>>>>>>>>>>>>>>>> 2018-10-01 03:27:13.048298 > >>>>>>>>>>>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: > >>>>>>>>>>>>>>>>>>>>>> 1663: FAILED assert(0 == "bluefs enospc") > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> ceph version 13.2.2 > >>>>>>>>>>>>>>>>>>>>>> (02899bfda814146b021136e9d8e80eba494e1126) mimic > >>>>>>>>>>>>>>>>>>>>>> (stable) > >>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char > >>>>>>>>>>>>>>>>>>>>>> const*, int, char const*)+0x102) [0x7fbf2d4fe5c2] > >>>>>>>>>>>>>>>>>>>>>> 2: (()+0x26c787) [0x7fbf2d4fe787] > >>>>>>>>>>>>>>>>>>>>>> 3: (BlueFS::_flush_range(BlueFS::FileWriter*, > >>>>>>>>>>>>>>>>>>>>>> unsigned long, unsigned long)+0x1ab4) > >>>>>>>>>>>>>>>>>>>>>> [0x5619325114b4] > >>>>>>>>>>>>>>>>>>>>>> 4: (BlueRocksWritableFile::Flush()+0x3d) > >>>>>>>>>>>>>>>>>>>>>> [0x561932527c1d] > >>>>>>>>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Flush()+0x1b9) > >>>>>>>>>>>>>>>>>>>>>> [0x56193271c399] > >>>>>>>>>>>>>>>>>>>>>> 6: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Sync(bool)+0x3b) > >>>>>>>>>>>>>>>>>>>>>> [0x56193271d42b] > >>>>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status > >>>>>>>>>>>>>>>>>>>>>> const&, > >>>>>>>>>>>>>>>>>>>>>> rocksdb::CompactionJob::SubcompactionState*, > >>>>>>>>>>>>>>>>>>>>>> rocksdb::RangeDelAggregator*, > >>>>>>>>>>>>>>>>>>>>>> CompactionIterationStats*, rocksdb::Slice > >>>>>>>>>>>>>>>>>>>>>> const*)+0x3db) [0x56193276098b] > >>>>>>>>>>>>>>>>>>>>>> 8: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d9) > >>>>>>>>>>>>>>>>>>>>>> [0x561932763da9] > >>>>>>>>>>>>>>>>>>>>>> 9: (rocksdb::CompactionJob::Run()+0x314) > >>>>>>>>>>>>>>>>>>>>>> [0x561932765504] > >>>>>>>>>>>>>>>>>>>>>> 10: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::BackgroundCompaction(bool*, > >>>>>>>>>>>>>>>>>>>>>> rocksdb::JobContext*, rocksdb::LogBuffer*, > >>>>>>>>>>>>>>>>>>>>>> rocksdb::DBImpl::PrepickedCompaction*)+0xc54) > >>>>>>>>>>>>>>>>>>>>>> [0x5619325b5c44] > >>>>>>>>>>>>>>>>>>>>>> 11: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, > >>>>>>>>>>>>>>>>>>>>>> rocksdb::Env::Priority)+0x397) [0x5619325b8557] > >>>>>>>>>>>>>>>>>>>>>> 12: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::BGWorkCompaction(void*)+0x97) > >>>>>>>>>>>>>>>>>>>>>> [0x5619325b8cd7] > >>>>>>>>>>>>>>>>>>>>>> 13: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned > >>>>>>>>>>>>>>>>>>>>>> long)+0x266) [0x5619327a5e36] > >>>>>>>>>>>>>>>>>>>>>> 14: > >>>>>>>>>>>>>>>>>>>>>> (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x47) > >>>>>>>>>>>>>>>>>>>>>> [0x5619327a5fb7] > >>>>>>>>>>>>>>>>>>>>>> 15: (()+0xbe733) [0x7fbf2b500733] > >>>>>>>>>>>>>>>>>>>>>> 16: (()+0x76db) [0x7fbf2bbf86db] > >>>>>>>>>>>>>>>>>>>>>> 17: (clone()+0x3f) [0x7fbf2abbc88f] > >>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump > >>>>>>>>>>>>>>>>>>>>>> -rdS <executable>` is needed to interpret this. 
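Reading the trace above: the _allocate lines say BlueFS needed 0x4100000 bytes (~65 MiB) for the compaction output but only 0x1a00000 (~26 MiB) were left in its partition, and on these single-device OSDs there is no bdev 2 to fall back to (hence the "dne"). For setups that do have a standalone block.db, the expansion path discussed earlier in the thread looks roughly like this - a sketch only, the VG/LV names are made up, and on mimic the size label may still need the set-label-key / backported-commit treatment:

    # grow the LV backing block.db, then tell BlueFS about the new size
    lvextend -L +20G /dev/ceph-db-vg/osd-1-db
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
    # verify that the "size" label now matches the enlarged device
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-1/block.db
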
> >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On 1.10.2018, at 15:01, Igor Fedotov > >>>>>>>>>>>>>>>>>>>>>>> <ifedotov@xxxxxxx> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Hi Sergey, > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> could you please provide more details on your OSDs ? > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> What are sizes for DB/block devices? > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Do you have any modifications in BlueStore config > >>>>>>>>>>>>>>>>>>>>>>> settings? > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Can you share stats you're referring to? > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Igor > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On 10/1/2018 12:29 PM, Sergey Malinin wrote: > >>>>>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>>>>> 3 of 4 NVME OSDs crashed at the same time on > >>>>>>>>>>>>>>>>>>>>>>>> assert(0 == "bluefs enospc") and no longer start. > >>>>>>>>>>>>>>>>>>>>>>>> Stats collected just before crash show that > >>>>>>>>>>>>>>>>>>>>>>>> ceph_bluefs_db_used_bytes is 100% used. Although > >>>>>>>>>>>>>>>>>>>>>>>> OSDs have over 50% of free space, it is not > >>>>>>>>>>>>>>>>>>>>>>>> reallocated for DB usage. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 1 bluefs > >>>>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x100000 on bdev 1, > >>>>>>>>>>>>>>>>>>>>>>>> free 0x0; fallback to bdev 2 > >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs > >>>>>>>>>>>>>>>>>>>>>>>> _allocate failed to allocate 0x100000 on bdev 2, > >>>>>>>>>>>>>>>>>>>>>>>> dne > >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs > >>>>>>>>>>>>>>>>>>>>>>>> _flush_range allocated: 0x0 offset: 0x0 length: > >>>>>>>>>>>>>>>>>>>>>>>> 0xa8700 > >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.748 7f1d6a04d240 -1 > >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: > >>>>>>>>>>>>>>>>>>>>>>>> In function 'int > >>>>>>>>>>>>>>>>>>>>>>>> BlueFS::_flush_range(BlueFS::FileWriter*, > >>>>>>>>>>>>>>>>>>>>>>>> uint64_t, uint64_t)' thread 7f1d6a04d240 time > >>>>>>>>>>>>>>>>>>>>>>>> 2018-10-01 12:18:06.746800 > >>>>>>>>>>>>>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: > >>>>>>>>>>>>>>>>>>>>>>>> 1663: FAILED assert(0 == "bluefs enospc") > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> ceph version 13.2.2 > >>>>>>>>>>>>>>>>>>>>>>>> (02899bfda814146b021136e9d8e80eba494e1126) mimic > >>>>>>>>>>>>>>>>>>>>>>>> (stable) > >>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, > >>>>>>>>>>>>>>>>>>>>>>>> char const*, int, char const*)+0x102) > >>>>>>>>>>>>>>>>>>>>>>>> [0x7f1d6146f5c2] > >>>>>>>>>>>>>>>>>>>>>>>> 2: (()+0x26c787) [0x7f1d6146f787] > >>>>>>>>>>>>>>>>>>>>>>>> 3: > >>>>>>>>>>>>>>>>>>>>>>>> (BlueFS::_flush_range(BlueFS::FileWriter*, > >>>>>>>>>>>>>>>>>>>>>>>> unsigned long, unsigned long)+0x1ab4) > >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b22684b4] > >>>>>>>>>>>>>>>>>>>>>>>> 4: (BlueRocksWritableFile::Flush()+0x3d) > >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b227ec1d] > >>>>>>>>>>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Flush()+0x1b9) > >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b2473399] > >>>>>>>>>>>>>>>>>>>>>>>> 6: > >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::WritableFileWriter::Sync(bool)+0x3b) > >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b247442b] > 
>>>>>>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::BuildTable(std::__cxx11::basic_string<char, > >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> > > >>>>>>>>>>>>>>>>>>>>>>>> const&, rocksdb::Env*, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::ImmutableCFOptions const&, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::MutableCFOptions const&, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::EnvOptions const&, rock > >>>>>>>>>>>>>>>>>>>>>>>> sdb::TableCache*, rocksdb::InternalIterator*, > >>>>>>>>>>>>>>>>>>>>>>>> std::unique_ptr<rocksdb::InternalIterator, > >>>>>>>>>>>>>>>>>>>>>>>> std::default_delete<rocksdb::InternalIterator> > >>>>>>>>>>>>>>>>>>>>>>>> >, rocksdb::FileMetaData*, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::InternalKeyComparator const&, > >>>>>>>>>>>>>>>>>>>>>>>> std::vector<std::unique_ptr< > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::IntTblPropCollectorFactory, > >>>>>>>>>>>>>>>>>>>>>>>> std::default_delete<rocksdb::IntTblPropCollectorFactory> > >>>>>>>>>>>>>>>>>>>>>>>> >, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, > >>>>>>>>>>>>>>>>>>>>>>>> std::default_delete<rocksdb::IntTblPropCollectorFactory> > >>>>>>>>>>>>>>>>>>>>>>>> > > > co > >>>>>>>>>>>>>>>>>>>>>>>> nst*, unsigned int, > >>>>>>>>>>>>>>>>>>>>>>>> std::__cxx11::basic_string<char, > >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> > > >>>>>>>>>>>>>>>>>>>>>>>> const&, std::vector<unsigned long, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<unsigned long> >, unsigned long, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::SnapshotChecker*, rocksdb::Compression > >>>>>>>>>>>>>>>>>>>>>>>> Type, rocksdb::CompressionOptions const&, bool, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::InternalStats*, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::TableFileCreationReason, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::EventLogger*, int, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::Env::IOPriority, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::TableProperties*, int, unsigned long, > >>>>>>>>>>>>>>>>>>>>>>>> unsigned long, rocksdb > >>>>>>>>>>>>>>>>>>>>>>>> ::Env::WriteLifeTimeHint)+0x1e24) [0x5586b249ef94] > >>>>>>>>>>>>>>>>>>>>>>>> 8: > >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::ColumnFamilyData*, rocksdb::MemTable*, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::VersionEdit*)+0xcb7) [0x5586b2321457] > >>>>>>>>>>>>>>>>>>>>>>>> 9: > >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned > >>>>>>>>>>>>>>>>>>>>>>>> long, std::allocator<unsigned long> > const&, > >>>>>>>>>>>>>>>>>>>>>>>> unsigned long*, bool)+0x19de) [0x5586b232373e] > >>>>>>>>>>>>>>>>>>>>>>>> 10: > >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescriptor> > >>>>>>>>>>>>>>>>>>>>>>>> > const&, bool, bool, bool)+0x5d4) [0x5586b23242f4] > >>>>>>>>>>>>>>>>>>>>>>>> 11: > >>>>>>>>>>>>>>>>>>>>>>>> (rocksdb::DBImpl::Open(rocksdb::DBOptions > >>>>>>>>>>>>>>>>>>>>>>>> const&, std::__cxx11::basic_string<char, > >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> > > >>>>>>>>>>>>>>>>>>>>>>>> const&, > >>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyDescriptor, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescri > >>>>>>>>>>>>>>>>>>>>>>>> ptor> > const&, > >>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyHandle*, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyHandle*> >*, > >>>>>>>>>>>>>>>>>>>>>>>> rocksdb::DB**, bool)+0x68b) 
[0x5586b232559b] > >>>>>>>>>>>>>>>>>>>>>>>> 12: (rocksdb::DB::Open(rocksdb::DBOptions > >>>>>>>>>>>>>>>>>>>>>>>> const&, std::__cxx11::basic_string<char, > >>>>>>>>>>>>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> > > >>>>>>>>>>>>>>>>>>>>>>>> const&, > >>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyDescriptor, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescriptor > >>>>>>>>>>>>>>>>>>>>>>>>>> const&, > >>>>>>>>>>>>>>>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyHandle*, > >>>>>>>>>>>>>>>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyHandle*> > >>>>>>>>>>>>>>>>>>>>>>>>>> >*, rocksdb::DB**)+0x22) [0x5586b2326e72] > >>>>>>>>>>>>>>>>>>>>>>>> 13: (RocksDBStore::do_open(std::ostream&, > >>>>>>>>>>>>>>>>>>>>>>>> bool, std::vector<KeyValueDB::ColumnFamily, > >>>>>>>>>>>>>>>>>>>>>>>> std::allocator<KeyValueDB::ColumnFamily> > > >>>>>>>>>>>>>>>>>>>>>>>> const*)+0x170c) [0x5586b220219c] > >>>>>>>>>>>>>>>>>>>>>>>> 14: (BlueStore::_open_db(bool, bool)+0xd8e) > >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b218ee1e] > >>>>>>>>>>>>>>>>>>>>>>>> 15: (BlueStore::_mount(bool, bool)+0x4b7) > >>>>>>>>>>>>>>>>>>>>>>>> [0x5586b21bf807] > >>>>>>>>>>>>>>>>>>>>>>>> 16: (OSD::init()+0x295) [0x5586b1d673c5] > >>>>>>>>>>>>>>>>>>>>>>>> 17: (main()+0x268d) [0x5586b1c554ed] > >>>>>>>>>>>>>>>>>>>>>>>> 18: (__libc_start_main()+0xe7) [0x7f1d5ea2db97] > >>>>>>>>>>>>>>>>>>>>>>>> 19: (_start()+0x2a) [0x5586b1d1d7fa] > >>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump > >>>>>>>>>>>>>>>>>>>>>>>> -rdS <executable>` is needed to interpret this. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>>>>>>>>>>> ceph-users mailing list > >>>>>>>>>>>>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx > >>>>>>>>>>>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>>>>>> ceph-users mailing list > >>>>>>>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx > >>>>>>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>> ceph-users mailing list > >>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx > >>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > >> > >> _______________________________________________ > >> ceph-users mailing list > >> ceph-users@xxxxxxxxxxxxxx > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > _______________________________________________ > > ceph-users mailing list > > ceph-users@xxxxxxxxxxxxxx > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
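To put the 4% guideline from the top of this mail into numbers for the 800 GB SSD Alex asked about - illustrative only, it is a rough minimum rather than an exact requirement:

    # 4% of an 800 GB data device
    echo $((800 * 4 / 100))                          # -> 32 (GB)
    echo $((800 * 1000 * 1000 * 1000 * 4 / 100))     # -> 32000000000 (bytes)

That is the order of magnitude to plan for when provisioning a separate block.db for a device of that size.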