Dear Igor,

Thank you for your help! I am working with Florian. We have built ceph-bluestore-tool with your patch on SLES 12 SP3. We will post back the results ASAP.

Best regards,
Francois Scheurer
-------- Forwarded Message --------
Subject: Re: RocksDB and WAL migration to new block device
Date: Wed, 21 Nov 2018 11:34:47 +0300
From: Igor Fedotov <ifedotov@xxxxxxx>
To: Florian Engelmann <florian.engelmann@xxxxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx

Actually (given that your devices are already expanded) you don't need to expand them once again - one can just update the size labels with my new PR.

For new migrations you can use the updated bluefs expand command, which sets the size label automatically.

Thanks,
Igor

On 11/21/2018 11:11 AM, Florian Engelmann wrote:

Great support Igor!!!! Both thumbs up! We will try to build the tool today and expand those bluefs devices once again.

On 11/20/18 at 6:54 PM, Igor Fedotov wrote:

FYI: https://github.com/ceph/ceph/pull/25187

On 11/20/2018 8:13 PM, Igor Fedotov wrote:
On 11/20/2018 7:05 PM, Florian Engelmann wrote:

If you can build a custom version of ceph_bluestore_tool then this is a short path. I'll submit a patch today or tomorrow which you need to integrate into your private build.

On 11/20/18 at 4:59 PM, Igor Fedotov wrote:
On 11/20/2018 6:42 PM, Florian Engelmann wrote:

Hi Igor,

what's your Ceph version?

12.2.8 (SES 5.5 - patched to the latest version)

Can you also check the output for

ceph-bluestore-tool show-label -p <path to osd>

ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}

It should report 'size' labels for every volume, please check they contain new values.

That's exactly the problem: neither "ceph-bluestore-tool show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" recognized the new sizes. But we are 100% sure the new devices are used, as we already deleted the old ones...

We tried to delete the "key" "size" to add one with the new value but:

ceph-bluestore-tool rm-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present

even though:

ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}

So it looks like the key "size" is "read-only"?

There was a bug in updating specific keys, see https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on bdev-expand.

I thought it had been backported to Luminous but it looks like it hasn't. Will submit a PR shortly.

Thank you so much Igor! So we have to decide how to proceed. Maybe you could help us here as well.

Option A: Wait for this fix to be available. -> could take weeks or even months

Then you need to upgrade just the tool and apply new sizes.

Option B: Recreate OSDs "one-by-one". -> will take a very long time as well

No need for that IMO.

Well, a hex editor might help here as well. All you need is to update the 64-bit size value in the block.db and block.wal files. In my lab I can find it at offset 0x52. Most probably this is a fixed location, but it's better to check beforehand: the existing value should correspond to the one reported by show-label. Or I can do that for you - please send me the first 4K chunks along with the corresponding label report. Then update with the new values - the field has to contain exactly the same size as your new partition.

Option C: Is there some "low-level" command that allows us to fix those sizes?

Thanks,
Igor

On 11/20/2018 5:29 PM, Florian Engelmann wrote:

Hi,

today we migrated all of our rocksdb and wal devices to new ones. The new ones are much bigger (500MB for wal/db -> 60GB db and 2G WAL) and LVM based.

We migrated like:

export OSD=x
systemctl stop ceph-osd@$OSD

lvcreate -n db-osd$OSD -L60g data || exit 1
lvcreate -n wal-osd$OSD -L2g data || exit 1

dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal of=/dev/data/wal-osd$OSD bs=1M || exit 1
dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD bs=1M || exit 1

rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1

ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1

chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1

ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/ || exit 1

systemctl start ceph-osd@$OSD

Everything went fine, but it looks like the db and wal sizes are still the old ones:

ceph daemon osd.0 perf dump|jq '.bluefs'
{
    "gift_bytes": 0,
    "reclaim_bytes": 0,
    "db_total_bytes": 524279808,
    "db_used_bytes": 330301440,
    "wal_total_bytes": 524283904,
    "wal_used_bytes": 69206016,
    "slow_total_bytes": 320058949632,
    "slow_used_bytes": 13606322176,
    "num_files": 220,
    "log_bytes": 44204032,
    "log_compactions": 0,
    "logged_bytes": 31145984,
    "files_written_wal": 1,
    "files_written_sst": 1,
    "bytes_written_wal": 37753489,
    "bytes_written_sst": 238992
}

Even though the new block devices are recognized correctly:

2018-11-20 11:40:34.653524 7f70219b8d00 1 bdev(0x5647ea9ce200 /var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00 1 bdev(0x5647ea9ce600 /var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 (0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB

Are we missing some command to "notify" rocksdb about the new device size?

All the best,
Florian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
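
The check Igor suggests before touching anything with a hex editor (does the value at offset 0x52 match the size that show-label reports?) could be scripted instead of done by eye. What follows is only a minimal, read-only sketch under some assumptions: the function name read_label_size is made up for illustration, the field is assumed to be stored as a little-endian unsigned 64-bit integer (the thread only says "64bit size value"), and 0x52 is the offset Igor saw in his lab, not a guaranteed location. It does not write to the device.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 64-bit integer found at a byte offset.
# Assumptions (not confirmed by the thread): little-endian encoding, and
# offset 0x52 as observed by Igor in his lab. Read-only; writes nothing.
read_label_size() {
    local file="$1"
    local offset="${2:-0x52}"
    local hex=""
    local byte

    # od prints the 8 bytes as two-digit hex values in file order.
    # Prepending each one reverses them, so the least significant byte
    # (stored first) ends up last in the hex string.
    for byte in $(od -An -v -t x1 -j "$(($offset))" -N 8 "$file"); do
        hex="${byte}${hex}"
    done

    # Fail if fewer than 8 bytes could be read (short file, bad offset).
    [ "${#hex}" -eq 16 ] || return 1

    printf '%d\n' "$((0x$hex))"
}

# Example call; compare the result with the "size" shown by show-label:
#   read_label_size /var/lib/ceph/osd/ceph-0/block.db
```

If the printed number differs from the "size" that show-label reports for the same device (524288000 in the output above), the offset is not where the label lives on that system and nothing should be edited at that position.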
-- 
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer@xxxxxxxxxxxx
web: http://www.everyware.ch