Re: Fwd: Re: RocksDB and WAL migration to new block device

Dear Igor


Thank you for your help!
I am working with Florian.
We have built the ceph-bluestore-tool with your patch on SLES 12SP3.

We will post the results as soon as possible.


Best Regards
Francois Scheurer




-------- Forwarded Message --------
Subject: Re:  RocksDB and WAL migration to new block device
Date: Wed, 21 Nov 2018 11:34:47 +0300
From: Igor Fedotov <ifedotov@xxxxxxx>
To: Florian Engelmann <florian.engelmann@xxxxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx

Actually, given that your devices are already expanded, you don't need to expand them again. You can just update the size labels with my new PR.

For new migrations, though, you can use the updated bluefs-bdev-expand command, which sets the size label automatically.


Thanks,
Igor
On 11/21/2018 11:11 AM, Florian Engelmann wrote:
Great support Igor!!!! Both thumbs up! We will try to build the tool today and expand those bluefs devices once again.


On 11/20/18 at 6:54 PM, Igor Fedotov wrote:
FYI: https://github.com/ceph/ceph/pull/25187


On 11/20/2018 8:13 PM, Igor Fedotov wrote:

On 11/20/2018 7:05 PM, Florian Engelmann wrote:
On 11/20/18 at 4:59 PM, Igor Fedotov wrote:


On 11/20/2018 6:42 PM, Florian Engelmann wrote:
Hi Igor,


what's your Ceph version?

12.2.8 (SES 5.5 - patched to the latest version)


Can you also check the output for

ceph-bluestore-tool show-label -p <path to osd>

ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}




It should report 'size' labels for every volume, please check they contain new values.
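
A minimal sketch of that check, assuming the show-label output keeps the layout shown above: the helper below (label_sizes is a hypothetical name, not part of ceph-bluestore-tool) prints one "device size" pair per volume, which can then be compared by hand with what blockdev --getsize64 reports for the underlying device.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph-bluestore-tool show-label` output on stdin
# and print "<device path> <size label>" for every volume.
# Assumption: device entries look like `"<path>": {` and the size label
# looks like `"size": <number>,` as in the output quoted in this thread.
label_sizes() {
    awk '
        # A line such as:    "/var/lib/ceph/osd/ceph-0//block.db": {
        /^[ \t]*"\/.*[{][ \t]*$/ {
            sub(/^[ \t]*"/, "")
            sub(/"[ \t]*:[ \t]*[{][ \t]*$/, "")
            dev = $0
            next
        }
        # A line such as:        "size": 524288000,
        /"size":/ {
            gsub(/[^0-9]/, "", $2)
            print dev, $2
        }
    '
}

# Usage (paths are the ones from this thread):
#   ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/ | label_sizes
#   blockdev --getsize64 "$(realpath /var/lib/ceph/osd/ceph-0/block.db)"
```

If the number printed for a device differs from the blockdev result, the label still carries the old size.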


That's exactly the problem: neither "ceph-bluestore-tool show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" recognized the new sizes. But we are 100% sure the new devices are in use, as we have already deleted the old ones...

We tried to delete the "size" key in order to add one with the new value, but:

ceph-bluestore-tool rm-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present

even though:

ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}

So it looks like the key "size" is "read-only"?

There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on bdev-expand.

I thought it had been backported to Luminous, but it looks like it hasn't.
I will submit a PR shortly.



Thank you so much Igor! So we have to decide how to proceed. Maybe you could help us here as well.

Option A: Wait for this fix to be available. -> could take weeks or even months
If you can build a custom version of ceph-bluestore-tool, then this is a short path. I'll submit a patch today or tomorrow, which you need to integrate into your private build.
Then you only need to upgrade the tool and apply the new sizes.


Option B: Recreate OSDs "one-by-one". -> will take a very long time as well
No need for that IMO.

Option C: Is there some "low-level" command that would allow us to fix those sizes?
Well, a hex editor might help here as well. All you need to do is update the 64-bit size value in the block.db and block.wal files. In my lab I can find it at offset 0x52. Most probably this is a fixed location, but it's better to check beforehand: the existing value should match the one reported by show-label. Then update it with the new value; the field has to contain exactly the same size as your new partition. Alternatively, I can do that for you: please send me the first 4K chunk of each device along with the corresponding label report.
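
The "check beforehand" step could be sketched as follows. This is only a read-only illustration: read_u64_le is a hypothetical helper name, the offset 0x52 (82 decimal) is the lab observation mentioned above, and the value is assumed to be stored as a little-endian 64-bit integer. It does not modify the device.

```shell
#!/bin/sh
# Hypothetical helper: print the unsigned 64-bit little-endian integer
# stored at byte offset $2 of file or device $1.
# Assumption: the size field is little-endian; verify the printed value
# against the "size" reported by show-label before changing anything.
read_u64_le() {
    file=$1
    offset=$2
    value=0
    bits=0
    # od prints the 8 bytes as decimal numbers in file order, so the
    # first byte is the least significant one.
    for byte in $(od -An -v -tu1 -j "$offset" -N 8 "$file"); do
        value=$(( value + (byte << bits) ))
        bits=$(( bits + 8 ))
    done
    echo "$value"
}

# Usage (82 is 0x52; the path is the one from this thread):
#   read_u64_le /var/lib/ceph/osd/ceph-0/block.db 82
```

If the printed number equals the old size label (524288000 in the report above), the offset is most likely the right one for that device.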






Thanks,

Igor


On 11/20/2018 5:29 PM, Florian Engelmann wrote:
Hi,

today we migrated all of our RocksDB and WAL devices to new ones. The new ones are much bigger (500 MB for WAL/DB -> 60 GB DB and 2 GB WAL) and LVM-based.

We migrated like:

    export OSD=x

    systemctl stop ceph-osd@$OSD

    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1

    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD bs=1M  || exit 1

    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1


    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1


    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/ || exit 1

    systemctl start ceph-osd@$OSD


Everything went fine, but it looks like the db and wal sizes are still the old ones:

ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}


Even though the new block devices are recognized correctly:

2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 /var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB


2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 /var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 (0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
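
To make the mismatch easier to see, here is a small sketch that converts the byte counts quoted above into whole MiB. The bytes_to_mib helper is a hypothetical name used only for this illustration; the numbers are copied from the log lines and the perf dump in this message.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

bytes_to_mib 64424509440   # block.db size from the OSD log    -> 61440 (60 GiB)
bytes_to_mib 2147483648    # block.wal size from the OSD log   -> 2048 (2 GiB)
bytes_to_mib 524279808     # db_total_bytes from perf dump     -> 499
bytes_to_mib 524283904     # wal_total_bytes from perf dump    -> 499
```

So the block devices are opened at their new sizes, while the bluefs counters still reflect the old 500 MB volumes.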


Are we missing some command to "notify" rocksdb about the new device size?

All the best,
Florian


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer@xxxxxxxxxxxx
web: http://www.everyware.ch
