Re: is it possible to remove the db+wal from an external device (nvme)


 



This one is in the messages log: https://justpaste.it/3x08z

FYI, buffered_io is turned on by default in Octopus 15.2.14.
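
In case anyone wants to verify the effective value on their own cluster, something along these lines should show it (assuming the option in question is bluefs_buffered_io; osd.48 is just the OSD from the crash below):

ceph config get osd bluefs_buffered_io
ceph daemon osd.48 config get bluefs_buffered_io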


Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Eugen Block <eblock@xxxxxx> 
Sent: Tuesday, October 5, 2021 9:52 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Cc: 胡 玮文 <huww98@xxxxxxxxxxx>; Igor Fedotov <ifedotov@xxxxxxx>; ceph-users@xxxxxxx
Subject: Re:  Re: is it possible to remove the db+wal from an external device (nvme)


Do you see oom killers in dmesg on this host? This line indicates it:

          "(tcmalloc::allocate_full_cpp_throw_oom(unsigned
long)+0x146) [0x7f310b7d8c96]",
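
A quick check for that would be something like this (just a sketch, adjust to taste):

dmesg -T | grep -i -E 'out of memory|oom-killer|killed process'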


Zitat von "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:

> Hmm, I tried another one whose disk hasn’t spilled over, and it still 
> core dumped ☹ Is there anything special we need to do before we 
> migrate the DB next to the block device? Our OSDs are using dmcrypt, is that an issue?
>
> {
>     "backtrace": [
>         "(()+0x12b20) [0x7f310aa49b20]",
>         "(gsignal()+0x10f) [0x7f31096aa37f]",
>         "(abort()+0x127) [0x7f3109694db5]",
>         "(()+0x9009b) [0x7f310a06209b]",
>         "(()+0x9653c) [0x7f310a06853c]",
>         "(()+0x95559) [0x7f310a067559]",
>         "(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
>         "(()+0x10b03) [0x7f3109a48b03]",
>         "(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
>         "(__cxa_throw()+0x3b) [0x7f310a0687eb]",
>         "(()+0x19fa4) [0x7f310b7b6fa4]",
>         "(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) [0x7f310b7d8c96]",
>         "(()+0x10d0f8e) [0x55ffa520df8e]",
>         "(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
>         "(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
>         "(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) [0x55ffa52efcca]",
>         "(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55ffa52f0568]",
>         "(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
>         "(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
>         "(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
>         "(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
>         "(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
>         "(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool)+0x1089) [0x55ffa51a57e9]",
>         "(RocksDBStore::do_open(std::ostream&, bool, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0x14ca) [0x55ffa51285ca]",
>         "(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",
>         "(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",
>         "(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",
>         "(OSD::init()+0x380) [0x55ffa4753a70]",
>         "(main()+0x47f1) [0x55ffa46a6901]",
>         "(__libc_start_main()+0xf3) [0x7f3109696493]",
>         "(_start()+0x2e) [0x55ffa46d4e3e]"
>     ],
>     "ceph_version": "15.2.14",
>     "crash_id": "2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758",
>     "entity_name": "osd.48",
>     "os_id": "centos",
>     "os_name": "CentOS Linux",
>     "os_version": "8",
>     "os_version_id": "8",
>     "process_name": "ceph-osd",
>     "stack_sig": "6a43b6c219adac393b239fbea4a53ff87c4185bcd213724f0d721b452b81ddbf",
>     "timestamp": "2021-10-05T13:31:28.513463Z",
>     "utsname_hostname": "server-2s07",
>     "utsname_machine": "x86_64",
>     "utsname_release": "4.18.0-305.19.1.el8_4.x86_64",
>     "utsname_sysname": "Linux",
>     "utsname_version": "#1 SMP Wed Sep 15 15:39:39 UTC 2021"
> }
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> From: 胡 玮文 <huww98@xxxxxxxxxxx>
> Sent: Monday, October 4, 2021 12:13 AM
> To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; Igor Fedotov <ifedotov@xxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re: Re: is it possible to remove the db+wal from an external device (nvme)
>
> The stack trace (tcmalloc::allocate_full_cpp_throw_oom) seems to 
> indicate that you don’t have enough memory.
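>
> A quick sanity check on that host would be something like the following (osd_memory_target times the number of OSDs should fit comfortably in RAM):
>
> free -h
> ceph config get osd osd_memory_target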
>
> From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> Sent: October 4, 2021 0:46
> To: Igor Fedotov <ifedotov@xxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re: is it possible to remove the db+wal from an external device (nvme)
>
> Seems like it cannot start anymore once migrated ☹
>
> https://justpaste.it/5hkot
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> From: Igor Fedotov <ifedotov@xxxxxxx>
> Sent: Saturday, October 2, 2021 5:22 AM
> To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> Cc: ceph-users@xxxxxxx; Eugen Block <eblock@xxxxxx>; Christian Wuerdig <christian.wuerdig@xxxxxxxxx>
> Subject: Re: Re: is it possible to remove the db+wal from an external device (nvme)
>
>
> Hi Istvan,
>
> yeah, migrating both DB and WAL to the slow device is supported. And a 
> spillover state isn't a show stopper for that.
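>
> For reference, the rough shape of the command is this (a sketch only; the OSD has to be stopped first, and the id, fsid and LV names below are placeholders):
>
> ceph-volume lvm migrate --osd-id <id> --osd-fsid <osd-fsid> --from db wal --target <vg>/<data-lv>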
>
>
> On 10/2/2021 1:16 AM, Szabo, Istvan (Agoda) wrote:
> Dear Igor,
>
> Is the ceph-volume lvm migrate command in Octopus 15.2.14 smart enough 
> to be able to remove the DB (including the WAL) from the NVMe even if it 
> has spilled over? I can’t compact many disks back to normal so that they 
> stop showing the spillover warning.
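>
> (Compaction here means something along the lines of the following, per OSD; the id is a placeholder:)
>
> ceph tell osd.<id> compact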
>
> I think Christian has hit on the truth of the issue: my NVMe with 30k random 
> write IOPS is backing 3x SSDs with 67k random write IOPS each …
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
>
> On 2021. Oct 1., at 11:47, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:
> 3x SSD OSDs per NVMe.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> -----Original Message-----
> From: Igor Fedotov <ifedotov@xxxxxxx>
> Sent: Friday, October 1, 2021 4:35 PM
> To: ceph-users@xxxxxxx
> Subject: Re: is it possible to remove the db+wal from an external device (nvme)
>
>
> And how many OSDs per single NVMe do you have?
>
> On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:
>
> I have my dashboards and I can see that the DB NVMes are always 
> running at 100% utilization (you can monitor with iostat -x 1), and it 
> constantly generates iowait between 1 and 3.
>
> I’m using nvme in front of the ssds.
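>
> For the record, the check boils down to something like this (just a sketch, filter for your own device names):
>
> iostat -x 1 | grep -E 'Device|nvme'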
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> From: Victor Hooi <victorhooi@xxxxxxxxx>
> Sent: Friday, October 1, 2021 5:30 AM
> To: Eugen Block <eblock@xxxxxx>
> Cc: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; 胡 玮文 <huww98@xxxxxxxxxxx>; ceph-users <ceph-users@xxxxxxx>
> Subject: Re: Re: is it possible to remove the db+wal from an external device (nvme)
>
> Hi,
>
> I'm curious - how did you tell that the separate WAL+DB volume was 
> slowing things down? I assume you did some benchmarking - is there any 
> chance you'd be willing to share results? (Or anybody else that's been 
> in a similar situation).
>
> What sorts of devices are you using for the WAL+DB, versus the data disks?
>
> We're using NAND SSDs, with Optanes for the WAL+DB, and on some 
> systems I am seeing slower than expected behaviour; I need to dive 
> deeper into it.
>
> In my case, I was running with 4 or 2 OSDs per Optane volume:
>
> https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partitions_can_you_run_per_optane/
>
> but I couldn't seem to get the results I'd expected - so curious what 
> people are seeing in the real world - and of course, we might need to 
> follow the steps here to remove them as well.
>
> Thanks,
> Victor
>
> On Thu, 30 Sept 2021 at 16:10, Eugen Block <eblock@xxxxxx> wrote:
> Yes, I believe for you it should work without containers, although I 
> haven't tried the migrate command in a non-containerized cluster yet.
> But I believe this is a general issue for containerized clusters with 
> regard to maintenance. I haven't checked yet if there are existing 
> tracker issues for this, but maybe it would be worth creating one?
>
>
> Zitat von "Szabo, Istvan (Agoda)"
> <Istvan.Szabo@xxxxxxxxx<mailto:Istvan.Szabo@xxxxxxxxx<mailto:Istvan.Szabo@xxxxxxxxx%3cmailto:Istvan.Szabo@xxxxxxxxx>><mailto:Istvan.Szabo@xxxxxxxxx><mailto:Istvan.Szabo@xxxxxxxxx>>:
>
> Actually I don't have a containerized deployment, mine is a normal one. So 
> the lvm migrate should work.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Wednesday, September 29, 2021 8:49 PM
> To: 胡 玮文 <huww98@xxxxxxxxxxx>
> Cc: Igor Fedotov <ifedotov@xxxxxxx>; Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
> Subject: Re: is it possible to remove the db+wal from an external device (nvme)
>
>
> That's what I did and pasted the results in my previous comments.
>
>
> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>
> Yes. And the “cephadm shell” command does not depend on the running 
> daemon; it will start a new container. So I think it is perfectly fine 
> to stop the OSD first, then run the “cephadm shell” command, and run 
> ceph-volume in the new shell.
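>
> A rough sketch of the whole sequence (the cluster fsid, OSD id and LV names are placeholders; the systemd unit name is the usual cephadm one, adjust to your deployment):
>
> systemctl stop ceph-<cluster-fsid>@osd.0.service
> cephadm shell -n osd.0
> ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from db --target <vg>/<data-lv>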
>
> From: Eugen Block <eblock@xxxxxx>
> Sent: September 29, 2021 21:40
> To: 胡 玮文 <huww98@xxxxxxxxxxx>
> Cc: Igor Fedotov <ifedotov@xxxxxxx>; Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
> Subject: Re: is it possible to remove the db+wal from an external device (nvme)
>
> The OSD has to be stopped in order to migrate DB/WAL, it can't be done 
> live. ceph-volume requires a lock on the device.
>
>
> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>
> I’ve not tried it, but how about:
>
> cephadm shell -n osd.0
>
> then run “ceph-volume” commands in the newly opened shell. The 
> directory structure seems fine.
>
> $ sudo cephadm shell -n osd.0
> Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
> Inferring config /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
> Using recent ceph image cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
> root@host0:/# ll /var/lib/ceph/osd/ceph-0/
> total 68
> drwx------ 2 ceph ceph 4096 Sep 20 04:15 ./
> drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
> lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
> lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db ->
> /dev/ubuntu-vg/osd.0.db
> -rw------- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
> -rw------- 1 ceph ceph  387 Jun 21 13:24 config
> -rw------- 1 ceph ceph   37 Sep 20 04:15 fsid
> -rw------- 1 ceph ceph   55 Sep 20 04:15 keyring
> -rw------- 1 ceph ceph    6 Sep 20 04:15 ready
> -rw------- 1 ceph ceph    3 Apr  2 01:46 require_osd_release
> -rw------- 1 ceph ceph   10 Sep 20 04:15 type
> -rw------- 1 ceph ceph   38 Sep 17 14:26 unit.configured
> -rw------- 1 ceph ceph   48 Nov  9  2020 unit.created
> -rw------- 1 ceph ceph   35 Sep 17 14:26 unit.image
> -rw------- 1 ceph ceph  306 Sep 17 14:26 unit.meta
> -rw------- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
> -rw------- 1 ceph ceph 3021 Sep 17 14:26 unit.run
> -rw------- 1 ceph ceph  142 Sep 17 14:26 unit.stop
> -rw------- 1 ceph ceph    2 Sep 20 04:15 whoami
>
> From: Eugen Block <eblock@xxxxxx>
> Sent: September 29, 2021 21:29
> To: Igor Fedotov <ifedotov@xxxxxxx>
> Cc: 胡 玮文 <huww98@xxxxxxxxxxx>; Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
> Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)
>
> Hi Igor,
>
> thanks for your input. I haven't done this in a prod env yet either, I'm 
> still playing around in a virtual lab env.
> I tried the symlink suggestion but it's not that easy, because the layout 
> underneath the ceph directory is different from what ceph-volume 
> expects. These are the services underneath:
>
> ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
> insgesamt 48
> drwx------ 3 root       root   4096 16. Sep 16:11 alertmanager.ses7-host1
> drwx------ 3 ceph       ceph   4096 29. Sep 09:03 crash
> drwx------ 2 ceph       ceph   4096 16. Sep 16:39 crash.ses7-host1
> drwx------ 4 messagebus lp     4096 16. Sep 16:23 grafana.ses7-host1
> drw-rw---- 2 root       root   4096 24. Aug 10:00 home
> drwx------ 2 ceph       ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
> drwx------ 3 ceph       ceph   4096 16. Sep 16:37 mon.ses7-host1
> drwx------ 2 nobody     nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
> drwx------ 2 ceph       ceph   4096 29. Sep 08:43 osd.0
> drwx------ 2 ceph       ceph   4096 29. Sep 15:11 osd.1
> drwx------ 4 root       root   4096 16. Sep 16:12 prometheus.ses7-host1
>
>
> While the directory in a non-containerized deployment looks like this:
>
> nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
> insgesamt 24
> lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block -> /dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
> -rw------- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
> -rw------- 1 ceph ceph 37 29. Sep 12:21 fsid
> -rw------- 1 ceph ceph 55 29. Sep 12:21 keyring
> -rw------- 1 ceph ceph  6 29. Sep 12:21 ready
> -rw------- 1 ceph ceph 10 29. Sep 12:21 type
> -rw------- 1 ceph ceph  2 29. Sep 12:21 whoami
>
>
> But even if I create the symlink to the OSD directory it fails, because 
> I only have ceph-volume within the containers, where the symlink is not 
> visible to cephadm.
>
>
> ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
> lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 -> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
>
> ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
> [...]
> /usr/bin/podman: stderr --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
> /usr/bin/podman: stderr  stdout: inferring bluefs devices from bluestore path
> /usr/bin/podman: stderr  stderr: can't migrate /var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
> /usr/bin/podman: stderr --> Failed to migrate device, error code:1
> /usr/bin/podman: stderr --> Undoing lv tag set
> /usr/bin/podman: stderr Failed to migrate to : ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> Traceback (most recent call last):
>    File "/usr/sbin/cephadm", line 6225, in <module>
>      r = args.func()
>    File "/usr/sbin/cephadm", line 1363, in _infer_fsid
>      return func()
>    File "/usr/sbin/cephadm", line 1422, in _infer_image
>      return func()
>    File "/usr/sbin/cephadm", line 3687, in command_ceph_volume
>      out, err, code = call_throws(c.run_cmd(),
> verbosity=CallVerbosity.VERBOSE)
>    File "/usr/sbin/cephadm", line 1101, in call_throws
>      raise RuntimeError('Failed command: %s' % ' '.join(command)) 
> [...]
>
>
> I could install the ceph-osd package (which ceph-volume is packaged 
> in), but it's not available by default (as you can see, this is a SES 7 
> environment).
>
> I'm not sure what the design is here, it feels like the ceph-volume 
> migrate command is not applicable to containers yet.
>
> Regards,
> Eugen
>
>
> Quoting Igor Fedotov <ifedotov@xxxxxxx>:
>
> Hi Eugen,
>
> indeed this looks like an issue related to containerized deployment, 
> "ceph-volume lvm migrate" expects osd folder to be under
> /var/lib/ceph/osd:
>
> stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable
>
> As a workaround you might want to try to create a symlink to your actual 
> location before issuing the migrate command:
> /var/lib/ceph/osd -> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
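>
> A per-OSD variant of that symlink (which is what gets tried above) would look roughly like:
>
> ln -s /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1 /var/lib/ceph/osd/ceph-1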
>
> A more complicated (and more general, IMO) way would be to run the 
> migrate command from within a container deployed similarly to the 
> ceph-osd one (i.e. with all the proper subfolder mappings). Just 
> speculating; I'm not a big expert in containers and have never tried that 
> with a properly deployed production cluster...
>
>
> Thanks,
>
> Igor
>
> On 9/29/2021 10:07 AM, Eugen Block wrote:
> Hi,
>
> I just tried 'ceph-volume lvm migrate' in Octopus but it doesn't 
> really work. I'm not sure if I'm missing something here, but I believe 
> it's again the already-discussed containers issue.
> To be able to run the command for an OSD, the OSD has to be offline, 
> but then you don't have access to the block.db because the path is 
> different from outside the container:
>
> ---snip---
> [ceph: root@host1 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
>  stdout: inferring bluefs devices from bluestore path
>  stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_mount_for_bluefs()' thread 7fde05b96180 time 2021-09-29T06:56:24.790161+0000
>  stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: 6876: FAILED ceph_assert(r == 0)
>  stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable
>
>
> # path outside
> host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
> insgesamt 60
> lrwxrwxrwx 1 ceph ceph   93 29. Sep 08:43 block -> /dev/ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> lrwxrwxrwx 1 ceph ceph   90 29. Sep 08:43 block.db -> /dev/ceph-6f1b8f49-daf2-4631-a2ef-12e9452b01ea/osd-db-69b11aa0-af96-443e-8f03-5afa5272131f
> ---snip---
>
>
> But if I shut down the OSD I can't access the block and block.db 
> devices. I'm not even sure how this is supposed to work with cephadm. 
> Maybe I'm misunderstanding, though. Or is there a way to provide the 
> offline block.db path to 'ceph-volume lvm migrate'?
>
>
>
> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>
> You may need to use `ceph-volume lvm migrate` [1] instead of 
> ceph-bluestore-tool. If I recall correctly, this is a pretty new 
> feature; I’m not sure whether it is available in your version.
>
> If you use ceph-bluestore-tool, then you need to modify the LVM tags 
> manually. Please refer to the previous threads, e.g. [2] and some 
> more.
>
> [1]: https://docs.ceph.com/en/latest/man/8/ceph-volume/#migrate
> [2]: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/VX23NQ66P3PPEX36T3PYYMHPLBSFLMYA/#JLNDFGXR4ZLY27DHD3RJTTZEDHRZJO4Q
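>
> For context on [2], the manual tag editing is done with lvchange; a rough sketch (the tag values and LV names below are placeholders, check your own "lvs -o lv_name,lv_tags" output first):
>
> lvs -o lv_name,lv_tags
> lvchange --deltag "ceph.db_device=<old-db-dev>" <vg>/<data-lv>
> lvchange --deltag "ceph.db_uuid=<old-db-uuid>" <vg>/<data-lv>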
>
> From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> Sent: September 28, 2021 18:20
> To: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx
> Subject: Re: is it possible to remove the db+wal from an external device (nvme)
>
> I gave it a try, and in the end all 3 OSDs failed :/ Not sure what 
> went wrong.
>
> I did the normal maintenance things (ceph osd set noout, ceph osd set 
> norebalance), stopped the OSD and ran this command:
> ceph-bluestore-tool bluefs-bdev-migrate --dev-target /var/lib/ceph/osd/ceph-0/block --devs-source /var/lib/ceph/osd/ceph-8/block.db --path /var/lib/ceph/osd/ceph-8/
> Output:
> device removed:1 /var/lib/ceph/osd/ceph-8/block.db
> device added: 1 /dev/dm-2
>
> When I tried to start it I got this in the log:
> osd.8 0 OSD:init: unable to mount object store
>  ** ERROR: osd init failed: (13) Permission denied
> set uid:gid to 167:167 (ceph:ceph)
> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable), process ceph-osd, pid 1512261
> pidfile_write: ignore empty --pid-file
>
> On the other 2 OSDs the block.db was removed and I could start them back up.
> I zapped the DB drive just to remove it from those devices completely, 
> and after a machine restart none of these 2 OSDs came back; I guess 
> they're missing the DB device.
>
> Are there any steps missing?
> 1. noout + norebalance
> 2. Stop the OSD
> 3. Migrate the block.db to the block device with the above command.
> 4. Do the same on the other OSDs that share the same DB device I want 
> to remove.
> 5. Zap the DB device
> 6. Start the OSDs back up.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Monday, September 27, 2021 7:42 PM
> To: ceph-users@xxxxxxx
> Subject: Re: is it possible to remove the db+wal from an external device (nvme)
>
>
> Hi,
>
> I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use 
> here. I haven't tried it in a production environment yet, only in 
> virtual labs.
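>
> The general shape would be something like this (a sketch only; the OSD must be stopped first and the id is a placeholder, so double-check against the man page):
>
> ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-<id> --devs-source /var/lib/ceph/osd/ceph-<id>/block.db --dev-target /var/lib/ceph/osd/ceph-<id>/block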
>
> Regards,
> Eugen
>
>
> Zitat von "Szabo, Istvan (Agoda)"
> <Istvan.Szabo@xxxxxxxxx<mailto:Istvan.Szabo@xxxxxxxxx<mailto:Istvan.Szabo@xxxxxxxxx%3cmailto:Istvan.Szabo@xxxxxxxxx>><mailto:Istvan.Szabo@xxxxxxxxx><mailto:Istvan.Szabo@xxxxxxxxx>>:
>
> Hi,
>
> It seems like in our config the NVMe device serving as WAL+DB in front of 
> the SSDs is slowing down the SSD OSDs.
> I'd like to avoid rebuilding all the OSDs; is there a way to migrate the 
> WAL+DB to the "slower device" without reinstalling?
>
> Ty



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



