... however, it would be nice if ceph-volume would also create the partitions for the WAL and/or DB if needed. Is there a special reason why this is not implemented?

Dietmar

On 02/27/2018 04:25 PM, David Turner wrote:
> Gotcha. As a side note, that setting is only used by ceph-disk, as ceph-volume does not create partitions for the WAL or DB. You need to create those partitions manually if you use anything other than a whole block device when creating OSDs with ceph-volume.
>
> On Tue, Feb 27, 2018 at 8:20 AM Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
>
> David,
>
> Yes, I know, I use 20GB partitions for 2TB disks as journal. It was just to inform other people that Ceph's default of 1GB is pretty low. Now that I read my own sentence it indeed looks as if I was using 1GB partitions, sorry for the confusion.
>
> Caspar
>
> 2018-02-27 14:11 GMT+01:00 David Turner <drakonstein@xxxxxxxxx>:
>
> If you're only using a 1GB DB partition, there is a very real possibility it's already 100% full. The safe estimate for DB size seems to be 10GB per 1TB of OSD capacity, so for a 4TB OSD a 40GB DB should work for most use cases (except loads and loads of small files). There are a few threads that mention how to check how much of your DB partition is in use. Once it's full, it spills over to the HDD.
>
> On Tue, Feb 27, 2018, 6:19 AM Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
>
> 2018-02-26 23:01 GMT+01:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>
> On Mon, Feb 26, 2018 at 3:23 AM Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
>
> 2018-02-24 7:10 GMT+01:00 David Turner <drakonstein@xxxxxxxxx>:
>
> Caspar, it looks like your idea should work. Worst case scenario seems like the OSDs wouldn't start; you'd put the old SSD back in and go back to the idea of weighting them to 0, backfilling, then recreating the OSDs. Definitely worth a try in my opinion, and I'd love to hear your experience afterwards.
>
> Hi David,
>
> First of all, thank you for ALL your answers on this ML, you're really putting a lot of effort into answering many questions asked here, and very often they contain invaluable information.
>
> To follow up on this post I went out and built a very small (Proxmox) cluster (3 OSDs per host) to test my suggestion of cloning the DB/WAL SSD. And it worked!
> Note: this was on Luminous v12.2.2 (all BlueStore, ceph-disk based OSDs).
>
> Here's what I did on one node:
>
> 1) ceph osd set noout
> 2) systemctl stop ceph-osd@0; systemctl stop ceph-osd@1; systemctl stop ceph-osd@2
> 3) ddrescue -f -n -vv <old SSD dev> <new SSD dev> /root/clone-db.log
> 4) removed the old SSD physically from the node
> 5) checked with "ceph -s" and already saw HEALTH_OK and all OSDs up/in
> 6) ceph osd unset noout
>
> I assume that once the ddrescue step is finished a 'partprobe' or something similar is triggered, udev finds the DB partitions on the new SSD and starts the OSDs again (kind of what happens during hotplug). So it is probably better to clone the SSD in another (non-Ceph) system so as not to trigger any udev events.
>
> I also tested a reboot after this and everything still worked.
>
> The old SSD was 120GB and the new one is 256GB (cloning took around 4 minutes). The delta of data was very low because it was a test cluster.
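For anyone repeating this: the likely reason the clone works so transparently is that ceph-disk points block.db at /dev/disk/by-partuuid/ symlinks, and a bit-for-bit clone copies the GPT partition GUIDs along with the data, so those links simply resolve to the new SSD. A rough way to double-check before starting the OSDs again (a sketch only; /dev/sdX stands for the new SSD):

    # which partition does each OSD expect its DB on?
    ls -l /var/lib/ceph/osd/ceph-0/block.db
    # make sure the kernel has re-read the partition table of the clone
    partprobe /dev/sdX
    # confirm the by-partuuid link now resolves to a partition on the new device
    readlink -f /dev/disk/by-partuuid/<partuuid-from-block.db>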
> All in all, the OSDs in question were 'down' for only 5 minutes, so I stayed within the default mon_osd_down_out_interval of 10 minutes and didn't actually need to set noout :)
>
> I kicked off a brief discussion about this with some of the BlueStore guys and they're aware of the problem with migrating across SSDs, but so far it's just a Trello card: https://trello.com/c/9cxTgG50/324-bluestore-add-remove-resize-wal-db
> They do confirm you should be okay with dd'ing things across, assuming the symlinks get set up correctly, as David noted.
>
> Great that it is on the radar to be addressed. This method feels hacky.
>
> I've got some other bad news, though: BlueStore has internal metadata about the size of the block device it's using, so if you copy it onto a larger block device, it will not actually make use of the additional space. :(
> -Greg
>
> Yes, I was well aware of that, no problem. The reason is that the smaller SSD sizes are simply not being made anymore or have been discontinued by the manufacturers.
> It would be nice, though, if the DB could be resized in the future; the default 1GB DB size seems very small to me.
>
> Caspar
>
> Kind regards,
> Caspar
>
> Nico, it is not possible to change the WAL or DB size, location, etc. after OSD creation. If you want to change the configuration of the OSD after creation, you have to remove it from the cluster and recreate it. There is no functionality similar to how you could move, recreate, etc. FileStore OSD journals. I think this might be on the radar as a feature, but I don't know for certain. I definitely consider it to be a regression of BlueStore.
>
> On Fri, Feb 23, 2018, 9:13 AM Nico Schottelius <nico.schottelius@xxxxxxxxxxx> wrote:
>
> A very interesting question, and I would add the follow-up question:
>
> Is there an easy way to add an external DB/WAL device to an existing OSD?
>
> I suspect that it might be something along the lines of:
>
> - stop osd
> - create a link in ...ceph/osd/ceph-XX/block.db to the target device
> - (maybe run some kind of osd mkfs?)
> - start osd
>
> Has anyone done this so far, or does anyone have recommendations on how to do it?
>
> Which also makes me wonder: what is actually the format of the WAL and BlockDB in BlueStore? Is there any documentation available about it?
>
> Best,
>
> Nico
>
> Caspar Smit <casparsmit@xxxxxxxxxxx> writes:
>
> > Hi All,
> >
> > What would be the proper way to preventively replace a DB/WAL SSD (when it is nearing its DWPD/TBW limit but has not failed yet)?
> >
> > It hosts DB partitions for 5 OSDs.
> >
> > Maybe something like:
> >
> > 1) ceph osd reweight 0 the 5 OSDs
> > 2) let backfilling complete
> > 3) destroy/remove the 5 OSDs
> > 4) replace the SSD
> > 5) create 5 new OSDs with separate DB partitions on the new SSD
> >
> > When these 5 OSDs are big HDDs (8TB) a LOT of data has to be moved, so I thought maybe the following would work:
> >
> > 1) ceph osd set noout
> > 2) stop the 5 OSDs (systemctl stop)
> > 3) 'dd' the old SSD to a new SSD of the same or bigger size
> > 4) remove the old SSD
> > 5) start the 5 OSDs (systemctl start)
> > 6) let backfilling/recovery complete (only the delta of data between OSD stop and now)
> > 7) ceph osd unset noout
> >
> > Would this be a viable method to replace a DB SSD? Any udev/serial nr/uuid stuff preventing this from working?
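On the earlier point about checking how much of a DB partition is actually in use: the BlueFS counters in the OSD's perf dump expose this. Roughly (a sketch, using osd.0 as an example and jq purely for readability):

    # run on the node hosting osd.0
    ceph daemon osd.0 perf dump | jq '.bluefs'
    # compare db_used_bytes against db_total_bytes; a non-zero slow_used_bytes
    # means metadata has already spilled over onto the slow (HDD) device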
> > Or is there another 'less hacky' way to replace a DB SSD without moving too much data?
> >
> > Kind regards,
> > Caspar
>
> --
> Modern, affordable, Swiss Virtual Machines.
> Visit www.datacenterlight.ch

--
_________________________________________
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rieder@xxxxxxxxxxx
Web: http://www.icbi.at
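Coming back to the ceph-volume remark at the top of this mail: until ceph-volume can create DB/WAL partitions itself, the manual route is roughly the following (a sketch only; /dev/sdf as the new SSD, /dev/sdb as the data disk and the 40G size are placeholders, following the ~10GB-per-TB rule of thumb mentioned in the thread):

    # carve out a DB partition on the SSD by hand (assumes the SSD is empty)
    sgdisk --new=1:0:+40G --change-name=1:'osd-db-0' /dev/sdf
    # then hand the pre-made partition to ceph-volume together with the data disk
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdf1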
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com