My understanding of growing file systems is the same as yours: they can only
grow at the end, not the beginning. In addition to that, having partition 2
sit before partition 1 just cries out to me to be fixed, but that is purely
aesthetic.

Because the weights of the drives will be different, there will be some
additional data movement (probably minimized if you are using straw2).
Setting noout will prevent Ceph from shuffling data around while you are
making the changes. When you bring the OSD back in, it should receive only
the PGs that were on it before, which minimizes data movement in the
cluster. But because you are adding 800 GB, it will want to take a few more
PGs, so some shuffling in the cluster is inevitable.

I don't know how well it would work, but you could bring all the reformatted
OSDs in at their current weight and then, when you have them all re-done,
edit the crush map to set the weights right (roughly the commands sketched
below). Ideally the ratio would stay the same, so no (or very little) data
movement would occur. Due to an error in the straw algorithm, there is still
the potential for large amounts of data movement with small weight changes.

As to your question about adding the disk before the rebalance is completed,
it will be fine to do so. Ceph will complete the PGs that are currently
being relocated, but compute new locations based on the new disk. This may
result in a PG that just finished moving being relocated again. The cluster
will still perform and not lose data.

About keeping OSD IDs: I only know that if you don't have gaps in your OSD
numbering (from OSDs that were retired and not replaced), then removing an
OSD and recreating it will give it the same number, since the lowest
available ID is the one freed by the OSD being replaced. I don't know
whether saving off the files before wiping the OSD will preserve its
identity.
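For what it's worth, the weight juggling shouldn't require decompiling the
crush map by hand; something along these lines should do it (the OSD id and
the weight values here are only placeholders for your actual numbers):

    # rebuilt OSD comes back up; pin it to its old weight for now
    ceph osd crush reweight osd.12 0.27

    # later, once every OSD has been reformatted, set the real weights
    ceph osd crush reweight osd.12 1.09

You can check the resulting weights with "ceph osd tree".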
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1


On Wed, Sep 16, 2015 at 10:12 AM, John-Paul Robinson <jpr@xxxxxxx> wrote:
> Christian,
>
> Thanks for the feedback.
>
> I guess I'm wondering about step 4, "clobber partition, leaving data
> intact, and grow partition and the file system as needed".
>
> My understanding of xfs_growfs is that the free space must be at the end
> of the existing file system. In this case the existing partition starts
> around the 800GB mark on the disk and extends to the end of the disk. My
> goal is to add the first 800GB of the disk to that partition so it can
> become a single data partition.
>
> Note that my volumes are not LVM based, so I can't extend the volume by
> incorporating the free space at the start of the disk.
>
> Am I misunderstanding something about file system grow commands?
>
> Regarding your comments on the impact to the cluster of a downed OSD: I
> have lost OSDs and the impact is minimal (acceptable).
>
> My concern is around taking an OSD down, having the cluster initiate
> recovery, and then bringing that same OSD back into the cluster in an
> empty state. Are the placement groups that originally had data on this
> OSD already remapped by this point (even if they aren't fully recovered),
> so that bringing the empty replacement OSD on-line simply causes a
> different set of placement groups to be mapped onto it to achieve the
> rebalance?
>
> Thanks,
>
> ~jpr
>
> On 09/16/2015 08:37 AM, Christian Balzer wrote:
>> Hello,
>>
>> On Wed, 16 Sep 2015 07:21:26 -0500 John-Paul Robinson wrote:
>>
>>> The move journal, partition resize, grow file system approach would
>>> work nicely if the spare capacity were at the end of the disk.
>>>
>> That shouldn't matter, you can "safely" lose your journal in controlled
>> circumstances.
>>
>> This would also be an ideal time to put your journals on SSDs. ^o^
>>
>> Roughly (you do have a test cluster, don't you? Or at least try this
>> with just one OSD):
>>
>> 1. set noout just to be sure.
>> 2. stop the OSD
>> 3. "ceph-osd -i osdnum --flush-journal" for warm fuzzies (see man page
>> or --help)
>> 4. clobber your partitions in a way that leaves you with an intact data
>> partition, grow that and the FS in it as desired.
>> 5. re-init the journal with "ceph-osd -i osdnum --mkjournal"
>> 6. start the OSD and rejoice.
>>
>> More below.
>>
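
Putting Christian's steps together, the whole cycle for one OSD would look
roughly like this; the OSD id is a placeholder and the stop/start commands
depend on your distro and init system, so treat this as a sketch rather
than a recipe:

    ceph osd set noout                  # keep Ceph from marking the OSD out
    service ceph stop osd.12            # or: stop ceph-osd id=12 (upstart)
    ceph-osd -i 12 --flush-journal      # flush the journal to the data store
    # repartition the disk and grow the data filesystem here
    ceph-osd -i 12 --mkjournal          # re-initialize the journal
    service ceph start osd.12           # or: start ceph-osd id=12
    ceph osd unset noout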