Re: rbd resize (shrink) taking forever and a day

Josh Durgin <josh.durgin@xxxxxxxxxxx> · Tue, 06 Jan 2015 15:19:52 -0800

On 01/06/2015 10:24 AM, Robert LeBlanc wrote:
Can't this be done in parallel? If the OSD doesn't have an object then
it is a noop and should be pretty quick. The number of outstanding
operations can be limited to 100 or 1000, which would provide a
balance between speed and performance impact if there is data to be
trimmed. I'm not a big fan of a "--skip-trimming" option as there is
the potential to leave some orphan objects that may not be cleaned up
correctly.

Yeah, a --skip-trimming option seems a bit dangerous. This trimming
actually is parallelized (10 ops at once by default, changeable via
--rbd-concurrent-management-ops) since dumpling.
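
To illustrate what a cap on in-flight operations looks like, here is a minimal Python sketch. It is not librbd's code: remove_object is a hypothetical stand-in for the per-object delete request, FakeCluster is a made-up test double, and the limit of 10 simply mirrors the default mentioned above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def trim_objects(object_numbers, remove_object, max_in_flight=10):
    """Call remove_object(n) for each n with at most max_in_flight outstanding.

    Returns the number of delete requests issued.
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    issued = 0

    def task(n):
        try:
            remove_object(n)
        finally:
            slots.release()  # free a slot so the next delete can be issued

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for n in object_numbers:
            slots.acquire()  # block while max_in_flight deletes are outstanding
            pool.submit(task, n)
            issued += 1
    # Leaving the with-block waits for the remaining deletes to finish.
    return issued


class FakeCluster:
    """Hypothetical stand-in for the OSDs; records the peak concurrency seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0
        self.removed = set()

    def remove_object(self, n):
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        # Deleting an object that does not exist is a no-op on the OSD,
        # but the client still pays for the round trip.
        with self._lock:
            self.removed.add(n)
            self.in_flight -= 1
```

Calling trim_objects(range(200), FakeCluster().remove_object) issues 200 deletes and never has more than 10 outstanding. Raising the limit shortens the wall-clock time only as long as the OSDs keep up.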

What will really help without being dangerous is keeping a map of
object existence [1]. This will avoid any unnecessary trimming
automatically, and it should be possible to add to existing images.
It should be in hammer.
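
The idea can be sketched in a few lines of Python. This only illustrates the concept and is not the design in the pull request: ObjectMap and its methods are hypothetical names, and a real map would have to be persisted and kept consistent with writes.

```python
class ObjectMap:
    """Hypothetical per-image map: one flag per object, set once it is written."""

    def __init__(self, num_objects):
        self.exists = bytearray(num_objects)  # 0 = never written, 1 = may exist

    def mark_written(self, object_no):
        self.exists[object_no] = 1

    def objects_to_trim(self, new_num_objects):
        """Object numbers at or beyond the new size that actually need a delete."""
        return [n for n in range(new_num_objects, len(self.exists))
                if self.exists[n]]

    def shrink(self, new_num_objects):
        """Truncate the map and return the objects the caller must delete."""
        to_delete = self.objects_to_trim(new_num_objects)
        del self.exists[new_num_objects:]
        return to_delete
```

For an image with 1000 object slots where only objects 5, 150 and 999 were ever written, shrinking to 100 objects needs two deletes (150 and 999) instead of 900.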

Josh

[1] https://github.com/ceph/ceph/pull/2700

On Tue, Jan 6, 2015 at 8:09 AM, Jake Young <jak3kaj@xxxxxxxxx> wrote:

On Monday, January 5, 2015, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:

When you shrink an RBD image, most of the time is spent in
librbd/internal.cc::trim_image(). In this function, the client iterates over
all the objects that are no longer needed (whether or not they exist) and
deletes them.

So in this case, when Edwin shrinks his RBD image from 650PB to 650GB,
there are [ (650PB * 1024 * 1024 GB/PB - 650GB) * 1024MB/GB ] / 4MB/object =
174,482,880,000 objects to delete. That will definitely take a long time,
since the rbd client needs to send a delete request to the OSD for each one,
and the OSD needs to look up the object context and delete the object (or find
that it doesn't exist at all). The time needed to trim an image is
proportional to the size being trimmed.
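
The count can be checked with a short Python sketch, assuming binary units throughout (1 PB = 1024 TB = 1024 * 1024 GB) and the default 4 MB object size. The helper name objects_to_trim is made up for this example.

```python
# All sizes are expressed in megabytes, the unit rbd resize --size expects.
MB = 1
GB = 1024 * MB
TB = 1024 * GB
PB = 1024 * TB


def objects_to_trim(old_size_mb, new_size_mb, object_size_mb=4):
    """Number of object-sized slots between the new and the old image size."""
    return (old_size_mb - new_size_mb) // object_size_mb


count = objects_to_trim(650 * PB, 650 * GB)
print(f"{count:,}")  # 174,482,880,000
```

At a purely hypothetical rate of 1,000 delete requests per second, that many objects would take roughly 2,000 days to walk through.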

Making another image of the correct size, copying your VM's file system to
the new image, and then deleting the old one will NOT help in general, because
deleting the old volume takes exactly as long as shrinking it: both need to
call trim_image().

The solution I have in mind is to provide a "--skip-trimming" flag to
skip the trimming. When the administrator is absolutely sure that no writes
have taken place in the area being removed (that is, no objects were created
in that area), they can use this flag to skip the time-consuming trimming.

What do you think?

That sounds like a good solution. Like doing "undo grow image"

From: Jake Young [mailto:jak3kaj@xxxxxxxxx]
Sent: Monday, January 5, 2015 9:45 PM
To: Chen, Xiaoxi
Cc: Edwin Peer; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  rbd resize (shrink) taking forever and a day

On Sunday, January 4, 2015, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:

You can use rbd info <volume_name> to see the block_name_prefix. Object
names have the form <block_name_prefix>.<sequence_number>, so for example
rb.0.ff53.3d1b58ba.00000000e6ad should be object number e6ad (hex) of the
volume with block_name_prefix rb.0.ff53.3d1b58ba.

      $ rbd info huge
      rbd image 'huge':
              size 1024 TB in 268435456 objects
              order 22 (4096 kB objects)
              block_name_prefix: rb.0.8a14.2ae8944a
              format: 1
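
As a rough illustration of that naming scheme, the Python sketch below splits an object name into its prefix and object number and works out where the object sits in the image. It assumes the sequence number is hexadecimal and that the image is not striped; parse_object_name and byte_offset are names invented for this example.

```python
def parse_object_name(name):
    """Split '<block_name_prefix>.<sequence_number>' into (prefix, object number)."""
    prefix, _, seq = name.rpartition(".")
    return prefix, int(seq, 16)  # assumption: the sequence number is hex


def byte_offset(object_no, order=22):
    """Offset of the object's first byte in the image (order 22 = 4 MB objects)."""
    return object_no << order


prefix, object_no = parse_object_name("rb.0.ff53.3d1b58ba.00000000e6ad")
print(prefix, object_no, byte_offset(object_no))
```

Matching the prefix against the block_name_prefix reported by rbd info for each image tells you which image an object from rados ls belongs to.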

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Edwin Peer
Sent: Monday, January 5, 2015 3:55 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  rbd resize (shrink) taking forever and a day

Also, which rbd objects are of interest?

<snip>
ganymede ~ # rados -p client-disk-img0 ls | wc -l
1672636
</snip>

And, all of them have cryptic names like:

rb.0.ff53.3d1b58ba.00000000e6ad
rb.0.6d386.1d545c4d.000000011461
rb.0.50703.3804823e.000000001c28
rb.0.1073e.3d1b58ba.00000000b715
rb.0.1d76.2ae8944a.00000000022d

which seem to bear no resemblance to the actual image names that the rbd
command line tools understand?

Regards,
Edwin Peer

On 01/04/2015 08:48 PM, Jake Young wrote:

On Sunday, January 4, 2015, Dyweni - Ceph-Users
<6EXbab4FYk8H@xxxxxxxxxx> wrote:

     Hi,

     If it's the only thing in your pool, you could try deleting the
     pool instead.

     I found that to be faster in my testing; I had created 500TB when
     I meant to create 500GB.

     Note for the devs: it would be nice if rbd create/resize would
     accept sizes with units (MB, GB, TB, PB, etc.).
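
Until the tools accept units, a small wrapper can do the conversion before calling rbd. The Python sketch below is a hypothetical helper, not an rbd feature; it assumes binary units and returns megabytes, the unit the resize command expects.

```python
import re

# Multipliers to megabytes, assuming binary units (1 GB = 1024 MB).
_UNITS_MB = {"M": 1, "G": 1024, "T": 1024 ** 2, "P": 1024 ** 3}


def size_to_mb(text):
    """Convert a size such as '650G' or '650 GB' to a whole number of megabytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([MGTP])B?\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS_MB[unit.upper()]


print(size_to_mb("650G"))  # 665600
```

The value printed for 650G is the same 665600 that appears in the resize command further down the thread.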

     On 2015-01-04 08:45, Edwin Peer wrote:

         Hi there,

         I did something stupid while growing an rbd image. I accidentally
         mistook the units of the resize command for bytes instead of
         megabytes and grew an rbd image to 650PB instead of 650GB. This all
         happened instantaneously enough, but trying to rectify the mistake
         is not going nearly as well.

         <snip>
         ganymede ~ # rbd resize --size 665600 --allow-shrink client-disk-img0/vol-x318644f-0
         Resizing image: 1% complete...
         </snip>

         It took a couple days before it started showing 1% complete and
         has been stuck on 1% for a couple more. At this rate, I should be
         able to shrink the image back to the intended size in about 2016.

         Any ideas?

         Regards,
         Edwin Peer
         _______________________________________________
         ceph-users mailing list
         ceph-users@xxxxxxxxxxxxxx
         http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


You can just delete the rbd header. See Sebastien's excellent blog:

http://www.sebastien-han.fr/blog/2013/12/12/rbd-image-bigger-than-your-ceph-cluster/

Jake


Sorry, I misunderstood.

The simplest approach to me is to make another image of the correct size
and copy your VM's file system to the new image, then delete the old one.

The safest thing to do would be to mount the new file system from the VM
and do all the formatting / copying from there (the same way you'd move a
physical server's root disk to a new physical disk)

I would not attempt to hack the rbd header. You open yourself up to some
unforeseen problems.

That is, unless one of the Ceph developers can confirm there is a safe way
to shrink an image, assuming we know that the file system has not grown since
the disk was grown.

Jake
