Can't this be done in parallel? If the OSD doesn't have an object then the delete is a no-op and should be pretty quick. The number of outstanding operations could be limited to 100 or 1,000, which would strike a balance between speed and performance impact when there is data to be trimmed. I'm not a big fan of a "--skip-trimming" option, as there is the potential to leave orphan objects behind that may never be cleaned up correctly.
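Something along these lines could work -- a rough, untested sketch using the python-rados bindings. The pool name, block_name_prefix and object range below are made-up placeholders (they would come from "rbd info" and the old/new image sizes), and the %012x suffix matches the format-1 object names shown further down in this thread; format-2 images use rbd_data.<id>.%016x instead.

#!/usr/bin/env python3
# Rough sketch: delete the to-be-trimmed objects of an image in parallel,
# capping the number of outstanding removals.  POOL, PREFIX, FIRST_OBJ and
# LAST_OBJ are illustrative placeholders only.

import threading
from concurrent.futures import ThreadPoolExecutor

import rados

POOL          = 'client-disk-img0'     # pool holding the image's objects
PREFIX        = 'rb.0.8a14.2ae8944a'   # block_name_prefix from 'rbd info'
FIRST_OBJ     = 166400                 # first object past the intended size
LAST_OBJ      = 174482879999           # last object of the oversized image
MAX_IN_FLIGHT = 1000                   # cap on outstanding delete requests

in_flight = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def remove_one(ioctx, seq):
    try:
        # Deleting an object that was never written is effectively a no-op
        # on the OSD; it just comes back "not found".
        ioctx.remove_object('%s.%012x' % (PREFIX, seq))
    except rados.ObjectNotFound:
        pass
    finally:
        in_flight.release()

def main():
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)
    try:
        # A modest worker pool; the semaphore throttles how far ahead we queue.
        with ThreadPoolExecutor(max_workers=64) as pool:
            for seq in range(FIRST_OBJ, LAST_OBJ + 1):
                in_flight.acquire()
                pool.submit(remove_one, ioctx, seq)
    finally:
        ioctx.close()
        cluster.shutdown()

if __name__ == '__main__':
    main()

If sharing one ioctx across worker threads is a concern, each worker could open its own ioctx; the throttling idea stays the same.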
On Tue, Jan 6, 2015 at 8:09 AM, Jake Young <jak3kaj@xxxxxxxxx> wrote:
>
> On Monday, January 5, 2015, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:
>>
>> When you shrink an RBD image, most of the time is spent in librbd/internal.cc::trim_image(). In this function the client iterates over every object that is no longer needed (whether or not it actually exists) and deletes it.
>>
>> So in this case, when Edwin shrinks his RBD from 650PB to 650GB, there are [ (650PB * 1024 * 1024 GB/PB - 650GB) * 1024 MB/GB ] / 4 MB/object, roughly 174,482,880,000 objects, that need to be deleted. That will definitely take a long time, since the rbd client has to send a delete request for each object and the OSD has to look up the object context and delete it (or find that it doesn't exist at all). The time needed to trim an image is proportional to the size being trimmed.
>>
>> Making another image of the correct size, copying your VM's file system to the new image and then deleting the old one will NOT help in general, because deleting the old volume takes exactly the same time as shrinking; both need to call trim_image().
>>
>> The solution in my mind is that we could provide a "--skip-trimming" flag to skip the trimming. When the administrator is absolutely sure that no writes have taken place in the area being shrunk away (that is, no objects were ever created there), they can use this flag to skip the time-consuming trimming.
>>
>> What do you think?
>
> That sounds like a good solution. Like doing an "undo grow image".
>
>> From: Jake Young [mailto:jak3kaj@xxxxxxxxx]
>> Sent: Monday, January 5, 2015 9:45 PM
>> To: Chen, Xiaoxi
>> Cc: Edwin Peer; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: rbd resize (shrink) taking forever and a day
>>
>> On Sunday, January 4, 2015, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:
>>
>> You could use rbd info <volume_name> to see the block_name_prefix. The object names are of the form <block_name_prefix>.<sequence_number>, so for example rb.0.ff53.3d1b58ba.00000000e6ad is object number 0xe6ad of the volume with block_name_prefix rb.0.ff53.3d1b58ba.
>>
>> $ rbd info huge
>> rbd image 'huge':
>>     size 1024 TB in 268435456 objects
>>     order 22 (4096 kB objects)
>>     block_name_prefix: rb.0.8a14.2ae8944a
>>     format: 1
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Edwin Peer
>> Sent: Monday, January 5, 2015 3:55 AM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: rbd resize (shrink) taking forever and a day
>>
>> Also, which rbd objects are of interest?
>>
>> <snip>
>> ganymede ~ # rados -p client-disk-img0 ls | wc -l
>> 1672636
>> </snip>
>>
>> And, all of them have cryptic names like:
>>
>> rb.0.ff53.3d1b58ba.00000000e6ad
>> rb.0.6d386.1d545c4d.000000011461
>> rb.0.50703.3804823e.000000001c28
>> rb.0.1073e.3d1b58ba.00000000b715
>> rb.0.1d76.2ae8944a.00000000022d
>>
>> which seem to bear no resemblance to the actual image names that the rbd command line tools understand?
>>
>> Regards,
>> Edwin Peer
>>
>> On 01/04/2015 08:48 PM, Jake Young wrote:
>> >
>> > On Sunday, January 4, 2015, Dyweni - Ceph-Users <6EXbab4FYk8H@xxxxxxxxxx> wrote:
>> >
>> >     Hi,
>> >
>> >     If it's the only thing in your pool, you could try deleting the pool instead.
>> >
>> >     I found that to be faster in my testing; I had created 500TB when I meant to create 500GB.
>> >
>> >     Note for the Devs: it would be nice if rbd create/resize would accept sizes with units (i.e. MB, GB, TB, PB, etc).
>> >
>> >     On 2015-01-04 08:45, Edwin Peer wrote:
>> >
>> >         Hi there,
>> >
>> >         I did something stupid while growing an rbd image. I accidentally mistook the units of the resize command for bytes instead of megabytes and grew an rbd image to 650PB instead of 650GB. This all happened instantaneously enough, but trying to rectify the mistake is not going nearly as well.
>> >
>> >         <snip>
>> >         ganymede ~ # rbd resize --size 665600 --allow-shrink client-disk-img0/vol-x318644f-0
>> >         Resizing image: 1% complete...
>> >         </snip>
>> >
>> >         It took a couple of days before it started showing 1% complete and it has been stuck on 1% for a couple more. At this rate, I should be able to shrink the image back to the intended size in about 2016.
>> >
>> >         Any ideas?
>> >
>> >         Regards,
>> >         Edwin Peer
>> >
>> > You can just delete the rbd header. See Sebastien's excellent blog:
>> >
>> > http://www.sebastien-han.fr/blog/2013/12/12/rbd-image-bigger-than-your-ceph-cluster/
>> >
>> > Jake
>>
>> Sorry, I misunderstood.
>>
>> The simplest approach to me is to make another image of the correct size and copy your VM's file system to the new image, then delete the old one.
>>
>> The safest thing to do would be to mount the new file system from the VM and do all the formatting / copying from there (the same way you'd move a physical server's root disk to a new physical disk).
>>
>> I would not attempt to hack the rbd header. You open yourself up to some unforeseen problems.
>>
>> Unless one of the Ceph developers can comment that there is a safe way to shrink an image, assuming we know that the file system has not grown since the disk was grown.
>>
>> Jake
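As a footnote to Edwin's question about which rados objects are of interest: each data object is named <block_name_prefix>.<sequence_number>, and everything from (new_size / object_size) up to the current object count is what a shrink has to trim. Below is a rough, untested sketch using the python-rados and python-rbd bindings that prints those names; the pool, image name and target size are placeholders, and the %012x suffix matches the format-1 names shown in this thread (format-2 images use rbd_data.<id>.%016x instead).

#!/usr/bin/env python3
# Rough sketch: print the data-object names that lie beyond a target size,
# i.e. the objects a shrink of this image would have to trim.  POOL, IMAGE
# and TARGET_BYTES are illustrative placeholders.

import rados
import rbd

POOL         = 'client-disk-img0'      # pool holding the image
IMAGE        = 'vol-x318644f-0'        # image name
TARGET_BYTES = 665600 * 1024 * 1024    # intended size: 650 GB

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
image = rbd.Image(ioctx, IMAGE)
try:
    info = image.stat()
    obj_size = info['obj_size']            # e.g. 4194304 for 4 MB objects
    prefix = info['block_name_prefix']     # e.g. rb.0.8a14.2ae8944a
    first = (TARGET_BYTES + obj_size - 1) // obj_size   # first object to trim
    last = info['num_objs']                # object count at the current size
    for seq in range(first, last):
        # Only a fraction of these names will actually exist in the pool.
        print('%s.%012x' % (prefix, seq))
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()

Feeding those names into the throttled-delete sketch earlier in the thread would be one way to trim out of band, at your own risk.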