Can't this be done in parallel? If the OSD doesn't have an object then the delete is a no-op and should be pretty quick. The number of outstanding operations could be limited to 100 or 1,000, which would strike a balance between speed and performance impact when there is data to be trimmed. I'm not a big fan of a "--skip-trimming" option, as there is the potential to leave orphan objects behind that may never be cleaned up correctly.
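Something along these lines could work -- a rough, untested sketch using the python-rados bindings. The pool name, block_name_prefix and object range below are made-up placeholders (they would come from "rbd info" and the old/new image sizes), and the %012x suffix matches the format-1 object names shown further down in this thread; format-2 images use rbd_data.<id>.%016x instead.

#!/usr/bin/env python3
# Rough sketch: delete the to-be-trimmed objects of an image in parallel,
# capping the number of outstanding removals.  POOL, PREFIX, FIRST_OBJ and
# LAST_OBJ are illustrative placeholders only.

import threading
from concurrent.futures import ThreadPoolExecutor

import rados

POOL          = 'client-disk-img0'     # pool holding the image's objects
PREFIX        = 'rb.0.8a14.2ae8944a'   # block_name_prefix from 'rbd info'
FIRST_OBJ     = 166400                 # first object past the intended size
LAST_OBJ      = 174482879999           # last object of the oversized image
MAX_IN_FLIGHT = 1000                   # cap on outstanding delete requests

in_flight = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def remove_one(ioctx, seq):
    try:
        # Deleting an object that was never written is effectively a no-op
        # on the OSD; it just comes back "not found".
        ioctx.remove_object('%s.%012x' % (PREFIX, seq))
    except rados.ObjectNotFound:
        pass
    finally:
        in_flight.release()

def main():
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)
    try:
        # A modest worker pool; the semaphore throttles how far ahead we queue.
        with ThreadPoolExecutor(max_workers=64) as pool:
            for seq in range(FIRST_OBJ, LAST_OBJ + 1):
                in_flight.acquire()
                pool.submit(remove_one, ioctx, seq)
    finally:
        ioctx.close()
        cluster.shutdown()

if __name__ == '__main__':
    main()

If sharing one ioctx across worker threads is a concern, each worker could open its own ioctx; the throttling idea stays the same.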
On Tue, Jan 6, 2015 at 8:09 AM, Jake Young <jak3kaj@xxxxxxxxx> wrote:
>
> On Monday, January 5, 2015, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:
>>
>> When you shrink an RBD image, most of the time is spent in librbd/internal.cc::trim_image(). In this function the client iterates over every object that is no longer needed (whether or not it actually exists) and deletes it.
>>
>> So in this case, when Edwin shrinks his RBD from 650PB to 650GB, there are [ (650PB * 1024 * 1024 GB/PB - 650GB) * 1024 MB/GB ] / 4 MB/object, roughly 174,482,880,000 objects, that need to be deleted. That will definitely take a long time, since the rbd client has to send a delete request for each object and the OSD has to look up the object context and delete it (or find that it doesn't exist at all). The time needed to trim an image is proportional to the size being trimmed.
>>
>> Making another image of the correct size, copying your VM's file system to the new image and then deleting the old one will NOT help in general, because deleting the old volume takes exactly the same time as shrinking; both need to call trim_image().
>>
>> The solution in my mind is that we could provide a "--skip-trimming" flag to skip the trimming. When the administrator is absolutely sure that no writes have taken place in the area being shrunk away (that is, no objects were ever created there), they can use this flag to skip the time-consuming trimming.
>>
>> What do you think?
>
> That sounds like a good solution. Like doing an "undo grow image".
>
>> From: Jake Young [mailto:jak3kaj@xxxxxxxxx]
>> Sent: Monday, January 5, 2015 9:45 PM
>> To: Chen, Xiaoxi
>> Cc: Edwin Peer; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: rbd resize (shrink) taking forever and a day
>>
>> On Sunday, January 4, 2015, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:
>>
>> You could use rbd info <volume_name> to see the block_name_prefix. The object names are of the form <block_name_prefix>.<sequence_number>, so for example rb.0.ff53.3d1b58ba.00000000e6ad is object number 0xe6ad of the volume with block_name_prefix rb.0.ff53.3d1b58ba.
>>
>> $ rbd info huge
>> rbd image 'huge':
>>     size 1024 TB in 268435456 objects
>>     order 22 (4096 kB objects)
>>     block_name_prefix: rb.0.8a14.2ae8944a
>>     format: 1
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Edwin Peer
>> Sent: Monday, January 5, 2015 3:55 AM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: rbd resize (shrink) taking forever and a day
>>
>> Also, which rbd objects are of interest?
>>
>> <snip>
>> ganymede ~ # rados -p client-disk-img0 ls | wc -l
>> 1672636
>> </snip>
>>
>> And, all of them have cryptic names like:
>>
>> rb.0.ff53.3d1b58ba.00000000e6ad
>> rb.0.6d386.1d545c4d.000000011461
>> rb.0.50703.3804823e.000000001c28
>> rb.0.1073e.3d1b58ba.00000000b715
>> rb.0.1d76.2ae8944a.00000000022d
>>
>> which seem to bear no resemblance to the actual image names that the rbd command line tools understand?
>>
>> Regards,
>> Edwin Peer
>>
>> On 01/04/2015 08:48 PM, Jake Young wrote:
>> >
>> > On Sunday, January 4, 2015, Dyweni - Ceph-Users <6EXbab4FYk8H@xxxxxxxxxx> wrote:
>> >
>> >     Hi,
>> >
>> >     If it's the only thing in your pool, you could try deleting the pool instead.
>> >
>> >     I found that to be faster in my testing; I had created 500TB when I meant to create 500GB.
>> >
>> >     Note for the Devs: it would be nice if rbd create/resize would accept sizes with units (i.e. MB, GB, TB, PB, etc).
>> >
>> >     On 2015-01-04 08:45, Edwin Peer wrote:
>> >
>> >         Hi there,
>> >
>> >         I did something stupid while growing an rbd image. I accidentally mistook the units of the resize command for bytes instead of megabytes and grew an rbd image to 650PB instead of 650GB. This all happened instantaneously enough, but trying to rectify the mistake is not going nearly as well.
>> >
>> >         <snip>
>> >         ganymede ~ # rbd resize --size 665600 --allow-shrink client-disk-img0/vol-x318644f-0
>> >         Resizing image: 1% complete...
>> >         </snip>
>> >
>> >         It took a couple of days before it started showing 1% complete and it has been stuck on 1% for a couple more. At this rate, I should be able to shrink the image back to the intended size in about 2016.
>> >
>> >         Any ideas?
>> >
>> >         Regards,
>> >         Edwin Peer
>> >
>> > You can just delete the rbd header. See Sebastien's excellent blog:
>> >
>> > http://www.sebastien-han.fr/blog/2013/12/12/rbd-image-bigger-than-your-ceph-cluster/
>> >
>> > Jake
>>
>> Sorry, I misunderstood.
>>
>> The simplest approach to me is to make another image of the correct size and copy your VM's file system to the new image, then delete the old one.
>>
>> The safest thing to do would be to mount the new file system from the VM and do all the formatting / copying from there (the same way you'd move a physical server's root disk to a new physical disk).
>>
>> I would not attempt to hack the rbd header. You open yourself up to some unforeseen problems.
>>
>> Unless one of the Ceph developers can comment that there is a safe way to shrink an image, assuming we know that the file system has not grown since the disk was grown.
>>
>> Jake
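As a footnote to Edwin's question about which rados objects are of interest: each data object is named <block_name_prefix>.<sequence_number>, and everything from (new_size / object_size) up to the current object count is what a shrink has to trim. Below is a rough, untested sketch using the python-rados and python-rbd bindings that prints those names; the pool, image name and target size are placeholders, and the %012x suffix matches the format-1 names shown in this thread (format-2 images use rbd_data.<id>.%016x instead).

#!/usr/bin/env python3
# Rough sketch: print the data-object names that lie beyond a target size,
# i.e. the objects a shrink of this image would have to trim.  POOL, IMAGE
# and TARGET_BYTES are illustrative placeholders.

import rados
import rbd

POOL         = 'client-disk-img0'      # pool holding the image
IMAGE        = 'vol-x318644f-0'        # image name
TARGET_BYTES = 665600 * 1024 * 1024    # intended size: 650 GB

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
image = rbd.Image(ioctx, IMAGE)
try:
    info = image.stat()
    obj_size = info['obj_size']            # e.g. 4194304 for 4 MB objects
    prefix = info['block_name_prefix']     # e.g. rb.0.8a14.2ae8944a
    first = (TARGET_BYTES + obj_size - 1) // obj_size   # first object to trim
    last = info['num_objs']                # object count at the current size
    for seq in range(first, last):
        # Only a fraction of these names will actually exist in the pool.
        print('%s.%012x' % (prefix, seq))
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()

Feeding those names into the throttled-delete sketch earlier in the thread would be one way to trim out of band, at your own risk.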