Re: Clone data inconsistency in hammer

Hi Jason,

I'll test the Kraken tools, since this happened on production. Everything
normally works there because the clone is flattened after being created, and
the production equivalent of the "test" user can access the image only after
it has been flattened.

The issue happened when someone accidentally removed a not-yet-flattened image
using the user with weaker permissions. Good to hear this has already been
spotted.

Thanks for the help,
Bartek



On Wed, 21 Dec 2016 11:53:57 -0500
Jason Dillaman <jdillama@xxxxxxxxxx> wrote:

> You are unfortunately the second person today to hit an issue where
> "rbd remove" incorrectly proceeds when it hits a corner-case error.
> 
> First things first: when you configured your new user, you needed to
> give it "rx" permissions on the parent image's pool. If you had attempted
> the clone operation using the "test" user, the clone would have
> failed immediately because of the missing permission.
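
A minimal sketch of the capability change described above, assuming the user is named client.test, the clones live in pool "rbd" and the parent image lives in pool "test" (names taken from the reproduction steps in this thread). The helper build_auth_caps_cmd is hypothetical; it only assembles the arguments for the existing "ceph auth caps" command and does not run anything.

```python
import shlex


def build_auth_caps_cmd(user, clone_pool, parent_pool):
    """Assemble a `ceph auth caps` command line (hypothetical helper).

    Keeps the caps from the original report and adds read/execute ("rx")
    access to the pool holding the parent image, so that librbd can open
    the parent when it opens or removes the clone.
    """
    osd_caps = ", ".join([
        "allow class-read object_prefix rbd_children",
        f"allow rwx pool={clone_pool}",
        f"allow rx pool={parent_pool}",  # the permission that was missing
    ])
    return ["ceph", "auth", "caps", f"client.{user}",
            "mon", "allow r",
            "osd", osd_caps]


if __name__ == "__main__":
    cmd = build_auth_caps_cmd("test", "rbd", "test")
    # Print a shell-safe version for review instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command rather than executing it leaves room to review the caps before applying them; note that "ceph auth caps" replaces the entity's existing caps, so all of them have to be listed.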
> 
> Second, unless this is a test cluster where you can delete the
> "rbd_children" object in the "rbd" pool (i.e. you don't have any
> additional clones in the rbd pool) via the rados CLI, you will need to
> use the Kraken release candidate (or master branch) version of the
> rados CLI to manually manipulate the "rbd_children" object to remove
> the dangling reference to the deleted image.
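
A rough sketch of why a newer rados CLI is needed here, under the assumption (not stated in this thread) that "rbd_children" stores one omap entry per parent snapshot, keyed by the binary encoding of (parent pool id, parent image id, parent snap id): a little-endian 64-bit pool id, a string with a 32-bit little-endian length prefix, and a little-endian 64-bit snap id. Such a key is not printable, so it cannot simply be typed on a command line. The function name and the sample ids below are hypothetical.

```python
import struct


def rbd_children_key(pool_id, image_id, snap_id):
    """Build the assumed binary omap key for one parent snapshot.

    Assumed layout: u64 pool id, u32 length + image id bytes, u64 snap id,
    all little-endian. Verify against your Ceph version before relying on it.
    """
    raw_id = image_id.encode("ascii")
    return (
        struct.pack("<Q", pool_id)
        + struct.pack("<I", len(raw_id)) + raw_id
        + struct.pack("<Q", snap_id)
    )


if __name__ == "__main__":
    # Hypothetical ids: look up the real parent pool id, image id and
    # snap id on your own cluster first.
    key = rbd_children_key(1, "101a2ae8944a", 4)
    print(key.hex())
```

The existing entries can be compared against this with "rados -p rbd listomapvals rbd_children". Removing an entry drops every child recorded for that parent snapshot, so it is only appropriate when the deleted clone was the snapshot's only child. As far as I know, the newer rados CLI can read a binary key from a file (an --omap-key-file option); treat that option name as an assumption and check "rados --help" on your build.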
> 
> On Wed, Dec 21, 2016 at 6:57 AM, Bartłomiej Święcki
> <bartlomiej.swiecki@xxxxxxxxxxxx> wrote:
> > Hi,
> >
> > I'm currently investigating a case where a Ceph cluster ended up with inconsistent clone information.
> >
> > Here's what I did to quickly reproduce it:
> > * Created new cluster (tested in hammer 0.94.6 and jewel 10.2.3)
> > * Created two pools: test and rbd
> > * Created base image in pool test, created snapshot, protected it and created clone of this snapshot in pool rbd:
> >         # rbd -p test create --size 10 --image-format 2 base
> >         # rbd -p test snap create base@base
> >         # rbd -p test snap protect base@base
> >         # rbd clone test/base@base rbd/destination
> > * Created new user called "test" with rwx permissions to rbd pool only:
> >         caps: [mon] allow r
> >         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=rbd
> > * Using this newly created user, I removed the cloned image in the rbd pool. The command printed errors but still removed the image:
> >         # rbd --id test -p rbd rm destination
> >         2016-12-21 11:50:03.758221 7f32b7459700 -1 librbd::image::OpenRequest: failed to retreive name: (1) Operation not permitted
> >         2016-12-21 11:50:03.758288 7f32b6c58700 -1 librbd::image::RefreshParentRequest: failed to open parent image: (1) Operation not permitted
> >         2016-12-21 11:50:03.758312 7f32b6c58700 -1 librbd::image::RefreshRequest: failed to refresh parent image: (1) Operation not permitted
> >         2016-12-21 11:50:03.758333 7f32b6c58700 -1 librbd::image::OpenRequest: failed to refresh image: (1) Operation not permitted
> >         2016-12-21 11:50:03.759366 7f32b6c58700 -1 librbd::ImageState: failed to open image: (1) Operation not permitted
> >         Removing image: 100% complete...done.
> >
> > At this point the cloned image no longer exists, but the original snapshot still holds a reference to it:
> >
> > # rbd -p test snap unprotect base@base
> > 2016-12-21 11:53:47.359060 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [29b0238e1f29] in pool 'rbd'
> > 2016-12-21 11:53:47.359678 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: encountered error: (16) Device or resource busy
> > 2016-12-21 11:53:47.359691 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: 0x7fee39ae9340 should_complete_error: ret_val=-16
> > 2016-12-21 11:53:47.360627 7fee037fe700 -1 librbd::SnapshotUnprotectRequest: 0x7fee39ae9340 should_complete_error: ret_val=-16
> > rbd: unprotecting snap failed: (16) Device or resource busy
> >
> > # rbd -p test children base@base
> > rbd: listing children failed: (2) No such file or directory
> > 2016-12-21 11:53:08.716987 7ff2b2eaad80 -1 librbd: Error looking up name for image id 29b0238e1f29 in pool rbd
> >
> >
> > Any ideas on how this could be fixed?
> >
> >
> > Thanks,
> > Bartek
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Jason



