Any chance you have two or more instances of the rbd-mirror daemon running against the same cluster (zone2 in this case)? The error message indicates that another process owns the exclusive lock on the image and is refusing to release it. The fact that the status ping-pongs back and forth between OK and ERROR/WARNING also hints that you have two or more rbd-mirror daemons fighting each other; a quick way to check for that is sketched below the quoted message.

In the Jewel and Kraken releases, we unfortunately only support a single rbd-mirror daemon process per cluster. In the forthcoming Luminous release, we are hoping to add active/active support (Luminous already safely supports self-promoting active/passive when more than one rbd-mirror daemon process is started).

On Thu, Mar 16, 2017 at 5:48 PM, daniel parkes <liquidsmail@xxxxxxxxx> wrote:
> Hi,
>
> I'm having a problem with a new Ceph deployment using RBD mirroring, and I'm writing in case someone can help me out or point me in the right direction.
>
> I have a Ceph Jewel install with two clusters (zone1, zone2). RBD is working fine, but the RBD mirroring between the sites is not working correctly.
>
> I have configured pool replication in the default rbd pool, set up the peers, and created two test images:
>
> [root@mon3 ceph]# rbd --user zone1 --cluster zone1 mirror pool info
> Mode: pool
> Peers:
>   UUID                                 NAME  CLIENT
>   397b37ef-8300-4dd3-a637-2a03c3b9289c zone2 client.zone2
> [root@mon3 ceph]# rbd --user zone2 --cluster zone2 mirror pool info
> Mode: pool
> Peers:
>   UUID                                 NAME  CLIENT
>   2c11f1dc-67a4-43f1-be33-b785f1f6b366 zone1 client.zone1
>
> The primary is OK:
>
> [root@mon3 ceph]# rbd --user zone1 --cluster zone1 mirror pool status --verbose
> health: OK
> images: 2 total
>     2 stopped
>
> test-2:
>   global_id:   511e3aa4-0e24-42b4-9c2e-8d84fc9f48f4
>   state:       up+stopped
>   description: remote image is non-primary or local image is primary
>   last_update: 2017-03-16 17:38:08
>
> And the secondary is always in this state:
>
> [root@mon3 ceph]# rbd --user zone2 --cluster zone2 mirror pool status --verbose
> health: WARN
> images: 2 total
>     1 syncing
>
> test-2:
>   global_id:   511e3aa4-0e24-42b4-9c2e-8d84fc9f48f4
>   state:       up+syncing
>   description: bootstrapping, OPEN_LOCAL_IMAGE
>   last_update: 2017-03-16 17:41:02
>
> Sometimes it goes into the replay state with health OK for a couple of seconds, but then it goes back to "bootstrapping, OPEN_LOCAL_IMAGE". What does this state mean?
>
> In the log files I have this error:
>
> 2017-03-16 17:43:02.404372 7ff6262e7700 -1 librbd::ImageWatcher: 0x7ff654003190 error requesting lock: (30) Read-only file system
> 2017-03-16 17:43:03.411327 7ff6262e7700 -1 librbd::ImageWatcher: 0x7ff654003190 error requesting lock: (30) Read-only file system
> 2017-03-16 17:43:04.420074 7ff6262e7700 -1 librbd::ImageWatcher: 0x7ff654003190 error requesting lock: (30) Read-only file system
> 2017-03-16 17:43:05.422253 7ff6262e7700 -1 librbd::ImageWatcher: 0x7ff654003190 error requesting lock: (30) Read-only file system
> 2017-03-16 17:43:06.428447 7ff6262e7700 -1 librbd::ImageWatcher: 0x7ff654003190 error requesting lock: (30) Read-only file system
>
> I'm not sure which file it is referring to as read-only; I tried to strace it, but couldn't find it.
>
> I have disabled SELinux just in case, but the result is the same. The OS is RHEL 7.2, by the way.
>
> If I do a demote/promote of the image, I get the same state and errors on the other cluster.
>
> If someone could help, it would be great.
>
> Thanks in advance.
>
> Regards
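
To rule out the duplicate-daemon case mentioned above, something along these lines can help confirm how many rbd-mirror processes are actually running against zone2. This is only a rough sketch -- the host names are placeholders for whichever nodes in your environment could be running the daemon:

#!/bin/bash
# Rough check: count rbd-mirror processes on each node that could be
# running a daemon against the zone2 cluster. The host names below are
# placeholders -- substitute your own.
for host in mon1 mon2 mon3; do
    echo "== ${host} =="
    # pgrep -c -f prints how many processes match "rbd-mirror" on that
    # host; on Jewel/Kraken the total across all hosts should be exactly
    # one per cluster.
    ssh "${host}" 'pgrep -c -f rbd-mirror || true'
done

If more than one turns up in total, stopping the extras (via the ceph-rbd-mirror systemd unit on those nodes, if that is how they were started) should let the remaining daemon acquire the exclusive lock and settle into the replaying state.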
--
Jason