Re: Ceph `realm pull` permission denied error

Alex Hussein-Kershaw <Alex.Hussein-Kershaw@xxxxxxxxxxxxxx> · Mon, 13 Jul 2020 13:33:52 +0000

I got to the bottom of this: it was caused by NTP server issues and a 1 hour time discrepancy between the two clusters.

This was pretty painful to track down (I didn’t find any useful logs; the most descriptive thing Ceph gave me was the “permission denied” error). Hopefully some future engineer can save some time from my troubles!
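For anyone hitting the same error, a quick way to rule clock skew in or out is to compare the local clock with the `Date` header returned by the remote gateway. The following is only a minimal sketch: the function names are hypothetical, and the 15 minute threshold is an assumption based on the tolerance commonly used for S3-style request signing, not something confirmed in this thread.

```python
import ssl
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed tolerance (15 minutes); the limit RGW actually enforces is not
# stated in this thread, so treat this as an illustrative default.
MAX_SKEW_SECONDS = 15 * 60


def skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus the remote time from an HTTP Date header."""
    remote = parsedate_to_datetime(date_header)
    return (local_now - remote).total_seconds()


def is_suspicious(skew: float, limit: float = MAX_SKEW_SECONDS) -> bool:
    """True if the absolute skew exceeds the assumed tolerance."""
    return abs(skew) > limit


def fetch_date_header(url: str, context: ssl.SSLContext = None) -> str:
    """Fetch the Date header from the remote gateway (hypothetical helper).

    Any HTTP status is acceptable, since error responses normally carry a
    Date header too. A self-signed certificate needs a caller-supplied
    SSL context.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10, context=context) as resp:
            headers = resp.headers
    except urllib.error.HTTPError as err:
        headers = err.headers
    value = headers.get("Date")
    if value is None:
        raise ValueError("remote endpoint returned no Date header")
    return value


def check(url: str) -> float:
    """Return the skew in seconds between this host and the endpoint at url."""
    return skew_seconds(fetch_date_header(url), datetime.now(timezone.utc))
```

For example, `skew_seconds("Thu, 09 Jul 2020 14:21:10 GMT", datetime(2020, 7, 9, 15, 21, 10, tzinfo=timezone.utc))` gives `3600.0`, a one hour gap like the one described above, which `is_suspicious` flags. Running `check()` from the secondary site against the master zone endpoint would show the live difference.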

Thanks,
Alex

From: Alex Hussein-Kershaw
Sent: 13 July 2020 12:22
To: Zhenshi Zhou <deaderzzs@xxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: RE:  Ceph `realm pull` permission denied error

Hi Zhenshi,

Thanks for the suggestion. Unfortunately I have tried this already and had no luck ☹

Best wishes,
Alex

From: Zhenshi Zhou <deaderzzs@xxxxxxxxx>
Sent: 13 July 2020 10:58
To: Alex Hussein-Kershaw <Alex.Hussein-Kershaw@xxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re:  Ceph `realm pull` permission denied error

Hi Alex,

I haven't deployed this in containers or VMs, nor with Ansible or other tools.
However, I deployed multisite once, and I remember that I restarted the
rgw on the master site before I synced the realm on the secondary site.

I'm not sure if this can help.

Alex Hussein-Kershaw <Alex.Hussein-Kershaw@xxxxxxxxxxxxxx> wrote on Mon, 13 Jul 2020 at 5:48 PM:
Hi Ceph Users,

I'm struggling with an issue and I'm hoping someone can point me towards a solution.

We are using Nautilus (14.2.9), deploying Ceph in containers, in VMs. The setup that I'm working with has 3 VMs, but of course our design expects this to be scaled by a user as appropriate. I have a cluster deployed and it's functioning happily as storage for our product; the error occurs when I go to set up a second cluster and pair it with the first. I'm using ceph-ansible to deploy. I get the following error about 20 minutes into running the site-container playbook.

2020-07-09 14:21:10,966 p=2134 u=qs-admin |  TASK [ceph-rgw : fetch the realm] ********************
2020-07-09 14:21:10,966 p=2134 u=qs-admin |  Thursday 09 July 2020  14:21:10 +0000 (0:00:00.410)       0:16:18.245 *********
2020-07-09 14:21:11,901 p=2134 u=qs-admin |  fatal: [10.225.21.213 -> 10.225.21.213]: FAILED! => changed=true
  cmd:
  - docker
  - exec
  - ceph-mon-albamons_sc2
  - radosgw-admin
  - realm
  - pull
  - --url=https://10.225.36.197:7480
  - --access-key=2CQ006Lereqpysbr0l0s
  - --secret=JM3S5Hd49Nz03eIbTTNnEyqcXJkIOXbp0gWIUEbp
  delta: '0:00:00.545895'
  end: '2020-07-09 14:21:11.516539'
  msg: non-zero return code
  rc: 13
  start: '2020-07-09 14:21:10.970644'
  stderr: |-
    request failed: (13) Permission denied
    If the realm has been changed on the master zone, the master zone's gateway may need to be restarted to recognize this user.
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Re-running the command manually reproduces the error. I understand that the permission denied error appears to indicate that the keys are not valid, as suggested by https://tracker.ceph.com/issues/36619. However, I've triple-checked that the keys are correct on the other site. I'm at a loss as to where to look for debugging: I've turned up logs on both the local and remote site for the RGW and MON processes, but neither seems to yield anything related. I've tried restarting everything as suggested in the error text, from all the processes to a full reboot of all the VMs. I've no idea why the keys are being declined either, as they are correct (or at least `radosgw-admin period get` on the primary site thinks so).

Thanks for your help,
Alex
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx