Re: iSCSI Gateway reboots and permanent loss

I have never permanently lost a gateway, but I'm a believer in Murphy's law and want to have a plan. Glad to hear that a solution is in the works. When might it be available in a release? If it is coming soon, I'll plan to upgrade as soon as it lands; if it is far down the queue, I would like to know whether I should ready a standby server.

Thanks so much for all your great work on this product.


Respectfully,

Wes Dillingham


On Wed, Dec 4, 2019 at 11:18 AM Mike Christie <mchristi@xxxxxxxxxx> wrote:
On 12/04/2019 08:26 AM, Gesiel Galvão Bernardes wrote:
> Hi,
>
> On Wed, Dec 4, 2019 at 00:31, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>
>     On 12/03/2019 04:19 PM, Wesley Dillingham wrote:
>     > Thanks. If I am reading this correctly, the ability to remove an iSCSI
>     > gateway would allow the remaining iSCSI gateways to take over for the
>     > removed gateway's LUNs as of > 3.0. That's good, we run 3.2. However,
>     > because the actual update of the central config object happens from the
>     > to-be-deleted iSCSI gateway, regardless of where the gwcli command is
>     > issued, it will fail to actually remove said gateway from the object if
>     > that gateway is not functioning.
>
>     Yes.
>
>     >
>     > I guess this still leaves the question of how to proceed when one of the
>     > iSCSI gateways fails permanently?  Is that possible, or is it
>     > potentially possible other than manually intervening on the config
>
>     You could edit the gateway.cfg manually, but I would not do it, because
>     it's error prone.
>
>     It's probably safest to run in degraded mode and wait for an updated
>     ceph-iscsi package with a fix. If you are running into the problem right
>     now, I can bump the priority.
>
> I permanently lost a gateway. I cannot keep running in "degraded" mode
> because I need to add another gateway for redundancy, and that is not
> allowed while a gateway is "offline".
>
> In this case, what can I do? If I create a new gateway with the same
> name and IP as the lost one, and then try to use "delete" in gwcli, will
> it work?

Yes.

If you can tolerate a temporary service outage, you can also do the
following as a workaround:

0. Stop applications accessing iscsi luns, and have the initiator log
out of the iscsi target.

1. Stop ceph iscsi service. On all iscsi gw nodes do:

systemctl stop rbd-target-api

2. Delete gateway.cfg. This will delete the configuration info like the
target and its ACL and LUN mappings. It does not delete the actual
images or pools that you have data on.

rados -p rbd rm gateway.cfg

3. Start ceph iscsi services again. On all iscsi gw nodes do:

systemctl start rbd-target-api

4. Set up the target again with gwcli. For the image/disk setup stage, use
the "attach" command instead of the "create" command:

attach pool=your_pool image=image_name

Then just re-add your target, ACLs and LUN mappings.

5. On the initiator side, log back in to the iscsi target.
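
Steps 1-3 above could be wrapped in a small script along these lines. This
is only a sketch under some assumptions: the run() helper, the DRY_RUN
switch, the BACKUP path, and the extra "rados get" that saves a copy of
gateway.cfg before it is removed are not part of the steps above. By
default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of steps 1-3. Prints the commands unless DRY_RUN=0 is set.
# Assumptions (not from the steps above): run(), DRY_RUN, BACKUP, and the
# "rados get" backup taken before gateway.cfg is removed.
POOL="${POOL:-rbd}"                          # pool holding gateway.cfg
BACKUP="${BACKUP:-/var/tmp/gateway.cfg.bak}" # hypothetical backup path
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Steps 1 and 3: call on every iscsi gw node.
stop_api()  { run systemctl stop rbd-target-api; }
start_api() { run systemctl start rbd-target-api; }

# Step 2: call once, from a node that can reach the pool.
# The object is only removed if the backup copy succeeded.
reset_config() {
    run rados -p "$POOL" get gateway.cfg "$BACKUP" &&
    run rados -p "$POOL" rm gateway.cfg
}
```

The intended order would be stop_api on every gateway node, reset_config
once, then start_api on every gateway node, followed by the gwcli steps in
4. The saved copy is only for reference while re-adding the target, ACLs
and LUN mappings by hand.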




>
>     > object? If it's not possible, would the best course of action be to have
>     > standby hardware and quickly recreate the node, or perhaps to run the
>     > gateways more ephemerally, from a VM or container?
>     >
>     > Thanks again.
>     >
>     > Respectfully,
>     >
>     > *Wes Dillingham*
>     > wes@xxxxxxxxxxxxxxxxx
>     > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>     >
>     >
>     > On Tue, Dec 3, 2019 at 2:45 PM Mike Christie <mchristi@xxxxxxxxxx> wrote:
>     >
>     >     I do not think it's going to do what you want when the node you want
>     >     to delete is down.
>     >
>     >     It looks like we only temporarily stop the gw from being exported. It
>     >     does not update the gateway.cfg, because we do the config removal call
>     >     on the node we want to delete.
>     >
>     >     So gwcli would report success and the ls command would show it as no
>     >     longer running/exported, but if you restart the rbd-target-api service
>     >     then it will show up again.
>     >
>     >     There is an internal command to do what you want. I will post a PR for
>     >     gwcli, so that it can also be used by the dashboard.
>     >
>     >
>     >     On 12/03/2019 01:19 PM, Jason Dillaman wrote:
>     >     > If I recall correctly, the recent ceph-iscsi release supports the
>     >     > removal of a gateway via the "gwcli". I think the Ceph dashboard can
>     >     > do that as well.
>     >     >
>     >     > On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:
>     >     >>
>     >     >> We utilize 4 iSCSI gateways in a cluster and have noticed the
>     >     >> following during patching cycles when we sequentially reboot single
>     >     >> iSCSI gateways:
>     >     >>
>     >     >> "gwcli" often hangs on the still-up iSCSI GWs but sometimes still
>     >     >> functions and gives the message:
>     >     >>
>     >     >> "1 gateway is inaccessible - updates will be disabled"
>     >     >>
>     >     >> This got me thinking about what the course of action would be
>     >     >> should an iSCSI gateway fail permanently or semi-permanently, say
>     >     >> from a hardware issue. What would be the best way to instruct
>     >     >> the remaining iSCSI gateways that one of them is no longer available
>     >     >> and that they should allow updates again and take ownership of the
>     >     >> now-defunct node's LUNs?
>     >     >>
>     >     >> I'm guessing that pulling down the RADOS config object, rewriting
>     >     >> it, and re-putting it, followed by an rbd-target-api restart, might do
>     >     >> the trick, but I am hoping there is a more "in-band" and less
>     >     >> potentially devastating way to do this.
>     >     >>
>     >     >> Thanks for any insights.
>     >     >>
>     >     >> Respectfully,
>     >     >>
>     >     >> Wes Dillingham
>     >     >> wes@xxxxxxxxxxxxxxxxx
>     >     >> LinkedIn
>     >     >> _______________________________________________
>     >     >> ceph-users mailing list -- ceph-users@xxxxxxx
>     >     >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>     >     >
>     >     >
>     >     >
>     >
