That's great, thank you so much. I will try to get this patch into my test env ASAP but will likely wait for the official release cut for prod. I really appreciate you adding this to the product.
On Thu, Dec 5, 2019 at 4:14 PM Mike Christie <mchristi@xxxxxxxxxx> wrote:
On 12/04/2019 02:34 PM, Wesley Dillingham wrote:
> I have never had a permanent loss of a gateway but I'm a believer in
> Murphy's law and want to have a plan. Glad to hear that there is a
> solution in the works; curious when that might be available in a
> release? If sooner than later I'll plan to upgrade then immediately,
It should be in the next release, which I think we will cut once the
patch gets merged, since we have a good number of fixes sitting in
the repo.
The patch/PR is here:
https://github.com/ceph/ceph-iscsi/pull/156
if you have a non-production setup and are used to applying patches and
testing upstream.
> otherwise, if far down the queue I would like to know if I should ready
> a standby server.
>
> Thanks so much for all your great work on this product.
>
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Wed, Dec 4, 2019 at 11:18 AM Mike Christie <mchristi@xxxxxxxxxx> wrote:
>
> On 12/04/2019 08:26 AM, Gesiel Galvão Bernardes wrote:
> > Hi,
> >
> > On Wed, Dec 4, 2019 at 00:31, Mike Christie <mchristi@xxxxxxxxxx> wrote:
> >
> > On 12/03/2019 04:19 PM, Wesley Dillingham wrote:
> > > Thanks. If I am reading this correctly, the ability to remove an iSCSI
> > > gateway would allow the remaining iSCSI gateways to take over for the
> > > removed gateway's LUNs as of > 3.0. That's good, we run 3.2. However,
> > > because the actual update of the central config object happens from the
> > > to-be-deleted iSCSI gateway, regardless of where the gwcli command is
> > > issued, it will fail to actually remove said gateway from the object
> > > if that gateway is not functioning.
> >
> > Yes.
> >
> > >
> > > I guess this still leaves the question of how to proceed when one of
> > > the iSCSI gateways fails permanently? Is that possible, or is it
> > > potentially possible other than manually intervening on the config
> >
> > You could edit the gateway.cfg manually, but I would not do it, because
> > it's error prone.
> >
> > It's probably safest to run in degraded mode and wait for an updated
> > ceph-iscsi package with a fix. If you are running into the problem right
> > now, I can bump the priority.
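> >
> > If you do decide to go the manual route anyway, it boils down to pulling
> > the object, editing the JSON, and putting it back, roughly like this
> > (pool and object name as used in this thread; check with "rados -p rbd ls"
> > what it is actually called on your cluster, and keep a backup copy):
> >
> > # dump the config object and keep an untouched copy
> > rados -p rbd get gateway.cfg /tmp/gateway.cfg
> > cp /tmp/gateway.cfg /tmp/gateway.cfg.bak
> >
> > # carefully edit the JSON (remove the dead gateway's entries), then
> > # write it back and restart the API on the surviving gateways
> > rados -p rbd put gateway.cfg /tmp/gateway.cfg
> > systemctl restart rbd-target-api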
> >
> > I permanently lost a gateway. I cannot leave it running "degraded",
> > because I need to add another gateway for redundancy, and it does not
> > allow that with the gateway "offline".
> >
> > In this case, what can I do? If I create a new gateway with the same
> > name and IP as the lost one, and then try to use "delete" in gwcli, will
> > it work?
>
> Yes.
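>
> Once the replacement node with the same name and IP is up and
> rbd-target-api is running on it, that delete would look roughly like this
> from gwcli on any working gateway (the target IQN and gateway name are
> placeholders, and the delete command's exact arguments vary a bit by
> ceph-iscsi version, so check "help delete" first):
>
> cd /iscsi-targets/<your_target_iqn>/gateways
> delete <gateway_name>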
>
> If you can tolerate a temporary stop in services, you can also do the
> following as a workaround:
>
> 0. Stop applications accessing the iSCSI LUNs, and have the initiators
> log out of the iSCSI target.
>
> 1. Stop the ceph-iscsi service. On all iSCSI gateway nodes do:
>
> systemctl stop rbd-target-api
>
> 2. Delete gateway.cfg. This will delete the configuration info like the
> target and its ACL and LUN mappings. It does not delete the actual
> images or pools that you have data on.
>
> rados -p rbd rm gateway.cfg
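>
> If you want to keep the old target/ACL/LUN layout around for reference,
> grab a copy of the object before the rm, e.g.:
>
> rados -p rbd get gateway.cfg /root/gateway.cfg.pre-delete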
>
> 3. Start the ceph-iscsi service again. On all iSCSI gateway nodes do:
>
> systemctl start rbd-target-api
>
> 4. Re-set up the target with gwcli. For the image/disk setup stage,
> instead of the "create" command use the "attach" command:
>
> attach pool=your_pool image=image_name
>
> Then just re-add your target, ACLs and LUN mappings (a rough gwcli
> session for this is sketched after step 5).
>
> 5. On the initiator side, log back in to the iSCSI target.
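>
> The re-add in step 4 would look roughly like the gwcli session below.
> Every name here (target IQN, gateway names and IPs, pool/image, client
> IQN, CHAP credentials) is a placeholder, and the exact syntax can differ
> a little between ceph-iscsi versions, so verify against gwcli's built-in
> help before running it:
>
> # recreate the target and its gateways
> cd /iscsi-targets
> create iqn.2003-01.com.example.iscsi-gw:ceph-igw
> cd iqn.2003-01.com.example.iscsi-gw:ceph-igw/gateways
> create ceph-gw-1 192.168.1.101
> create ceph-gw-2 192.168.1.102
>
> # attach the existing images instead of creating new ones
> cd /disks
> attach pool=your_pool image=image_name
>
> # recreate the client ACL, CHAP auth and LUN mapping
> cd /iscsi-targets/iqn.2003-01.com.example.iscsi-gw:ceph-igw/hosts
> create iqn.1994-05.com.example:client1
> cd iqn.1994-05.com.example:client1
> auth username=myiscsiuser password=myiscsipass12
> disk add your_pool/image_name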
>
>
> >
> >
> >
> > > object? If it's not possible, would the best course of action be to
> > > have standby hardware and quickly recreate the node, or perhaps run
> > > the gateways more ephemerally, from a VM or container?
> > >
> > > Thanks again.
> > >
> > > Respectfully,
> > >
> > > *Wes Dillingham*
> > > wes@xxxxxxxxxxxxxxxxx
> > > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> > >
> > >
> > > On Tue, Dec 3, 2019 at 2:45 PM Mike Christie <mchristi@xxxxxxxxxx> wrote:
> > >
> > > I do not think it's going to do what you want when the node you want
> > > to delete is down.
> > >
> > > It looks like we only temporarily stop the gw from being exported. It
> > > does not update the gateway.cfg, because we do the config removal call
> > > on the node we want to delete.
> > >
> > > So gwcli would report success and the ls command will show it as no
> > > longer running/exported, but if you restart the rbd-target-api service
> > > then it will show up again.
> > >
> > > There is an internal command to do what you want. I will post a PR for
> > > gwcli so that it can also be used by the dashboard.
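> > >
> > > Until then, a quick way to check whether a removal actually stuck
> > > (the hostname is a placeholder, and the object name is the one used
> > > in this thread):
> > >
> > > # restart the API on a surviving gateway, then list the config again
> > > systemctl restart rbd-target-api
> > > gwcli ls
> > >
> > > # or look for the dead gateway's name directly in the config object
> > > rados -p rbd get gateway.cfg /tmp/gateway.cfg
> > > grep dead-gw-hostname /tmp/gateway.cfg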
> > >
> > >
> > > On 12/03/2019 01:19 PM, Jason Dillaman wrote:
> > > > If I recall correctly, the recent ceph-iscsi release supports the
> > > > removal of a gateway via the "gwcli". I think the Ceph dashboard can
> > > > do that as well.
> > > >
> > > > On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:
> > > >>
> > > >> We utilize 4 iSCSI gateways in a cluster and have noticed the
> > > >> following during patching cycles when we sequentially reboot single
> > > >> iSCSI gateways:
> > > >>
> > > >> "gwcli" often hangs on the still-up iSCSI GWs but
> sometimes
> > still
> > > functions and gives the message:
> > > >>
> > > >> "1 gateway is inaccessible - updates will be disabled"
> > > >>
> > > >> This got me thinking about what the course of action would be
> > > >> should an iSCSI gateway fail permanently or semi-permanently, say a
> > > >> hardware issue. What would be the best course of action to instruct
> > > >> the remaining iSCSI gateways that one of them is no longer available
> > > >> and that they should allow updates again and take ownership of the
> > > >> now-defunct node's LUNs?
> > > >>
> > > >> I'm guessing pulling down the RADOS config object, rewriting it,
> > > >> and re-put'ing it, followed by an rbd-target-api restart, might do
> > > >> the trick, but am hoping there is a more "in-band" and less
> > > >> potentially devastating way to do this.
> > > >>
> > > >> Thanks for any insights.
> > > >>
> > > >> Respectfully,
> > > >>
> > > >> Wes Dillingham
> > > >> wes@xxxxxxxxxxxxxxxxx
> > > >
> > > >
> > > >
> > >
> >
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx