On 12/04/2019 02:34 PM, Wesley Dillingham wrote:
> I have never had a permanent loss of a gateway, but I'm a believer in
> Murphy's law and want to have a plan. Glad to hear that there is a
> solution in the works. Curious, when might that be available in a
> release? If sooner rather than later I'll plan to upgrade immediately,

It should be in the next release, which I think we would cut as soon as
the patch gets merged, since we have a good number of fixes sitting in
the repo.

The patch/PR is here https://github.com/ceph/ceph-iscsi/pull/156 if you
have a non-production setup and are used to applying patches and
testing upstream.

> otherwise, if it is far down the queue I would like to know if I
> should ready a standby server.
>
> Thanks so much for all your great work on this product.
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
> On Wed, Dec 4, 2019 at 11:18 AM Mike Christie <mchristi@xxxxxxxxxx> wrote:
>
> > On 12/04/2019 08:26 AM, Gesiel Galvão Bernardes wrote:
> > > Hi,
> > >
> > > On Wed, Dec 4, 2019 at 00:31, Mike Christie <mchristi@xxxxxxxxxx> wrote:
> > >
> > > > On 12/03/2019 04:19 PM, Wesley Dillingham wrote:
> > > > > Thanks. If I am reading this correctly, the ability to remove an
> > > > > iSCSI gateway would allow the remaining iSCSI gateways to take
> > > > > over the removed gateway's LUNs as of 3.0. That's good, we run
> > > > > 3.2. However, because the actual update of the central config
> > > > > object happens from the to-be-deleted iSCSI gateway, regardless
> > > > > of where the gwcli command is issued, it will fail to actually
> > > > > remove said gateway from the object if that gateway is not
> > > > > functioning.
> > > >
> > > > Yes.
> > > >
> > > > > I guess this still leaves the question of how to proceed when
> > > > > one of the iSCSI gateways fails permanently. Is that possible
> > > > > other than by manually intervening on the config object?
> > > >
> > > > You could edit the gateway.cfg manually, but I would not do it,
> > > > because it's error prone.
> > > >
> > > > It's probably safest to run in degraded mode and wait for an
> > > > updated ceph-iscsi package with a fix. If you are running into
> > > > the problem right now, I can bump the priority.
> > >
> > > I permanently lost a gateway. I cannot leave it running "degraded",
> > > because I need to add another gateway for redundancy, and that is
> > > not allowed while the lost gateway is "offline".
> > >
> > > In this case, what can I do? If I create a new gateway with the
> > > same name and IP as the lost one, and then try to use "delete" in
> > > gwcli, will it work?
> >
> > Yes.
> >
> > If you can tolerate a temporary stop in services, you can also do the
> > following as a workaround:
> >
> > 0. Stop the applications accessing the iSCSI LUNs, and have the
> > initiators log out of the iSCSI target.
> >
> > 1. Stop the ceph-iscsi service. On all iSCSI gw nodes do:
> >
> > systemctl stop rbd-target-api
> >
> > 2. Delete gateway.cfg. This deletes the configuration info like the
> > target and its ACL and LUN mappings. It does not delete the actual
> > images or pools that your data is on.
> >
> > rados -p rbd rm gateway.cfg
> >
> > 3. Start the ceph-iscsi service again. On all iSCSI gw nodes do:
> >
> > systemctl start rbd-target-api
> >
> > 4. Re-set up the target with gwcli. For the image/disk setup stage,
> > instead of doing the "create" command do the "attach" command:
> >
> > attach pool=your_pool image=image_name
> >
> > Then just re-add your target, ACLs and LUN mappings (see the sketch
> > right after these steps).
> >
> > 5. On the initiator side, log back in to the iSCSI target.
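To make step 4 a bit more concrete, a gwcli session for rebuilding a
minimal one-target setup would look roughly like the sketch below. The
target IQN, gateway names and IPs, pool, image, client IQN and CHAP
credentials are only placeholders for your own values, and the exact
auth syntax can differ slightly between ceph-iscsi releases:

  # on one gateway node, once rbd-target-api is back up, inside gwcli:
  cd /iscsi-targets
  create iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
  cd iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/gateways
  create ceph-gw-1 10.172.19.21
  create ceph-gw-2 10.172.19.22

  # re-attach the existing RBD images instead of creating new ones
  cd /disks
  attach pool=your_pool image=image_name

  # re-create the initiator ACL and map the disk back to it
  cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/hosts
  create iqn.1994-05.com.redhat:your-client
  auth username=your_chap_user password=your_chap_password
  disk add your_pool/image_name

Once the ACLs and disk mappings match what you had before, the
initiators should see the same LUNs again after logging back in
(step 5).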
> > > > > If it's not possible, would the best course of action be to
> > > > > have standby hardware and quickly recreate the node, or perhaps
> > > > > run the gateways more ephemerally, from a VM or container?
> > > > >
> > > > > Thanks again.
> > > > >
> > > > > Respectfully,
> > > > >
> > > > > *Wes Dillingham*
> > > > > wes@xxxxxxxxxxxxxxxxx
> > > > > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> > > > >
> > > > > On Tue, Dec 3, 2019 at 2:45 PM Mike Christie <mchristi@xxxxxxxxxx> wrote:
> > > > >
> > > > > > I do not think it's going to do what you want when the node
> > > > > > you want to delete is down.
> > > > > >
> > > > > > It looks like we only temporarily stop the gw from being
> > > > > > exported. It does not update the gateway.cfg, because we do
> > > > > > the config removal call on the node we want to delete.
> > > > > >
> > > > > > So gwcli would report success and the ls command will show it
> > > > > > as no longer running/exported, but if you restart the
> > > > > > rbd-target-api service then it will show up again.
> > > > > >
> > > > > > There is an internal command to do what you want. I will post
> > > > > > a PR for gwcli so it can be used by the dashboard.
> > > > > >
> > > > > > On 12/03/2019 01:19 PM, Jason Dillaman wrote:
> > > > > > > If I recall correctly, the recent ceph-iscsi release
> > > > > > > supports the removal of a gateway via "gwcli". I think the
> > > > > > > Ceph dashboard can do that as well.
> > > > > > >
> > > > > > > On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham
> > > > > > > <wes@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > We utilize 4 iSCSI gateways in a cluster and have noticed
> > > > > > > > the following during patching cycles, when we sequentially
> > > > > > > > reboot single iSCSI gateways:
> > > > > > > >
> > > > > > > > "gwcli" often hangs on the still-up iSCSI GWs but
> > > > > > > > sometimes still functions and gives the message:
> > > > > > > >
> > > > > > > > "1 gateway is inaccessible - updates will be disabled"
> > > > > > > >
> > > > > > > > This got me thinking about what the course of action would
> > > > > > > > be should an iSCSI gateway fail permanently or
> > > > > > > > semi-permanently, say due to a hardware issue. What would
> > > > > > > > be the best way to instruct the remaining iSCSI gateways
> > > > > > > > that one of them is no longer available, so that they
> > > > > > > > allow updates again and take ownership of the
> > > > > > > > now-defunct node's LUNs?
> > > > > > > >
> > > > > > > > I'm guessing pulling down the RADOS config object,
> > > > > > > > rewriting it and re-put'ing it, followed by an
> > > > > > > > rbd-target-api restart, might do the trick, but I am
> > > > > > > > hoping there is a more "in-band" and less potentially
> > > > > > > > devastating way to do this.
> > > > > > > >
> > > > > > > > Thanks for any insights.
> > > > > > > >
> > > > > > > > Respectfully,
> > > > > > > >
> > > > > > > > Wes Dillingham
> > > > > > > > wes@xxxxxxxxxxxxxxxxx
> > > > > > > > LinkedIn
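For completeness, the manual rewrite Wes describes in his original mail
above would boil down to something like the following. As noted earlier
in the thread it is error prone, so treat it as a last resort and keep a
backup copy of the object; the pool and object name below are just the
ones used elsewhere in this thread:

  # fetch the config object (it should be a JSON blob) and keep a backup
  rados -p rbd get gateway.cfg /tmp/gateway.cfg
  cp /tmp/gateway.cfg /tmp/gateway.cfg.bak

  # edit /tmp/gateway.cfg to drop the dead gateway's entries, then put it back
  rados -p rbd put gateway.cfg /tmp/gateway.cfg

  # restart the API on every remaining gateway node so it rereads the object
  systemctl restart rbd-target-api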
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx