Re: iSCSI Gateway reboots and permanent loss

On Wed, Dec 4, 2019 at 9:27 AM Gesiel Galvão Bernardes
<gesiel.bernardes@xxxxxxxxx> wrote:
>
> Hi,
>
> On Wed, Dec 4, 2019 at 12:31 AM Mike Christie <mchristi@xxxxxxxxxx> wrote:
>>
>> On 12/03/2019 04:19 PM, Wesley Dillingham wrote:
>> > Thanks. If I am reading this correctly, the ability to remove an iSCSI
>> > gateway would allow the remaining iSCSI gateways to take over the
>> > removed gateway's LUNs as of > 3.0. That's good, we run 3.2. However,
>> > because the actual update of the central config object happens on the
>> > to-be-deleted iSCSI gateway, regardless of where the gwcli command is
>> > issued, it will fail to actually remove said gateway from the object if
>> > that gateway is not functioning.
>>
>> Yes.
>>
>> >
>> > I guess this still leaves the question of how to proceed when one of
>> > the iSCSI gateways fails permanently. Is that possible, or is it
>> > potentially possible other than manually intervening on the config
>>
>> You could edit the gateway.cfg manually, but I would not do it, because
>> it's error-prone.
>>
>> It's probably safest to run in degraded mode and wait for an updated
>> ceph-iscsi package with a fix. If you are running into the problem right
>> now, I can bump the priority.
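>>
>> If you do end up touching it by hand, at least back up the config object
>> first. A rough sketch, assuming the default pool/object names
>> (rbd/gateway.conf -- check your iscsi-gateway.cfg if you changed the pool):
>>
>>   rados -p rbd get gateway.conf /tmp/gateway.conf.bak
>>
>> Only put an edited copy back (rados put) with rbd-target-api stopped on
>> every gateway, then restart the service so each node re-reads the config.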
>>
> I permanently lost a gateway. I cannot keep running "degraded" because I need to add another gateway for redundancy, and it does not allow that while the lost gateway is "offline".
>
> In this case, what can I do? If I create a new gateway with the same name and IP as the lost one, and then try to use "delete" in gwcli, will it work?

Yes, it will work if the new node is configured with the same name and
IPs and you re-install the ceph-iscsi software.
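
Roughly, the sequence would be something like this (a sketch only -- the
target IQN below is the usual example from the docs, the hostname is a
placeholder, and I have not re-checked the exact gwcli syntax on 3.2, so
adjust to your setup):

  # On the replacement host (same hostname and IP as the lost gateway):
  # install ceph-iscsi, copy ceph.conf, the keyring and
  # /etc/ceph/iscsi-gateway.cfg from a surviving gateway, then:
  systemctl enable --now rbd-target-gw rbd-target-api

  # Once rbd-target-api is answering on that name/IP, remove the gateway
  # from any node:
  gwcli
  /> cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/gateways
  /iscsi-target...-igw/gateways> delete <hostname-of-lost-gateway>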

>
>>
>> > object? If it's not possible, would the best course of action be to have
>> > standby hardware and quickly recreate the node, or perhaps to run the
>> > gateways more ephemerally, from a VM or container?
>> >
>> > Thanks again.
>> >
>> > Respectfully,
>> >
>> > Wes Dillingham
>> > wes@xxxxxxxxxxxxxxxxx
>> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> >
>> >
>> > On Tue, Dec 3, 2019 at 2:45 PM Mike Christie <mchristi@xxxxxxxxxx> wrote:
>> >
>> >     I do not think it's going to do what you want when the node you want to
>> >     delete is down.
>> >
>> >     It looks like we only temporarily stop the gw from being exported. It
>> >     does not update the gateway.cfg, because we do the config removal call
>> >     on the node we want to delete.
>> >
>> >     So gwcli will report success and the ls command will show it as no
>> >     longer running/exported, but if you restart the rbd-target-api service
>> >     it will show up again.
>> >
>> >     There is an internal command to do what you want. I will post a PR for
>> >     gwcli so it can also be used by the dashboard.
>> >
>> >
>> >     On 12/03/2019 01:19 PM, Jason Dillaman wrote:
>> >     > If I recall correctly, the recent ceph-iscsi release supports the
>> >     > removal of a gateway via the "gwcli". I think the Ceph dashboard can
>> >     > do that as well.
>> >     >
>> >     > On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham
>> >     <wes@xxxxxxxxxxxxxxxxx> wrote:
>> >     >>
>> >     >> We utilize 4 iSCSI gateways in a cluster and have noticed the
>> >     following during patching cycles when we sequentially reboot individual
>> >     iSCSI gateways:
>> >     >>
>> >     >> "gwcli" often hangs on the still-up iSCSI GWs but sometimes still
>> >     functions and gives the message:
>> >     >>
>> >     >> "1 gateway is inaccessible - updates will be disabled"
>> >     >>
>> >     >> This got me thinking about what the course of action would be
>> >     should an iSCSI gateway fail permanently or semi-permanently, say due
>> >     to a hardware issue. What would be the best course of action to instruct
>> >     the remaining iSCSI gateways that one of them is no longer available
>> >     and that they should allow updates again and take ownership of the
>> >     now-defunct node's LUNs?
>> >     >>
>> >     >> I'm guessing that pulling down the RADOS config object, rewriting
>> >     it, and re-putting it, followed by an rbd-target-api restart, might do
>> >     the trick, but I am hoping there is a more "in-band" and less
>> >     potentially devastating way to do this.
>> >     >>
>> >     >> Thanks for any insights.
>> >     >>
>> >     >> Respectfully,
>> >     >>
>> >     >> Wes Dillingham
>> >     >> wes@xxxxxxxxxxxxxxxxx
>> >     >> LinkedIn
>> >     >
>> >     >
>> >     >
>> >
>



-- 
Jason
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



