Re: ceph-iscsi upgrade issue

On 10/10/2018 08:21 AM, Steven Vacaroaia wrote:
> Hi Jason,
> Thanks for your prompt responses 
> 
> I have used the same iscsi-gateway.cfg file, with no security changes; I
> just added the prometheus entry.
> There is no iscsi-gateway.conf, but the gateway.conf object is created
> and has the correct entries.
> 
> iscsi-gateway.cfg is identical and contains the following
> 
> [config]
> cluster_name = ceph
> gateway_keyring = ceph.client.admin.keyring
> api_secure = false
> trusted_ip_list = 10.10.30.181,10.10.30.182,10.10.30.183,10.10.30.184,10.10.30.185,10.10.30.186
> prometheus_host = 0.0.0.0
> 
> 
> 
> I am running the disk commands from OSD01 and they fail with the following:
> 
> INFO [gateway.py:344:load_config()] - (Gateway.load_config) successfully
> loaded existing target definition
> 2018-10-10 09:04:48,956    DEBUG [gateway.py:423:map_luns()] -
> processing tpg2
> 2018-10-10 09:04:48,956    DEBUG [gateway.py:428:map_luns()] -
> rbd.dstest needed mapping to tpg2
> 2018-10-10 09:04:48,958     INFO
> [gateway.py:403:bind_alua_group_to_lun()] - Setup group ao for
> rbd.dstest on tpg 2 (state 0, owner True, failover type 1)
> 2018-10-10 09:04:48,958    DEBUG
> [gateway.py:405:bind_alua_group_to_lun()] - Setting Luns tg_pt_gp to ao
> 2018-10-10 09:04:48,959    DEBUG
> [gateway.py:409:bind_alua_group_to_lun()] - Bound rbd.dstest on tpg2 to ao
> 2018-10-10 09:04:48,959    DEBUG [gateway.py:423:map_luns()] -
> processing tpg1
> 2018-10-10 09:04:48,959    DEBUG [gateway.py:428:map_luns()] -
> rbd.dstest needed mapping to tpg1
> 2018-10-10 09:04:48,960     INFO
> [gateway.py:403:bind_alua_group_to_lun()] - Setup group ano1 for
> rbd.dstest on tpg 1 (state 1, owner False, failover type 1)
> 2018-10-10 09:04:48,960    DEBUG
> [gateway.py:405:bind_alua_group_to_lun()] - Setting Luns tg_pt_gp to ano1
> 2018-10-10 09:04:48,961    DEBUG
> [gateway.py:409:bind_alua_group_to_lun()] - Bound rbd.dstest on tpg1 to ano1
> 2018-10-10 09:04:48,963     INFO [_internal.py:87:_log()] - 127.0.0.1 -
> - [10/Oct/2018 09:04:48] "PUT /api/_disk/rbd.dstest HTTP/1.1" 200 -
> 2018-10-10 09:04:48,965     INFO [rbd-target-api:1804:call_api()] -
> _disk update on 127.0.0.1, successful
> 2018-10-10 09:04:48,965    DEBUG [rbd-target-api:1789:call_api()] -
> processing GW 'osd03'
> 2018-10-10 09:04:49,039    ERROR [rbd-target-api:1810:call_api()] -
> _disk change on osd03 failed with 500
> 2018-10-10 09:04:49,041     INFO [_internal.py:87:_log()] - 127.0.0.1 -
> - [10/Oct/2018 09:04:49] "PUT /api/disk/rbd.dstest HTTP/1.1" 500 -
> 
> 
> On OSD03 there is the following "error":
> 
>  INFO [lun.py:656:add_dev_to_lio()] - (LUN.add_dev_to_lio) Adding image
> 'rbd.dstest' to LIO
> 2018-10-10 09:04:49,037    DEBUG [lun.py:666:add_dev_to_lio()] -
> control="max_data_area_mb=8"
> 
> Amazingly enough, gwcli on OSD03 shows the disk as created, but on OSD01
> it does not.
> If I restart gwcli on OSD01, the disk is there, but it cannot be added to
> the host because the image does not exist ???

What is the output of

systemctl status rbd-target-api
systemctl status rbd-target-gw

Is rbd-target-api in a failed state, or does the status output indicate
that it has been crashing and restarting?

Does /var/log/messages show that rbd-target-api is crashing and
restarting? If so, could you attach the stack trace? The
/var/log/rbd-target-api log will show

Does

gwcli ls

show it cannot reach the remote gateways?
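
If the logs are long, it may help to pull out only the ERROR entries
before attaching them. Below is a minimal sketch, assuming each entry is
on a single line in the layout quoted in this thread (timestamp, level,
[source], " - ", message). The names LOG_LINE and find_errors are
hypothetical helpers for illustration, not part of ceph-iscsi.

```python
import re

# Layout assumed from the log excerpts quoted in this thread, e.g.
# "2018-10-10 09:04:49,039    ERROR [rbd-target-api:1810:call_api()] - message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<source>[^\]]+)\]\s+-\s+"
    r"(?P<message>.*)$"
)


def find_errors(lines):
    """Return (timestamp, source, message) for every ERROR-level line."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append(
                (match.group("ts"), match.group("source"), match.group("message"))
            )
    return errors


# Sample lines copied from the quoted output, unwrapped to one entry per line.
sample = [
    "2018-10-10 09:04:48,965     INFO [rbd-target-api:1804:call_api()] - _disk update on 127.0.0.1, successful",
    "2018-10-10 09:04:48,965    DEBUG [rbd-target-api:1789:call_api()] - processing GW 'osd03'",
    "2018-10-10 09:04:49,039    ERROR [rbd-target-api:1810:call_api()] - _disk change on osd03 failed with 500",
]

for ts, source, message in find_errors(sample):
    print(ts, source, message)
```

To run it against a real log, pass an open file object instead of the
sample list; the exact log path on your gateways is something you would
need to confirm locally.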


> 
> Adding the disk to the host failed with a "client masking update" error:
> 
> disk add rbd.dstest
> CMD: ../hosts/<client_iqn> disk action=add disk=rbd.dstest
> Client 'iqn.1998-01.com.vmware:test-2d06960a' update - add disk rbd.dstest
> disk add for 'rbd.dstest' against iqn.1998-01.com.vmware:test-2d06960a
> failed
> client masking update failed on osd03. Client update failed
> 
> rbd-target-api:1216:_update_client()] - client update failed on
> iqn.1998-01.com.vmware:test-2d06960a : Non-existent images
> ['rbd.dstest'] requested for iqn.1998-01.com.vmware:test-2d06960a
> 
> However, the image is listed in gwcli and by rados ls:
> 
> /disks> ls
> o- disks ........................................ [150G, Disks: 1]
>   o- rbd.dstest ................................. [dstest (150G)]
> 
> rados -p rbd ls | grep dstest
> rbd_id.dstest
> 
> 
> 
> I would really appreciate any help / suggestions
> 
> Thanks
> Steven 
> 
> On Tue, 9 Oct 2018 at 16:35, Jason Dillaman <jdillama@xxxxxxxxxx
> <mailto:jdillama@xxxxxxxxxx>> wrote:
> 
>     Anything in the rbd-target-api.log on osd03 to indicate why it failed?
> 
>     Since you replaced your existing "iscsi-gateway.conf", do your
>     security settings still match between the two hosts (i.e. on the
>     trusted_ip_list, same api_XYZ options)?
>     On Tue, Oct 9, 2018 at 4:25 PM Steven Vacaroaia <stef97@xxxxxxxxx
>     <mailto:stef97@xxxxxxxxx>> wrote:
>     >
>     > So the gateways are up, but I have issues adding disks (i.e. if I
>     do it on one gateway, it does not show on the other; however, after I
>     restart the rbd-target services, I see the disks).
>     > Thanks in advance for taking the trouble to provide advice / guidance
>     >
>     > 2018-10-09 16:16:08,968     INFO [rbd-target-api:1804:call_api()]
>     - _clientlun update on 127.0.0.1, successful
>     > 2018-10-09 16:16:08,968    DEBUG [rbd-target-api:1789:call_api()]
>     - processing GW 'osd03'
>     > 2018-10-09 16:16:08,987    ERROR [rbd-target-api:1810:call_api()]
>     - _clientlun change on osd03 failed with 500
>     > 2018-10-09 16:16:08,987    DEBUG [rbd-target-api:1827:call_api()]
>     - failed on osd03, applied to 127.0.0.1, aborted osd03. Client
>     update failed
>     > 2018-10-09 16:16:08,987     INFO [_internal.py:87:_log()] -
>     127.0.0.1 - - [09/Oct/2018 16:16:08] "PUT
>     /api/clientlun/iqn.1998-01.com.vmware:test-2d06960a HTTP/1.1" 500 -
>     >
>     > On Tue, 9 Oct 2018 at 15:42, Steven Vacaroaia <stef97@xxxxxxxxx
>     <mailto:stef97@xxxxxxxxx>> wrote:
>     >>
>     >> It worked.
>     >>
>     >> many thanks
>     >> Steven
>     >>
>     >> On Tue, 9 Oct 2018 at 15:36, Jason Dillaman <jdillama@xxxxxxxxxx
>     <mailto:jdillama@xxxxxxxxxx>> wrote:
>     >>>
>     >>> Can you try applying [1] and see if that resolves your issue?
>     >>>
>     >>> [1] https://github.com/ceph/ceph-iscsi-config/pull/78
>     >>> On Tue, Oct 9, 2018 at 3:06 PM Steven Vacaroaia
>     <stef97@xxxxxxxxx <mailto:stef97@xxxxxxxxx>> wrote:
>     >>> >
>     >>> > Thanks Jason
>     >>> >
>     >>> > adding prometheus_host = 0.0.0.0 to iscsi-gateway.cfg does not
>     work - the error message is
>     >>> >
>     >>> > "..rbd-target-gw: ValueError: invalid literal for int() with
>     base 10: '0.0.0.0' "
>     >>> >
>     >>> > adding prometheus_exporter = false works
>     >>> >
>     >>> > However I'd like to use prometheus_exporter if possible
>     >>> > Any suggestions will be appreciated
>     >>> >
>     >>> > Steven
>     >>> >
>     >>> >
>     >>> >
>     >>> > On Tue, 9 Oct 2018 at 14:25, Jason Dillaman
>     <jdillama@xxxxxxxxxx <mailto:jdillama@xxxxxxxxxx>> wrote:
>     >>> >>
>     >>> >> You can try adding "prometheus_exporter = false" in your
>     >>> >> "/etc/ceph/iscsi-gateway.cfg"'s "config" section if you
>     aren't using
>     >>> >> "cephmetrics", or try setting "prometheus_host = 0.0.0.0"
>     since it
>     >>> >> sounds like you have the IPv6 stack disabled.
>     >>> >>
>     >>> >> [1]
>     https://github.com/ceph/ceph-iscsi-config/blob/master/ceph_iscsi_config/settings.py#L90
>     >>> >> On Tue, Oct 9, 2018 at 2:09 PM Steven Vacaroaia
>     <stef97@xxxxxxxxx <mailto:stef97@xxxxxxxxx>> wrote:
>     >>> >> >
>     >>> >> > here is some info from /var/log/messages ..in case someone
>     has the time to take a look
>     >>> >> >
>     >>> >> > Oct  9 13:58:35 osd03 systemd: Started Setup system to
>     export rbd images through LIO.
>     >>> >> > Oct  9 13:58:35 osd03 systemd: Starting Setup system to
>     export rbd images through LIO...
>     >>> >> > Oct  9 13:58:35 osd03 journal: Processing osd blacklist
>     entries for this node
>     >>> >> > Oct  9 13:58:35 osd03 journal: No OSD blacklist entries found
>     >>> >> > Oct  9 13:58:35 osd03 journal: Reading the configuration
>     object to update local LIO configuration
>     >>> >> > Oct  9 13:58:35 osd03 journal: Configuration does not have
>     an entry for this host(osd03) - nothing to define to LIO
>     >>> >> > Oct  9 13:58:35 osd03 journal: Integrated Prometheus
>     exporter is enabled
>     >>> >> > Oct  9 13:58:35 osd03 journal: * Running on http://[::]:9287/
>     >>> >> > Oct  9 13:58:35 osd03 journal: Removing iSCSI target from LIO
>     >>> >> > Oct  9 13:58:35 osd03 journal: Removing LUNs from LIO
>     >>> >> > Oct  9 13:58:35 osd03 journal: Active Ceph iSCSI gateway
>     configuration removed
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: Traceback (most recent
>     call last):
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/bin/rbd-target-gw", line 5, in <module>
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw:
>     pkg_resources.run_script('ceph-iscsi-config==2.6', 'rbd-target-gw')
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in
>     run_script
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw:
>     self.require(requires)[0].run_script(script_name, ns)
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1462, in
>     run_script
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: exec_(script_code,
>     namespace, namespace)
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/pkg_resources.py", line 41, in exec_
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: exec("""exec code in
>     globs, locs""")
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File "<string>", line
>     1, in <module>
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/ceph_iscsi_config-2.6-py2.7.egg/EGG-INFO/scripts/rbd-target-gw",
>     line 432, in <module>
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/ceph_iscsi_config-2.6-py2.7.egg/EGG-INFO/scripts/rbd-target-gw",
>     line 379, in main
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/flask/app.py", line 772, in run
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: run_simple(host, port,
>     self, **options)
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 710, in
>     run_simple
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: inner()
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 692, in
>     inner
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: passthrough_errors,
>     ssl_context).serve_forever()
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 480, in
>     make_server
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: passthrough_errors,
>     ssl_context)
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib/python2.7/site-packages/werkzeug/serving.py", line 410, in
>     __init__
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw:
>     HTTPServer.__init__(self, (host, int(port)), handler)
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib64/python2.7/SocketServer.py", line 417, in __init__
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: self.socket_type)
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: File
>     "/usr/lib64/python2.7/socket.py", line 187, in __init__
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: _sock =
>     _realsocket(family, type, proto)
>     >>> >> > Oct  9 13:58:35 osd03 rbd-target-gw: socket.error: [Errno
>     97] Address family not supported by protocol
>     >>> >> > Oct  9 13:58:35 osd03 systemd: rbd-target-gw.service: main
>     process exited, code=exited, status=1/FAILURE
>     >>> >> >
>     >>> >> >
>     >>> >> > On Tue, 9 Oct 2018 at 13:16, Steven Vacaroaia
>     <stef97@xxxxxxxxx <mailto:stef97@xxxxxxxxx>> wrote:
>     >>> >> >>
>     >>> >> >> Hi ,
>     >>> >> >> I am using Mimic 13.2 and kernel 4.18
>     >>> >> >> Was using gwcli 2.5 and decided to upgrade to latest (2.7)
>     as people reported improved performance
>     >>> >> >>
>     >>> >> >> What is the proper methodology ?
>     >>> >> >> How should I troubleshoot this?
>     >>> >> >>
>     >>> >> >>
>     >>> >> >>
>     >>> >> >> What I did ( and it broke it) was
>     >>> >> >>
>     >>> >> >> cd tcmu-runner; git pull ; make && make install
>     >>> >> >> cd ceph-iscsi-cli; git pull;python setup.py install
>     >>> >> >> cd ceph-iscsi-config;git pull; python setup.py install
>     >>> >> >> cd rtslib-fb;git pull;  python setup.py install
>     >>> >> >>
>     >>> >> >> After a reboot, I cannot start rbd-target-gw and the logs
>     are not very helpful
>     >>> >> >>  ( Note:
>     >>> >> >>     I removed /etc/ceph/iscsi-gateway.cfg and gateway.conf
>     object as I wanted to start fresh
>     >>> >> >>      /etc/ceph/iscsi-gatway.conf was left unchanged )
>     >>> >> >>
>     >>> >> >>
>     >>> >> >> 2018-10-09 12:47:50,593 [    INFO] - Processing osd
>     blacklist entries for this node
>     >>> >> >> 2018-10-09 12:47:50,893 [    INFO] - No OSD blacklist
>     entries found
>     >>> >> >> 2018-10-09 12:47:50,893 [    INFO] - Reading the
>     configuration object to update local LIO configuration
>     >>> >> >> 2018-10-09 12:47:50,893 [    INFO] - Configuration does
>     not have an entry for this host(osd03) - nothing to define to LIO
>     >>> >> >> 2018-10-09 12:47:50,893 [    INFO] - Integrated Prometheus
>     exporter is enabled
>     >>> >> >> 2018-10-09 12:47:50,895 [    INFO] -  * Running on
>     http://[::]:9287/
>     >>> >> >> 2018-10-09 12:47:50,896 [    INFO] - Removing iSCSI target
>     from LIO
>     >>> >> >> 2018-10-09 12:47:50,896 [    INFO] - Removing LUNs from LIO
>     >>> >> >> 2018-10-09 12:47:50,896 [    INFO] - Active Ceph iSCSI
>     gateway configuration removed
>     >>> >> >>
>     >>> >> >> Many thanks
>     >>> >> >> Steven
>     >>> >> >>
>     >>> >> > _______________________________________________
>     >>> >> > ceph-users mailing list
>     >>> >> > ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>     >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>     >>> >>
>     >>> >>
>     >>> >>
>     >>> >> --
>     >>> >> Jason
>     >>>
>     >>>
>     >>>
>     >>> --
>     >>> Jason
> 
> 
> 
>     -- 
>     Jason
> 
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
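
For reference, the "[Errno 97] Address family not supported by protocol"
traceback quoted above is raised when the IPv6 socket is created for the
"::" listen address. Below is a minimal sketch of how a host can be
checked for this and a bind address chosen accordingly. It is only an
illustration: pick_bind_host is a hypothetical helper and not how
ceph-iscsi-config handles it.

```python
import socket


def pick_bind_host():
    """Return "::" if an IPv6 socket can be created, otherwise "0.0.0.0"."""
    if socket.has_ipv6:
        try:
            # Socket creation is the step that failed with Errno 97 in the
            # quoted traceback, so it is enough to attempt it and close again.
            with socket.socket(socket.AF_INET6, socket.SOCK_STREAM):
                return "::"
        except OSError:
            # IPv6 is compiled into Python but unavailable on this host.
            pass
    return "0.0.0.0"


print(pick_bind_host())
```

On a node with the IPv6 stack disabled this should print 0.0.0.0, which
matches the prometheus_host value suggested earlier in the thread.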
