On Fri, Aug 25, 2023 at 9:41 AM Prashant Dhange <pdhange@xxxxxxxxxx> wrote:
>
> Hi Ilya,
>
> G'day.
>
> We were seeing the rgw bucket creation failures if application is not
> enabled for the rgw control pool and ceph status was not reporting
> the warning message "x pool(s) do not have an application enabled
> (POOL_APP_NOT_ENABLED)".

Hi Prashant,

Could RGW be improved to emit a better log message in this case?

> We also observed the RGW daemon crash in the absence of application
> was not enabled for the pool. There was no way to know the reason
> behind RGW bucket creation failure. This issue has been raised on
> BZ#2029585.

I assume the crash is the following:

debug -5> 2022-08-10T12:10:55.410+0000 7f6b90b27700 10 monclient: get_auth_request con 0x5652391ac000 auth_method 0
debug -4> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 0 rgw main: ERROR: notify_obj.operate() returned r=-1
debug -3> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 -1 ERROR: failed to initialize watch: (1) Operation not permitted
debug -2> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 0 rgw main: ERROR: failed to start notify service ((1) Operation not permitted
debug -1> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 0 rgw main: ERROR: failed to init services (ret=(1) Operation not permitted)
debug 0> 2022-08-10T12:10:55.539+0000 7f6ba64b2440 -1 *** Caught signal (Segmentation fault) **
 in thread 7f6ba64b2440 thread_name:radosgw

 ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12c20) [0x7f6b9ab19c20]
 2: /lib64/librados.so.2(+0xada95) [0x7f6ba4ecaa95]
 3: /lib64/librados.so.2(+0x9dfd8) [0x7f6ba4ebafd8]
 4: (RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long)+0x2e) [0x7f6ba5cac99e]
 5: (RGWSI_Notify::finalize_watch()+0x40) [0x7f6ba5cad290]
 6: (RGWSI_Notify::shutdown()+0x22) [0x7f6ba5cad302]
 7: (RGWServices_Def::shutdown()+0x4e) [0x7f6ba57abcde]
 8: (RGWServices_Def::~RGWServices_Def()+0x12) [0x7f6ba57abd62]
 9: (RGWRados::~RGWRados()+0x80) [0x7f6ba5b8e990]
 10: (RGWStoreManager::init_storage_provider(DoutPrefixProvider const*, ceph::common::CephContext*, bool, bool, bool, bool, bool, bool, bool)+0x137) [0x7f6ba5b8d277]
 11: (radosgw_Main(int, char const**)+0x154b) [0x7f6ba574a33b]
 12: __libc_start_main()
 13: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The segfault itself is inside librados rather than in RGW per se, but
it could be caused by RGW passing an invalid pointer to librados. Was
this reported to the RGW team?

>
> My opinion was that if we create a pool then we must specify the
> application for the pool even though the pool is not in use to avoid
> unnecessary creation of the pool.

As I said in the previous message, unfortunately it doesn't work this
way because creating a pool and specifying an application are separate
steps. With this change, the cluster can temporarily go to HEALTH_WARN
on any pool creation, even if the operator follows up with the "ceph
osd pool application enable" command immediately. The "in use" check
was put in place because there appeared to be no other (easy) way to
avoid a bogus health alert.

> Let me know your thoughts.

Raising bogus health alerts is much worse than getting a legitimate
"can't create a bucket on a non-RGW pool" error, even if that failure
mode isn't obvious. IMO this change should be reverted.

Thanks,

                Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx