Re: "issue pool application warning even if pool is empty" change

On Fri, Aug 25, 2023 at 9:41 AM Prashant Dhange <pdhange@xxxxxxxxxx> wrote:
>
> Hi Ilya,
>
> G'day.
>
> We were seeing the rgw bucket creation failures if application is not
> enabled for the rgw control pool and ceph status was not reporting
> the warning message "x pool(s) do not have an application enabled
> (POOL_APP_NOT_ENABLED)".

Hi Prashant,

Could RGW be improved to emit a better log message in this case?

> We also observed the RGW daemon crash when the application was not
> enabled for the pool. There was no way to know the reason
> behind RGW bucket creation failure. This issue has been raised on
> BZ#2029585.

I assume the crash is the following:

    debug     -5> 2022-08-10T12:10:55.410+0000 7f6b90b27700 10 monclient: get_auth_request con 0x5652391ac000 auth_method 0
    debug     -4> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw main: ERROR: notify_obj.operate() returned r=-1
    debug     -3> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 -1 ERROR: failed to initialize watch: (1) Operation not permitted
    debug     -2> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw main: ERROR: failed to start notify service ((1) Operation not permitted
    debug     -1> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw main: ERROR: failed to init services (ret=(1) Operation not permitted)
    debug      0> 2022-08-10T12:10:55.539+0000 7f6ba64b2440 -1 *** Caught signal (Segmentation fault) **
     in thread 7f6ba64b2440 thread_name:radosgw

     ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
     1: /lib64/libpthread.so.0(+0x12c20) [0x7f6b9ab19c20]
     2: /lib64/librados.so.2(+0xada95) [0x7f6ba4ecaa95]
     3: /lib64/librados.so.2(+0x9dfd8) [0x7f6ba4ebafd8]
     4: (RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long)+0x2e) [0x7f6ba5cac99e]
     5: (RGWSI_Notify::finalize_watch()+0x40) [0x7f6ba5cad290]
     6: (RGWSI_Notify::shutdown()+0x22) [0x7f6ba5cad302]
     7: (RGWServices_Def::shutdown()+0x4e) [0x7f6ba57abcde]
     8: (RGWServices_Def::~RGWServices_Def()+0x12) [0x7f6ba57abd62]
     9: (RGWRados::~RGWRados()+0x80) [0x7f6ba5b8e990]
     10: (RGWStoreManager::init_storage_provider(DoutPrefixProvider const*, ceph::common::CephContext*, bool, bool, bool, bool, bool, bool, bool)+0x137) [0x7f6ba5b8d277]
     11: (radosgw_Main(int, char const**)+0x154b) [0x7f6ba574a33b]
     12: __libc_start_main()
     13: _start()
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The segfault isn't in RGW code per se, but it could be caused by RGW
passing an invalid pointer to librados.  Was this reported to the RGW
team?

>
> My opinion was that if we create a pool then we must specify the
> application for the pool even though the pool is not in use to avoid
> unnecessary creation of the pool.

As I said in the previous message, unfortunately it doesn't work this
way because creating a pool and specifying an application are separate
steps.  With this change the cluster can temporarily go to HEALTH_WARN
on any pool creation, even if the operator follows up with the "ceph
osd pool application enable" command immediately.  The "in use" check
was put in place because there appeared to be no other (easy) way to
avoid a bogus health alert.
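
To make the trade-off concrete, here is a minimal toy model in Python.
It is not Ceph code: the Pool class, its fields, and the
pools_without_app() function are all made up for illustration, and
"in use" is approximated as "has at least one object".  It only shows
why a check without the "in use" gate flags a pool in the window
between creation and "application enable".

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    # Hypothetical stand-in for a pool; not a Ceph data structure.
    name: str
    applications: set = field(default_factory=set)
    num_objects: int = 0


def pools_without_app(pools, require_in_use):
    """Return the names of pools this toy check would warn about.

    require_in_use=True models the old behaviour (skip empty pools);
    require_in_use=False models the behaviour after the change.
    """
    flagged = []
    for pool in pools:
        if pool.applications:
            continue  # an application is enabled, nothing to report
        if require_in_use and pool.num_objects == 0:
            continue  # empty pool: assume the operator is mid-setup
        flagged.append(pool.name)
    return flagged


# Step 1: the pool has just been created, no application enabled yet.
pool = Pool("example-pool")
print(pools_without_app([pool], require_in_use=False))  # ['example-pool']
print(pools_without_app([pool], require_in_use=True))   # []

# Step 2: the operator enables an application; both variants are quiet.
pool.applications.add("rgw")
print(pools_without_app([pool], require_in_use=False))  # []
print(pools_without_app([pool], require_in_use=True))   # []
```

In this model the only difference between the two variants is the
state after step 1, which is exactly the transient warning described
above.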

> Let me know your thoughts.

Raising bogus health alerts is much worse than getting a legitimate
"can't create a bucket on a non-RGW pool" error, even if that failure
mode isn't obvious.  IMO this change should be reverted.

Thanks,

                Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



