Re: "issue pool application warning even if pool is empty" change

Hi Ilya and Vikhyat,

On Mon, Aug 28, 2023 at 9:06 AM Vikhyat Umrao <vikhyat@xxxxxxxxxx> wrote:
Ilya and Prashant - Alternatively, could we add a feature to RADOS so that the pool create command also takes the application as input? Pools without an application set have caused problems that were hard to troubleshoot.
This could be an alternative way to avoid the BZ#2029585 issue: specify the application name at pool creation time. Let's discuss this in the next RADOS meeting.
 

On Fri, Aug 25, 2023 at 3:21 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Fri, Aug 25, 2023 at 9:41 AM Prashant Dhange <pdhange@xxxxxxxxxx> wrote:
>
> Hi Ilya,
>
> G'day.
>
> We were seeing RGW bucket creation failures when the application was
> not enabled for the RGW control pool, and ceph status was not reporting
> the warning message "x pool(s) do not have an application enabled
> (POOL_APP_NOT_ENABLED)".

Hi Prashant,

Could RGW be improved to emit a better log message in this case?

> We also observed the RGW daemon crash when the application was not
> enabled for the pool. There was no way to know the reason behind the
> RGW bucket creation failure. This issue has been raised in
> BZ#2029585.

I assume the crash is the following:

    debug     -5> 2022-08-10T12:10:55.410+0000 7f6b90b27700 10
monclient: get_auth_request con 0x5652391ac000 auth_method 0
    debug     -4> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw
main: ERROR: notify_obj.operate() returned r=-1
    debug     -3> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 -1 ERROR:
failed to initialize watch: (1) Operation not permitted
    debug     -2> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw
main: ERROR: failed to start notify service ((1) Operation not
permitted
    debug     -1> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw
main: ERROR: failed to init services (ret=(1) Operation not permitted)
    debug      0> 2022-08-10T12:10:55.539+0000 7f6ba64b2440 -1 ***
Caught signal (Segmentation fault) **
     in thread 7f6ba64b2440 thread_name:radosgw

     ceph version 16.2.7-98.el8cp
(b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
     1: /lib64/libpthread.so.0(+0x12c20) [0x7f6b9ab19c20]
     2: /lib64/librados.so.2(+0xada95) [0x7f6ba4ecaa95]
     3: /lib64/librados.so.2(+0x9dfd8) [0x7f6ba4ebafd8]
     4: (RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long)+0x2e)
[0x7f6ba5cac99e]
     5: (RGWSI_Notify::finalize_watch()+0x40) [0x7f6ba5cad290]
     6: (RGWSI_Notify::shutdown()+0x22) [0x7f6ba5cad302]
     7: (RGWServices_Def::shutdown()+0x4e) [0x7f6ba57abcde]
     8: (RGWServices_Def::~RGWServices_Def()+0x12) [0x7f6ba57abd62]
     9: (RGWRados::~RGWRados()+0x80) [0x7f6ba5b8e990]
     10: (RGWStoreManager::init_storage_provider(DoutPrefixProvider
const*, ceph::common::CephContext*, bool, bool, bool, bool, bool,
bool, bool)+0x137) [0x7f6ba5b8d277]
     11: (radosgw_Main(int, char const**)+0x154b) [0x7f6ba574a33b]
     12: __libc_start_main()
     13: _start()
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

It's not in RGW per se, but could be caused by RGW passing an invalid
pointer to librados.  Was this reported to the RGW team?
Yes, this was the RGW crash. It was reported in tracker#54719. I was not able to debug the issue further because the coredump was missing,
and I could not reproduce it on another attempt.


>
> My opinion was that if we create a pool then we must specify the
> application for the pool even though the pool is not in use to avoid
> unnecessary creation of the pool.

As I said in the previous message, unfortunately it doesn't work this
way because creating a pool and specifying an application are separate
steps.  With this change the cluster can temporarily go to HEALTH_WARN
on any pool creation, even if the operator follows up with the "ceph osd
pool application enable" command immediately.  The "in use" check was
put in place because there appeared to be no other (easy) way to avoid
a bogus health alert.
In your view, would it be a good approach to make the application name mandatory at pool
creation time, as suggested by Vikhyat?
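
To make the window described above concrete, here is a toy Python model of the two-step sequence. It is only a sketch, not Ceph code: `Pool`, `pools_missing_application`, and `require_in_use` are hypothetical names invented for illustration, and the "in use" test is assumed to be a simple object count, which may differ from what the monitor actually checks.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical stand-in for a pool: a name, its applications, its object count."""
    name: str
    applications: set = field(default_factory=set)
    num_objects: int = 0


def pools_missing_application(pools, require_in_use):
    """Return names of pools this toy model would flag as POOL_APP_NOT_ENABLED.

    require_in_use=True models the "in use" check: empty pools are skipped.
    require_in_use=False models warning even if the pool is empty.
    """
    flagged = []
    for pool in pools:
        if pool.applications:
            continue  # an application is enabled, nothing to report
        if require_in_use and pool.num_objects == 0:
            continue  # empty pool is ignored when the "in use" check is active
        flagged.append(pool.name)
    return flagged


# Step 1: the pool has just been created and has no application yet.
pool = Pool("testpool")
with_check = pools_missing_application([pool], require_in_use=True)
without_check = pools_missing_application([pool], require_in_use=False)

# Step 2: the operator runs "ceph osd pool application enable testpool rgw".
pool.applications.add("rgw")
after_enable = pools_missing_application([pool], require_in_use=False)

print(with_check, without_check, after_enable)
```

In this model the pool is flagged between step 1 and step 2 only when the "in use" check is dropped, which is the temporary HEALTH_WARN being discussed. Taking the application at creation time would collapse the two steps into one and remove that window.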
 

> Let me know your thoughts.

Raising bogus health alerts is much worse than getting a legitimate
"can't create a bucket on a non-RGW pool" error, even if that failure
mode isn't obvious.  IMO this change should be reverted.
That makes sense to me. Let me discuss this with Neha and Radek. Thanks a lot for your input; I really appreciate it.

Thanks,

                Ilya


Regards,
Prashant
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
