Re: "issue pool application warning even if pool is empty" change

On Mon, Aug 28, 2023 at 11:22 PM Prashant Dhange <pdhange@xxxxxxxxxx> wrote:
>
> Hi Ilya and Vikhyat,
>
> On Mon, Aug 28, 2023 at 9:06 AM Vikhyat Umrao <vikhyat@xxxxxxxxxx> wrote:
>>
>> Ilya and Prashant, or could we instead have a feature in RADOS where the pool create command also takes the application as input? The application not being set has caused problems that were hard to troubleshoot.
>
> This could be an alternative approach to avoiding the BZ#2029585 issue: specify the application name at the time of pool creation. Let's discuss this in the next RADOS meeting.
>
>>
>>
>> On Fri, Aug 25, 2023 at 3:21 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>
>>> On Fri, Aug 25, 2023 at 9:41 AM Prashant Dhange <pdhange@xxxxxxxxxx> wrote:
>>> >
>>> > Hi Ilya,
>>> >
>>> > G'day.
>>> >
>>> > We were seeing RGW bucket creation failures when the application was
>>> > not enabled for the RGW control pool, and ceph status was not
>>> > reporting the warning message "x pool(s) do not have an application
>>> > enabled (POOL_APP_NOT_ENABLED)".
>>>
>>> Hi Prashant,
>>>
>>> Could RGW be improved to emit a better log message in this case?
>>>
>>> > We also observed the RGW daemon crash when the application was not
>>> > enabled for the pool. There was no way to know the reason behind
>>> > the RGW bucket creation failure. This issue has been raised in
>>> > BZ#2029585.
>>>
>>> I assume the crash is the following:
>>>
>>>     debug     -5> 2022-08-10T12:10:55.410+0000 7f6b90b27700 10 monclient: get_auth_request con 0x5652391ac000 auth_method 0
>>>     debug     -4> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw main: ERROR: notify_obj.operate() returned r=-1
>>>     debug     -3> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 -1 ERROR: failed to initialize watch: (1) Operation not permitted
>>>     debug     -2> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw main: ERROR: failed to start notify service ((1) Operation not permitted
>>>     debug     -1> 2022-08-10T12:10:55.532+0000 7f6ba64b2440  0 rgw main: ERROR: failed to init services (ret=(1) Operation not permitted)
>>>     debug      0> 2022-08-10T12:10:55.539+0000 7f6ba64b2440 -1 *** Caught signal (Segmentation fault) **
>>>      in thread 7f6ba64b2440 thread_name:radosgw
>>>
>>>      ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
>>>      1: /lib64/libpthread.so.0(+0x12c20) [0x7f6b9ab19c20]
>>>      2: /lib64/librados.so.2(+0xada95) [0x7f6ba4ecaa95]
>>>      3: /lib64/librados.so.2(+0x9dfd8) [0x7f6ba4ebafd8]
>>>      4: (RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long)+0x2e) [0x7f6ba5cac99e]
>>>      5: (RGWSI_Notify::finalize_watch()+0x40) [0x7f6ba5cad290]
>>>      6: (RGWSI_Notify::shutdown()+0x22) [0x7f6ba5cad302]
>>>      7: (RGWServices_Def::shutdown()+0x4e) [0x7f6ba57abcde]
>>>      8: (RGWServices_Def::~RGWServices_Def()+0x12) [0x7f6ba57abd62]
>>>      9: (RGWRados::~RGWRados()+0x80) [0x7f6ba5b8e990]
>>>      10: (RGWStoreManager::init_storage_provider(DoutPrefixProvider const*, ceph::common::CephContext*, bool, bool, bool, bool, bool, bool, bool)+0x137) [0x7f6ba5b8d277]
>>>      11: (radosgw_Main(int, char const**)+0x154b) [0x7f6ba574a33b]
>>>      12: __libc_start_main()
>>>      13: _start()
>>>      NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>> It's not in RGW per se, but it could be caused by RGW passing an invalid
>>> pointer to librados.  Was this reported to the RGW team?
>
> Yes, this was the RGW crash. We reported it in tracker#54719. I was not able to debug it further because the coredump was missing,
> and I could not reproduce it on another attempt.
>
>>>
>>> >
>>> > My opinion was that whenever we create a pool, we must specify its
>>> > application, even if the pool is not yet in use, to avoid
>>> > unnecessary pool creation.
>>>
>>> As I said in my previous message, unfortunately it doesn't work this
>>> way, because creating a pool and specifying an application are
>>> separate steps.  With this change, the cluster can temporarily go to
>>> HEALTH_WARN on any pool creation, even if the operator follows up
>>> with the "ceph osd pool application enable" command immediately.  The
>>> "in use" check was put in place because there appeared to be no other
>>> (easy) way to avoid a bogus health alert.
>
> In your view, would it be a good approach to make specifying the
> application name compulsory at pool creation time, as Vikhyat suggested?
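To make the trade-off in the quoted exchange concrete, here is a minimal Python model of the POOL_APP_NOT_ENABLED decision. It assumes, as the subject line suggests, that the existing "in use" check amounts to "the pool holds at least one object". The Pool class, the pools_missing_app() function, and the warn_on_empty flag are hypothetical illustrations, not Ceph's actual monitor code. The model shows why flagging empty pools raises a warning in the window between the two separate steps (pool creation, then application enable).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical, simplified view of a pool as a health check might see it."""
    name: str
    num_objects: int = 0
    applications: set[str] = field(default_factory=set)


def pools_missing_app(pools, warn_on_empty=False):
    """Return the names of pools that would be reported as lacking an application.

    warn_on_empty=False models the existing "in use" check: only pools that
    already hold objects are flagged.
    warn_on_empty=True models the proposed change: every pool without an
    application is flagged, including freshly created empty ones.
    """
    flagged = []
    for pool in pools:
        if pool.applications:
            continue  # an application is enabled, nothing to report
        if pool.num_objects > 0 or warn_on_empty:
            flagged.append(pool.name)
    return flagged


# Step 1 (like "ceph osd pool create"): the pool exists, empty and untagged.
new_pool = Pool("newpool")  # hypothetical pool name
print(pools_missing_app([new_pool]))                      # []
print(pools_missing_app([new_pool], warn_on_empty=True))  # ['newpool']

# Step 2 (like "ceph osd pool application enable"), run right afterwards.
new_pool.applications.add("rgw")
print(pools_missing_app([new_pool], warn_on_empty=True))  # []
```

Under this model, a pool that already holds objects is flagged by both policies, so the proposed change only alters behavior for empty, untagged pools, which is exactly the transient state every new pool passes through.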

Hi Prashant,

I'm pretty sure there were $REASONS why it wasn't made compulsory back
when support for application tags/metadata was being added.  The major
one was definitely backwards compatibility, since changing an existing
monitor command to require a parameter that it doesn't currently take
is tough.

Further, you would need to extend the librados C/C++ API and all of its
bindings, because none of the rados_pool_create variants allow passing
arbitrary parameters.

Overall, it doesn't seem worth the effort (and trouble) to me.

Thanks,

                Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx



