On Mon, Aug 28, 2023 at 11:22 PM Prashant Dhange <pdhange@xxxxxxxxxx> wrote:
>
> Hi Ilya and Vikhyat,
>
> On Mon, Aug 28, 2023 at 9:06 AM Vikhyat Umrao <vikhyat@xxxxxxxxxx> wrote:
>>
>> Ilya and Prashant - Or it could be we can have a feature in rados when the pool create command run should also take the application as input? This app not being set up has caused hard problems in troubleshooting.
>
> This could be an alternative approach to avoid BZ#2029585 issue by specifying application name at the time of pool creation. Let's discuss this in the next RADOS meeting.
>
>>
>>
>> On Fri, Aug 25, 2023 at 3:21 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>
>>> On Fri, Aug 25, 2023 at 9:41 AM Prashant Dhange <pdhange@xxxxxxxxxx> wrote:
>>> >
>>> > Hi Ilya,
>>> >
>>> > G'day.
>>> >
>>> > We were seeing the rgw bucket creation failures if application is not
>>> > enabled for the rgw control pool and ceph status was not reporting
>>> > the warning message "x pool(s) do not have an application enabled
>>> > (POOL_APP_NOT_ENABLED)".
>>>
>>> Hi Prashant,
>>>
>>> Could RGW be improved to emit a better log message in this case?
>>>
>>> > We also observed the RGW daemon crash in the absence of application
>>> > was not enabled for the pool. There was no way to know the reason
>>> > behind RGW bucket creation failure. This issue has been raised on
>>> > BZ#2029585.
>>>
>>> I assume the crash is the following:
>>>
>>> debug -5> 2022-08-10T12:10:55.410+0000 7f6b90b27700 10
>>> monclient: get_auth_request con 0x5652391ac000 auth_method 0
>>> debug -4> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 0 rgw
>>> main: ERROR: notify_obj.operate() returned r=-1
>>> debug -3> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 -1 ERROR:
>>> failed to initialize watch: (1) Operation not permitted
>>> debug -2> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 0 rgw
>>> main: ERROR: failed to start notify service ((1) Operation not
>>> permitted
>>> debug -1> 2022-08-10T12:10:55.532+0000 7f6ba64b2440 0 rgw
>>> main: ERROR: failed to init services (ret=(1) Operation not permitted)
>>> debug 0> 2022-08-10T12:10:55.539+0000 7f6ba64b2440 -1 ***
>>> Caught signal (Segmentation fault) **
>>> in thread 7f6ba64b2440 thread_name:radosgw
>>>
>>> ceph version 16.2.7-98.el8cp
>>> (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
>>> 1: /lib64/libpthread.so.0(+0x12c20) [0x7f6b9ab19c20]
>>> 2: /lib64/librados.so.2(+0xada95) [0x7f6ba4ecaa95]
>>> 3: /lib64/librados.so.2(+0x9dfd8) [0x7f6ba4ebafd8]
>>> 4: (RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long)+0x2e)
>>> [0x7f6ba5cac99e]
>>> 5: (RGWSI_Notify::finalize_watch()+0x40) [0x7f6ba5cad290]
>>> 6: (RGWSI_Notify::shutdown()+0x22) [0x7f6ba5cad302]
>>> 7: (RGWServices_Def::shutdown()+0x4e) [0x7f6ba57abcde]
>>> 8: (RGWServices_Def::~RGWServices_Def()+0x12) [0x7f6ba57abd62]
>>> 9: (RGWRados::~RGWRados()+0x80) [0x7f6ba5b8e990]
>>> 10: (RGWStoreManager::init_storage_provider(DoutPrefixProvider
>>> const*, ceph::common::CephContext*, bool, bool, bool, bool, bool,
>>> bool, bool)+0x137) [0x7f6ba5b8d277]
>>> 11: (radosgw_Main(int, char const**)+0x154b) [0x7f6ba574a33b]
>>> 12: __libc_start_main()
>>> 13: _start()
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> It's not in RGW per se, but could be caused by RGW passing an invalid
>>> pointer librados.
>>> Was this reported to the RGW team?
>
> Yes, this was the RGW crash. We had this reported in the tracker#54719. I was not able to debug this issue further due to missing coredump and
> also not able to reproduce it again on another attempt.
>
>>>
>>> >
>>> > My opinion was that if we create a pool then we must specify the
>>> > application for the pool even though the pool is not in use to avoid
>>> > unnecessary creation of the pool.
>>>
>>> As I said in the previous message, unfortunately it doesn't work this
>>> way because creating a pool and specifying an application are separate
>>> steps. With this change the cluster can temporarily go to HEALTH_WARN
>>> on any pool creation, even if operator is following up with "ceph osd
>>> pool application enable" command immediately. The "in use" check was
>>> put in place because there appeared to be no other (easy) way to avoid
>>> a bogus health alert.
>
> Would it be a good approach in your view to compulsory specify application name at the time of pool
> creation as suggested by Vikhyat ?

Hi Prashant,

I'm pretty sure there were $REASONS why it wasn't made compulsory back
when support for application tags/metadata was being added. The major
one was definitely backwards compatibility, since changing an existing
monitor command to require a parameter that isn't even there is tough.
Further, you would need to extend librados C/C++ API and also all
bindings because none of the rados_pool_create variants allow passing
arbitrary parameters.

Overall, it doesn't seem worth the effort (and trouble) to me.

Thanks,

Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
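
For readers following the thread, the two separate steps discussed above (creating a pool, then enabling an application on it) can be sketched roughly as below. This is only an illustration under stated assumptions: `pool_setup_commands` is a hypothetical helper name, "example-pool" and "rgw" are placeholder values, and the sketch assumes `ceph osd pool create` is acceptable without an explicit PG count. It only builds the command lines and never contacts a cluster.

```python
from typing import List


def pool_setup_commands(pool: str, app: str) -> List[List[str]]:
    """Return the two CLI invocations, in order, as argv lists.

    Hypothetical helper for illustration only: it builds the commands
    and does not run them or talk to a cluster.
    """
    if not pool or not app:
        raise ValueError("pool and app must be non-empty")
    return [
        # Step 1: create the pool. At this point it has no application tag.
        ["ceph", "osd", "pool", "create", pool],
        # Step 2: tag the pool with its application (placeholder: "rgw").
        ["ceph", "osd", "pool", "application", "enable", pool, app],
    ]


if __name__ == "__main__":
    # Print the two steps for a placeholder pool and application.
    for argv in pool_setup_commands("example-pool", "rgw"):
        print(" ".join(argv))
```

Between the first and the second command the pool exists without an application, which is the window the thread refers to when it mentions a temporary HEALTH_WARN on pool creation.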