Hi Marc,

The ops log code is designed to discard data if the socket is flow-controlled, iirc. Maybe we just need to handle the signal. Of course, you should have something consuming data on the socket, but it's still a problem if radosgw exits unexpectedly.

Matt

On Thu, Jan 25, 2024 at 10:08 AM Marc Singer <marc@singer.services> wrote:

> Hi Ceph Users
>
> I am encountering a problem with the RGW Admin Ops Socket.
>
> I am setting up the socket as follows:
>
> rgw_enable_ops_log = true
> rgw_ops_log_socket_path = /tmp/ops/rgw-ops.socket
> rgw_ops_log_data_backlog = 16Mi
>
> The socket seems to fill up over time and doesn't get flushed; at some
> point the process runs out of file space.
>
> Do I need to configure something or send something for the socket to flush?
>
> See the log here:
>
>      0> 2024-01-25T13:10:13.908+0000 7f247b00eb00 -1 *** Caught signal
> (File size limit exceeded) **
>  in thread 7f247b00eb00 thread_name:ops_log_file
>
>  ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef
> (stable)
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 rbd_pwl
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 immutable_obj_cache
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 0 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 1 reserver
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 rgw_sync
>    1/ 5 rgw_datacache
>    1/ 5 rgw_access
>    1/ 5 rgw_dbstore
>    1/ 5 rgw_flight
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 fuse
>    2/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>    1/ 5 prioritycache
>    0/ 5 test
>    0/ 5 cephfs_mirror
>    0/ 5 cephsqlite
>    0/ 5 seastore
>    0/ 5 seastore_onode
>    0/ 5 seastore_odata
>    0/ 5 seastore_omap
>    0/ 5 seastore_tm
>    0/ 5 seastore_t
>    0/ 5 seastore_cleaner
>    0/ 5 seastore_epm
>    0/ 5 seastore_lba
>    0/ 5 seastore_fixedkv_tree
>    0/ 5 seastore_cache
>    0/ 5 seastore_journal
>    0/ 5 seastore_device
>    0/ 5 seastore_backref
>    0/ 5 alienstore
>    1/ 5 mclock
>    0/ 5 cyanstore
>    1/ 5 ceph_exporter
>    1/ 5 memstore
>   -2/-2 (syslog threshold)
>   99/99 (stderr threshold)
> --- pthread ID / name mapping for recent threads ---
>   7f2472a89b00 / safe_timer
>   7f2472cadb00 / radosgw
>   ...
>   log_file
> /var/lib/ceph/crash/2024-01-25T13:10:13.909546Z_01ee6e6a-e946-4006-9d32-e17ef2f9df74/log
> --- end dump of recent events ---
> reraise_fatal: default handler for signal 25 didn't terminate the process?
>
> Thank you for your help.
>
> Marc
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
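
[Editor's note: the sketch below is not part of Ceph or this thread; it is only a minimal illustration of "something consuming data on the socket" as Matt suggests. It assumes radosgw has created and is listening on the unix socket configured via rgw_ops_log_socket_path (the path matches Marc's config above) and pushes ops log entries as JSON text to whichever client is connected.]

    #!/usr/bin/env python3
    # Hypothetical ops-log consumer sketch (not shipped with Ceph).
    # Assumes radosgw listens on the unix socket from rgw_ops_log_socket_path
    # and writes ops log entries to the connected client.
    import socket

    SOCKET_PATH = "/tmp/ops/rgw-ops.socket"  # matches rgw_ops_log_socket_path

    def main():
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(SOCKET_PATH)
        try:
            while True:
                data = sock.recv(65536)
                if not data:          # radosgw closed the connection
                    break
                # Entries arrive as JSON text; just pass them through to stdout.
                print(data.decode("utf-8", errors="replace"), end="", flush=True)
        finally:
            sock.close()

    if __name__ == "__main__":
        main()

Keeping a reader like this attached drains the backlog instead of letting it build up toward rgw_ops_log_data_backlog. Whether that alone avoids the signal 25 (file size limit exceeded) crash reported above is not settled in this thread; as Matt notes, radosgw may also need to handle the signal itself.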