Re: RGW crashes when rgw_enable_ops_log is enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

I am using a unix socket client to connect with it and read the data from it. Do I need to do anything like signal the socket that this data has been read? Or am I not reading fast enough and data is backing up?

What I am also noticing that at some point (probably after something with the ops socket happens), the log level seems to increase for some reason? I did not find anything in the logs yet why this would be the case.

*Normal:*

2024-01-25T15:47:58.444+0000 7fe98a5c0b00  1 ====== starting new request req=0x7fe98712c720 ===== 2024-01-25T15:47:58.548+0000 7fe98b700b00  1 ====== req done req=0x7fe98712c720 op status=0 http_status=200 latency=0.104001537s ====== 2024-01-25T15:47:58.548+0000 7fe98b700b00  1 beast: 0x7fe98712c720: redacted - redacted [25/Jan/2024:15:47:58.444 +0000] "PUT /redacted/redacted/chunks/27/27242/27242514_10_4194304 HTTP/1.1" 200 4194304 - "redacted" - latency=0.104001537s

*Close before crashing:
*

  -509> 2024-01-25T14:54:31.588+0000 7f5186648b00  1 ====== starting new request req=0x7f517ffca720 =====   -508> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s initializing for trans_id = tx0000023a42eb7515dcdc0-0065b27627-823feaa-central   -507> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s getting op 1   -506> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj verifying requester   -505> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj normalizing buckets and tenants   -504> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj init permissions   -503> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj recalculating target   -502> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj reading permissions   -501> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj init op   -500> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj verifying op mask   -499> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj verifying op permissions   -498> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj Searching permissions for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=redacted, acct_name=redacted, subuser=, perm_mask=15, is_admin=0) mask=50   -497> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj Searching permissions for uid=redacted   -496> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj Found permission: 15   -495> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj Searching permissions for group=1 mask=50   -494> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj Permissions for group not found   -493> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj Searching permissions for group=2 mask=50   -492> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj Permissions for group not found   -491> 2024-01-25T14:54:31.588+0000 7f5186648b00  5 req 2568229052387020224 0.000000000s s3:put_obj -- Getting permissions done for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=redacted, acct_name=redacted, subuser=, perm_mask=15, is_admin=0), owner=redacted, perm=2   -490> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj verifying op params   -489> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj pre-executing   -488> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj check rate limiting   -487> 2024-01-25T14:54:31.588+0000 7f5186648b00  2 req 2568229052387020224 0.000000000s s3:put_obj executing   -486> 2024-01-25T14:54:31.624+0000 7f5183898b00  5 req 2568229052387020224 0.036000550s s3:put_obj NOTICE: call to do_aws4_auth_completion   -485> 2024-01-25T14:54:31.624+0000 7f5183898b00  5 req 2568229052387020224 0.036000550s s3:put_obj NOTICE: call to do_aws4_auth_completion   -484> 2024-01-25T14:54:31.680+0000 7f5185bc8b00  2 req 2568229052387020224 0.092001401s s3:put_obj completing   -483> 2024-01-25T14:54:31.680+0000 7f5185bc8b00  2 req 2568229052387020224 0.092001401s s3:put_obj op status=0   -482> 2024-01-25T14:54:31.680+0000 7f5185bc8b00  2 req 2568229052387020224 0.092001401s s3:put_obj http status=200

  -481> 2024-01-25T14:54:31.680+0000 7f5185bc8b00  1 ====== req done req=0x7f517ffca720 op status=0 http_status=200 latency=0.092001401s ======

Thanks for your help.

Marc Singer

On 1/25/24 16:22, Matt Benjamin wrote:
Hi Marc,

The ops log code is designed to discard data if the socket is
flow-controlled, iirc.  Maybe we just need to handle the signal.

Of course, you should have something consuming data on the socket, but it's
still a problem if radosgw exits unexpectedly.

Matt

On Thu, Jan 25, 2024 at 10:08 AM Marc Singer<marc@singer.services>  wrote:

Hi Ceph Users

I am encountering a problem with the RGW Admin Ops Socket.

I am setting up the socket as follows:

rgw_enable_ops_log = true
rgw_ops_log_socket_path = /tmp/ops/rgw-ops.socket
rgw_ops_log_data_backlog = 16Mi

Seems like the socket fills up over time and it doesn't seem to get
flushed, at some point the process runs out of file space.

Do I need to configure something or send something for the socket to flush?

See the log here:

0> 2024-01-25T13:10:13.908+0000 7f247b00eb00 -1 *** Caught signal (File
size limit exceeded) **
   in thread 7f247b00eb00 thread_name:ops_log_file

   ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef
(stable)
   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
     0/ 5 none
     0/ 1 lockdep
     0/ 1 context
     1/ 1 crush
     1/ 5 mds
     1/ 5 mds_balancer
     1/ 5 mds_locker
     1/ 5 mds_log
     1/ 5 mds_log_expire
     1/ 5 mds_migrator
     0/ 1 buffer
     0/ 1 timer
     0/ 1 filer
     0/ 1 striper
     0/ 1 objecter
     0/ 5 rados
     0/ 5 rbd
     0/ 5 rbd_mirror
     0/ 5 rbd_replay
     0/ 5 rbd_pwl
     0/ 5 journaler
     0/ 5 objectcacher
     0/ 5 immutable_obj_cache
     0/ 5 client
     1/ 5 osd
     0/ 5 optracker
     0/ 5 objclass
     1/ 3 filestore
     1/ 3 journal
     0/ 0 ms
     1/ 5 mon
     0/10 monc
     1/ 5 paxos
     0/ 5 tp
     1/ 5 auth
     1/ 5 crypto
     1/ 1 finisher
     1/ 1 reserver
     1/ 5 heartbeatmap
     1/ 5 perfcounter
     1/ 5 rgw
     1/ 5 rgw_sync
     1/ 5 rgw_datacache
     1/ 5 rgw_access
     1/ 5 rgw_dbstore
     1/ 5 rgw_flight
     1/ 5 javaclient
     1/ 5 asok
     1/ 1 throttle
     0/ 0 refs
     1/ 5 compressor
     1/ 5 bluestore
     1/ 5 bluefs
     1/ 3 bdev
     1/ 5 kstore
     4/ 5 rocksdb
     4/ 5 leveldb
     1/ 5 fuse
     2/ 5 mgr
     1/ 5 mgrc
     1/ 5 dpdk
     1/ 5 eventtrace
     1/ 5 prioritycache
     0/ 5 test
     0/ 5 cephfs_mirror
     0/ 5 cephsqlite
     0/ 5 seastore
     0/ 5 seastore_onode
     0/ 5 seastore_odata
     0/ 5 seastore_omap
     0/ 5 seastore_tm
     0/ 5 seastore_t
     0/ 5 seastore_cleaner
     0/ 5 seastore_epm
     0/ 5 seastore_lba
     0/ 5 seastore_fixedkv_tree
     0/ 5 seastore_cache
     0/ 5 seastore_journal
     0/ 5 seastore_device
     0/ 5 seastore_backref
     0/ 5 alienstore
     1/ 5 mclock
     0/ 5 cyanstore
     1/ 5 ceph_exporter
     1/ 5 memstore
    -2/-2 (syslog threshold)
    99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
    7f2472a89b00 / safe_timer
    7f2472cadb00 / radosgw
    ...
    log_file

/var/lib/ceph/crash/2024-01-25T13:10:13.909546Z_01ee6e6a-e946-4006-9d32-e17ef2f9df74/log
--- end dump of recent events ---
reraise_fatal: default handler for signal 25 didn't terminate the process?

Thank you for your help.

Marc
_______________________________________________
ceph-users mailing list --ceph-users@xxxxxxx
To unsubscribe send an email toceph-users-leave@xxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux