Hi
The issue is open: https://tracker.ceph.com/issues/64244
If you could take a look or let me know what are the next steps I would
be super grateful.
In the meantime I will try to increase the read throughput.
Thanks
Marc
On 1/26/24 15:23, Matt Benjamin wrote:
Hi Marc,
1. if you can, yes, create a tracker issue on tracker.ceph.com?
2. you might be able to get more throughput with (some number) of
additional threads; the first thing I would try is prioritization (nice)
regards,
Matt
On Fri, Jan 26, 2024 at 6:08 AM Marc Singer<marc@singer.services> wrote:
Hi Matt
Thanks for your answer.
Should I open a bug report then?
How would I be able to read more from it? Have multiple threads access
it and read from it simultaneously?
Marc
On 1/25/24 20:25, Matt Benjamin wrote:
Hi Marc,
No, the only thing you need to do with the Unix socket is to keep
reading from it. So it probably is getting backlogged. And while you
could arrange things to make that less likely, you likely can't make
it impossible, so there's a bug here.
Matt
On Thu, Jan 25, 2024 at 10:52 AM Marc Singer<marc@singer.services>
wrote:
Hi
I am using a unix socket client to connect with it and read the data
from it.
Do I need to do anything like signal the socket that this data has
been
read? Or am I not reading fast enough and data is backing up?
What I am also noticing that at some point (probably after something
with the ops socket happens), the log level seems to increase for
some
reason? I did not find anything in the logs yet why this would be
the case.
*Normal:*
2024-01-25T15:47:58.444+0000 7fe98a5c0b00 1 ====== starting new
request
req=0x7fe98712c720 =====
2024-01-25T15:47:58.548+0000 7fe98b700b00 1 ====== req done
req=0x7fe98712c720 op status=0 http_status=200
latency=0.104001537s ======
2024-01-25T15:47:58.548+0000 7fe98b700b00 1 beast: 0x7fe98712c720:
redacted - redacted [25/Jan/2024:15:47:58.444 +0000] "PUT
/redacted/redacted/chunks/27/27242/27242514_10_4194304 HTTP/1.1" 200
4194304 - "redacted" - latency=0.104001537s
*Close before crashing:
*
-509> 2024-01-25T14:54:31.588+0000 7f5186648b00 1 ====== starting
new request req=0x7f517ffca720 =====
-508> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000s initializing for trans_id =
tx0000023a42eb7515dcdc0-0065b27627-823feaa-central
-507> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000s getting op 1
-506> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj verifying requester
-505> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj normalizing buckets
and tenants
-504> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj init permissions
-503> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj recalculating target
-502> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj reading permissions
-501> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj init op
-500> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj verifying op mask
-499> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj verifying op permissions
-498> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj Searching permissions for
identity=rgw::auth::SysReqApplier ->
rgw::auth::LocalApplier(acct_user=redacted, acct_name=redacted,
subuser=, perm_mask=15, is_admin=0) mask=50
-497> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj Searching permissions for
uid=redacted
-496> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj Found permission: 15
-495> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj Searching permissions for
group=1 mask=50
-494> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj Permissions for group
not found
-493> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj Searching permissions for
group=2 mask=50
-492> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj Permissions for group
not found
-491> 2024-01-25T14:54:31.588+0000 7f5186648b00 5 req
2568229052387020224 0.000000000ss3:put_obj -- Getting permissions
done
for identity=rgw::auth::SysReqApplier ->
rgw::auth::LocalApplier(acct_user=redacted, acct_name=redacted,
subuser=, perm_mask=15, is_admin=0), owner=redacted, perm=2
-490> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj verifying op params
-489> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj pre-executing
-488> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj check rate limiting
-487> 2024-01-25T14:54:31.588+0000 7f5186648b00 2 req
2568229052387020224 0.000000000ss3:put_obj executing
-486> 2024-01-25T14:54:31.624+0000 7f5183898b00 5 req
2568229052387020224 0.036000550ss3:put_obj NOTICE: call to
do_aws4_auth_completion
-485> 2024-01-25T14:54:31.624+0000 7f5183898b00 5 req
2568229052387020224 0.036000550ss3:put_obj NOTICE: call to
do_aws4_auth_completion
-484> 2024-01-25T14:54:31.680+0000 7f5185bc8b00 2 req
2568229052387020224 0.092001401ss3:put_obj completing
-483> 2024-01-25T14:54:31.680+0000 7f5185bc8b00 2 req
2568229052387020224 0.092001401ss3:put_obj op status=0
-482> 2024-01-25T14:54:31.680+0000 7f5185bc8b00 2 req
2568229052387020224 0.092001401ss3:put_obj http status=200
-481> 2024-01-25T14:54:31.680+0000 7f5185bc8b00 1 ====== req done
req=0x7f517ffca720 op status=0 http_status=200
latency=0.092001401s ======
Thanks for your help.
Marc Singer
On 1/25/24 16:22, Matt Benjamin wrote:
> Hi Marc,
>
> The ops log code is designed to discard data if the socket is
> flow-controlled, iirc. Maybe we just need to handle the signal.
>
> Of course, you should have something consuming data on the
socket, but it's
> still a problem if radosgw exits unexpectedly.
>
> Matt
>
> On Thu, Jan 25, 2024 at 10:08 AM Marc
Singer<marc@singer.services> wrote:
>
>> Hi Ceph Users
>>
>> I am encountering a problem with the RGW Admin Ops Socket.
>>
>> I am setting up the socket as follows:
>>
>> rgw_enable_ops_log = true
>> rgw_ops_log_socket_path = /tmp/ops/rgw-ops.socket
>> rgw_ops_log_data_backlog = 16Mi
>>
>> Seems like the socket fills up over time and it doesn't seem to
get
>> flushed, at some point the process runs out of file space.
>>
>> Do I need to configure something or send something for the
socket to flush?
>>
>> See the log here:
>>
>> 0> 2024-01-25T13:10:13.908+0000 7f247b00eb00 -1 *** Caught
signal (File
>> size limit exceeded) **
>> in thread 7f247b00eb00 thread_name:ops_log_file
>>
>> ceph version 18.2.0
(5dd24139a1eada541a3bc16b6941c5dde975e26d) reef
>> (stable)
>> NOTE: a copy of the executable, or `objdump -rdS
<executable>` is
>> needed to interpret this.
>>
>> --- logging levels ---
>> 0/ 5 none
>> 0/ 1 lockdep
>> 0/ 1 context
>> 1/ 1 crush
>> 1/ 5 mds
>> 1/ 5 mds_balancer
>> 1/ 5 mds_locker
>> 1/ 5 mds_log
>> 1/ 5 mds_log_expire
>> 1/ 5 mds_migrator
>> 0/ 1 buffer
>> 0/ 1 timer
>> 0/ 1 filer
>> 0/ 1 striper
>> 0/ 1 objecter
>> 0/ 5 rados
>> 0/ 5 rbd
>> 0/ 5 rbd_mirror
>> 0/ 5 rbd_replay
>> 0/ 5 rbd_pwl
>> 0/ 5 journaler
>> 0/ 5 objectcacher
>> 0/ 5 immutable_obj_cache
>> 0/ 5 client
>> 1/ 5 osd
>> 0/ 5 optracker
>> 0/ 5 objclass
>> 1/ 3 filestore
>> 1/ 3 journal
>> 0/ 0 ms
>> 1/ 5 mon
>> 0/10 monc
>> 1/ 5 paxos
>> 0/ 5 tp
>> 1/ 5 auth
>> 1/ 5 crypto
>> 1/ 1 finisher
>> 1/ 1 reserver
>> 1/ 5 heartbeatmap
>> 1/ 5 perfcounter
>> 1/ 5 rgw
>> 1/ 5 rgw_sync
>> 1/ 5 rgw_datacache
>> 1/ 5 rgw_access
>> 1/ 5 rgw_dbstore
>> 1/ 5 rgw_flight
>> 1/ 5 javaclient
>> 1/ 5 asok
>> 1/ 1 throttle
>> 0/ 0 refs
>> 1/ 5 compressor
>> 1/ 5 bluestore
>> 1/ 5 bluefs
>> 1/ 3 bdev
>> 1/ 5 kstore
>> 4/ 5 rocksdb
>> 4/ 5 leveldb
>> 1/ 5 fuse
>> 2/ 5 mgr
>> 1/ 5 mgrc
>> 1/ 5 dpdk
>> 1/ 5 eventtrace
>> 1/ 5 prioritycache
>> 0/ 5 test
>> 0/ 5 cephfs_mirror
>> 0/ 5 cephsqlite
>> 0/ 5 seastore
>> 0/ 5 seastore_onode
>> 0/ 5 seastore_odata
>> 0/ 5 seastore_omap
>> 0/ 5 seastore_tm
>> 0/ 5 seastore_t
>> 0/ 5 seastore_cleaner
>> 0/ 5 seastore_epm
>> 0/ 5 seastore_lba
>> 0/ 5 seastore_fixedkv_tree
>> 0/ 5 seastore_cache
>> 0/ 5 seastore_journal
>> 0/ 5 seastore_device
>> 0/ 5 seastore_backref
>> 0/ 5 alienstore
>> 1/ 5 mclock
>> 0/ 5 cyanstore
>> 1/ 5 ceph_exporter
>> 1/ 5 memstore
>> -2/-2 (syslog threshold)
>> 99/99 (stderr threshold)
>> --- pthread ID / name mapping for recent threads ---
>> 7f2472a89b00 / safe_timer
>> 7f2472cadb00 / radosgw
>> ...
>> log_file
>>
>>
/var/lib/ceph/crash/2024-01-25T13:10:13.909546Z_01ee6e6a-e946-4006-9d32-e17ef2f9df74/log
>> --- end dump of recent events ---
>> reraise_fatal: default handler for signal 25 didn't terminate
the process?
>>
>> Thank you for your help.
>>
>> Marc
>> _______________________________________________
>> ceph-users mailing list --ceph-users@xxxxxxx
>> To unsubscribe send an emailtoceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list --ceph-users@xxxxxxx
To unsubscribe send an email toceph-users-leave@xxxxxxx
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
_______________________________________________
ceph-users mailing list --ceph-users@xxxxxxx
To unsubscribe send an email toceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx