After looking through the documentation, soft log kills are "normal"; however, in the radosgw logs we found:

2023-10-06T01:31:32.920+0200 7fb6f440b700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000002 to be held by another RGW process; skipping for now
2023-10-06T01:31:33.371+0200 7fb6f440b700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000004 to be held by another RGW process; skipping for now
2023-10-06T01:31:33.521+0200 7fb6f440b700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000006 to be held by another RGW process; skipping for now
2023-10-06T01:31:33.853+0200 7fb6f440b700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000008 to be held by another RGW process; skipping for now
2023-10-06T01:31:34.598+0200 7fb6f440b700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000012 to be held by another RGW process; skipping for now
2023-10-06T01:31:34.740+0200 7fb6f440b700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000014 to be held by another RGW process; skipping for now
...

After these lines it seems that RGW stopped responding. The next day it stopped again at almost the same time:

2023-10-07T01:27:26.299+0200 7f6216651700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000005 to be held by another RGW process; skipping for now
2023-10-07T01:37:28.077+0200 7f6216651700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000014 to be held by another RGW process; skipping for now
2023-10-07T01:47:27.333+0200 7f6216651700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000001 to be held by another RGW process; skipping for now
2023-10-07T02:47:29.863+0200 7f6216651700 0 INFO: RGWReshardLock::lock found lock on reshard.0000000006 to be held by another RGW process; skipping for now
...

After these lines RGW stopped responding and we had to restart it.

We were just about to upgrade to Ceph 17.x, but we have postponed it because of this. Sketches of some checks and a possible logrotate workaround follow below the quoted message.

Rok

On Fri, Oct 6, 2023 at 9:30 AM Rok Jaklič <rjaklic@xxxxxxxxx> wrote:

> Hi,
>
> yesterday we changed RGW from civetweb to beast, and at 04:02 RGW stopped
> working; we had to restart it in the morning.
>
> In one RGW log for the previous day we can see:
>
> 2023-10-06T04:02:01.105+0200 7fb71d45d700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror (PID: 3202663) UID: 0
>
> and in the next day's log we can see:
>
> 2023-10-06T04:02:01.133+0200 7fb71d45d700 -1 received signal: Hangup from (PID: 3202664) UID: 0
>
> and after that no requests came in. We had to restart RGW.
>
> In ceph.conf we have something like:
>
> [client.radosgw.ctplmon2]
> host = ctplmon2
> log_file = /var/log/ceph/client.radosgw.ctplmon2.log
> rgw_dns_name = ctplmon2
> rgw_frontends = "beast ssl_endpoint=0.0.0.0:4443 ssl_certificate=..."
> rgw_max_put_param_size = 15728640
>
> We assume it has something to do with logrotate.
>
> /etc/logrotate.d/ceph:
>
> /var/log/ceph/*.log {
>     rotate 90
>     daily
>     compress
>     sharedscripts
>     postrotate
>         killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror || pkill -1 -x "ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw|rbd-mirror|cephfs-mirror" || true
>     endscript
>     missingok
>     notifempty
>     su root ceph
> }
>
> ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
>
> Any ideas why this happened?
>
> Kind regards,
> Rok
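To rule out dynamic resharding as the trigger, the reshard queue can be inspected and any stuck or stale entries cleaned up. This is only a sketch assuming the standard radosgw-admin tooling on pacific; <bucket-name> is a placeholder:

    # list pending/running reshard operations
    radosgw-admin reshard list

    # check a specific bucket that appears in the queue
    radosgw-admin reshard status --bucket=<bucket-name>

    # cancel a reshard entry that looks stuck
    radosgw-admin reshard cancel --bucket=<bucket-name>

    # list and remove stale bucket instances left behind by earlier reshards
    radosgw-admin reshard stale-instances list
    radosgw-admin reshard stale-instances rm

Dynamic resharding can also be disabled temporarily (rgw_dynamic_resharding = false in the RGW client section) to see whether the nightly hangs stop; that is a diagnostic step, not a fix.

If the SIGHUP sent by logrotate is what stops the beast frontend, it should be reproducible by hand; a rough check, with <radosgw-pid> as a placeholder:

    # find the radosgw process and send it the same signal logrotate sends
    pgrep -a radosgw
    kill -HUP <radosgw-pid>

    # then see whether the beast frontend still answers
    curl -k https://ctplmon2:4443

A possible workaround until the cause is clear is to rotate the radosgw log with copytruncate instead of signalling the daemon. This is only a sketch of an alternative /etc/logrotate.d entry, not the packaged one; the matching files would also need to be excluded from the generic /var/log/ceph/*.log stanza, and copytruncate can drop a few log lines during rotation:

    /var/log/ceph/client.radosgw.*.log {
        rotate 90
        daily
        compress
        copytruncate
        missingok
        notifempty
        su root ceph
    }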