RadosGW Log Rotation (firefly)

Daniel Schneller <daniel.schneller@xxxxxxxxxxxxxxxx> · Mon, 2 Mar 2015 17:44:32 +0100

On our Ubuntu 14.04/Firefly 0.80.8 cluster we are seeing a
problem with log file rotation for the rados gateway.

The /etc/logrotate.d/radosgw script gets called, but
it does not work correctly. It spits out this message,
coming from the postrotate portion:

   /etc/cron.daily/logrotate:
   reload: Unknown parameter: id
   invoke-rc.d: initscript radosgw, action "reload" failed.

A new log file does get created, but because the
postrotate script fails, the daemon keeps writing
to the previous, now deleted file:

   [B|root@node01]  /etc/init ➜  ps aux | grep radosgw
   root     13077  0.9  0.1 13710396 203256 ?     Ssl  Feb14 212:27 /usr/bin/radosgw -n client.radosgw.node01

   [B|root@node01]  /etc/init ➜  ls -l /proc/13077/fd/
   total 0
   lr-x------ 1 root root 64 Mar  2 15:53 0 -> /dev/null
   lr-x------ 1 root root 64 Mar  2 15:53 1 -> /dev/null
   lr-x------ 1 root root 64 Mar  2 15:53 2 -> /dev/null
   l-wx------ 1 root root 64 Mar  2 15:53 3 -> /var/log/radosgw/radosgw.log.1 (deleted)
   ...
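
The same check can be scripted. The following is only a minimal sketch, not something shipped with Ceph: the name list_deleted_fds and its optional second argument (an alternative /proc root, useful mainly for testing) are made up for illustration. It prints every file descriptor of a process whose target has been deleted.

```shell
# list_deleted_fds PID [PROC_ROOT]
# Hypothetical helper: print "FD -> TARGET" for every open descriptor
# of PID whose link target ends in " (deleted)".
list_deleted_fds() {
    ldf_pid="$1"
    ldf_root="${2:-/proc}"
    for ldf_fd in "$ldf_root/$ldf_pid"/fd/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$ldf_fd" ] || continue
        ldf_target=$(readlink "$ldf_fd")
        case "$ldf_target" in
            *" (deleted)") printf '%s -> %s\n' "${ldf_fd##*/}" "$ldf_target" ;;
        esac
    done
}
```

Called as  list_deleted_fds 13077  on the node above, it should report the descriptor that still points at the rotated log file.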

Trying manually with   service radosgw reload  fails with
the same message. Running the non-upstart
/etc/init.d/radosgw reload   works. It will, kind of crudely,
just send a SIGHUP to any running radosgw process.
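
In effect that amounts to something like the sketch below. This is an assumption about the behaviour, not a copy of the init script, and send_hup is a hypothetical helper name.

```shell
# send_hup PID...
# Hypothetical helper: send SIGHUP to each given PID and return
# non-zero if any of the signals could not be delivered.
send_hup() {
    hup_rc=0
    for hup_pid in "$@"; do
        kill -HUP "$hup_pid" || hup_rc=1
    done
    return "$hup_rc"
}
```

One way to use it would be  send_hup $(pgrep -x radosgw)  which signals every process whose name is exactly radosgw, regardless of which instance it belongs to.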

To figure out the cause I compared OSDs and RadosGW with
respect to Upstart and got this:

   [B|root@node01]  /etc/init ➜  initctl list | grep osd
   ceph-osd-all start/running
   ceph-osd-all-starter stop/waiting
   ceph-osd (ceph/8) start/running, process 12473
   ceph-osd (ceph/9) start/running, process 12503
   ...

   [B|root@node01]  /etc/init ➜  initctl reload radosgw cluster="ceph" id="radosgw.node01"
   initctl: Unknown instance: ceph/radosgw.node01

   [B|root@node01]  /etc/init ➜  initctl list | grep rados
   radosgw-instance stop/waiting
   radosgw stop/waiting
   radosgw-all-starter stop/waiting
   radosgw-all start/running

Apart from my not being entirely clear on the difference
between radosgw-instance and radosgw, Upstart shows no running
instance for either job, so it has no idea which PID to send
the SIGHUP to when I ask it to reload.
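
To make that comparison repeatable, the output of  initctl list  can be filtered for jobs that have a running instance with a PID. This is a rough sketch that assumes the output format shown above; upstart_instances is a hypothetical helper and reads the listing from stdin.

```shell
# upstart_instances JOB
# Hypothetical helper: read `initctl list` output on stdin and print
# "INSTANCE PID" for every running instance of JOB.
upstart_instances() {
    awk -v job="$1" '
        $1 == job && $2 ~ /^\(.*\)$/ && $3 == "start/running," && $4 == "process" {
            # Strip the surrounding parentheses from the instance name.
            inst = substr($2, 2, length($2) - 2)
            print inst, $5
        }'
}
```

Run as  initctl list | upstart_instances ceph-osd  it lists the OSD instances with their PIDs, while the same call for radosgw prints nothing for the listing above.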

I can, of course, replace the logrotate config and use the
/etc/init.d/radosgw reload  approach, but I would like to
understand if this is something unique to our system, or if
this is a bug in the scripts.
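
For the workaround, the postrotate step could call something like the following. It is only a sketch: reload_via_sysv is a hypothetical name, and the default path is simply the init script mentioned above.

```shell
# reload_via_sysv [INIT_SCRIPT]
# Hypothetical helper: bypass Upstart and ask the SysV init script
# to reload the gateway. Fails if the script is missing.
reload_via_sysv() {
    sysv_script="${1:-/etc/init.d/radosgw}"
    if [ -x "$sysv_script" ]; then
        "$sysv_script" reload
    else
        echo "init script not found or not executable: $sysv_script" >&2
        return 1
    fi
}
```

The explicit check for the script keeps a missing init script from going unnoticed, which is essentially what happens with the failing reload today.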

FWIW here's an excerpt from /etc/ceph.conf:

   [client.radosgw.node01]
   host = node01
   rgw print continue = false
   keyring = /etc/ceph/keyring.radosgw.gateway
   rgw socket path = /tmp/radosgw.sock
   log file = /var/log/radosgw/radosgw.log
   rgw enable ops log = false
   rgw gc max objs = 31

Thanks!
Daniel

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com