radosgw hung when OS disks went readonly; restarting radosgw on a different node fixed it

Hi,


Just had an incident in a 3-node test cluster running Ceph 12.1.1 on Debian Stretch.

Each node had its own mon, mgr, radosgw, and OSDs.  It's object store only.

I had s3cmd looping and uploading files via S3.
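
For reference, the "looping" bit was nothing clever, just a shell loop along these lines (bucket name and paths here are made up, not the real ones):

  # keep re-uploading a directory of test files
  while true; do
      for f in /data/testfiles/*; do
          s3cmd put "$f" s3://test-bucket/ || echo "upload failed: $f"
      done
  done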

On one of the machines, the RAID controller barfed and dropped the OS disks (or the disks themselves failed - still to be confirmed).  Either way, / and /var went read-only.

The monitor on that machine found it couldn't write its logs and died.  But the OSDs stayed up - those disks didn't go read-only.


health: HEALTH_WARN
        1/3 mons down, quorum store01,store03
osd: 18 osds: 18 up, 18 in
rgw: 3 daemons active


The S3 upload process started timing out on connections to radosgw, even when talking to one of the other two radosgw instances.  (I'm round-robining the DNS records at the moment.)
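
By "round-robining" I just mean plain DNS round-robin: one name with an A record per radosgw node and nothing doing health checks in front of them, i.e. something like this in the zone file (names and addresses invented):

  s3.example.com.   300  IN  A  192.0.2.11   ; store01
  s3.example.com.   300  IN  A  192.0.2.12   ; store02
  s3.example.com.   300  IN  A  192.0.2.13   ; store03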

I stopped the OSDs on that box.  No change.  I stopped radosgw on that box.  Still no change.  The S3 upload process was still hanging/timing out.  A manual telnet to port 80 on the good nodes still hung.
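
(These are the stock systemd units, so "stopped" means roughly the following on the bad node - store02, the one missing from the quorum above.  The radosgw instance name below is a guess; "systemctl list-units 'ceph-radosgw@*'" shows the real one:

  systemctl stop ceph-osd.target              # stops every OSD on this host
  systemctl stop ceph-radosgw@rgw.store02     # stops the local radosgw instance
)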

"radosgw-admin bucket list" showed buckets &c

Then I restarted radosgw on one of the other two nodes.  After about a minute, the looping S3 upload process started working again.
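
(Again with the stock systemd units, that restart was just something like:

  systemctl restart ceph-radosgw@rgw.store01    # instance name guessed - one of the two surviving nodes

nothing fancier than that.)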


So my questions: why did I have to manually restart radosgw on one of the other nodes?  Why didn't it either keep working, or at least start working once radosgw was stopped on the bad node?

Also, where are the radosgw server/access logs?
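
My understanding from the docs is that the radosgw daemon log goes under /var/log/ceph/ (named after the client.rgw.* instance), and that per-request logging is a separate "ops log" that has to be switched on, roughly like this in ceph.conf (section name is a guess for my setup) - corrections welcome:

  [client.rgw.store01]
      log file = /var/log/ceph/ceph-rgw-store01.log    # daemon log
      debug rgw = 1/5                                  # how chatty that log is
      rgw enable ops log = true                        # per-request ops log
      rgw ops log socket path = /var/run/ceph/rgw-ops.sock   # optionally stream it to a unix socket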


I know it's probably an unusual edge case or something, but we're aiming for HA and redundancy.


Thanks!

Sean Purdy
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


