ceph osd debug question / proposal

Dear Ceph gurus...

Just wanted to report something that might be worth enhancing... or maybe I am not following the right debugging procedure.

1. I am working with version 0.92.2 and I am testing the cluster in several disaster scenarios.

2. I have 32 OSDs distributed across 4 servers, meaning that I have 8 OSDs per server.

3. I deliberately unmounted the filesystem of osd.4 but left the daemon running. I just wanted to understand how the system would react. This is what happened:
a. While there was no I/O, the system did not realize that the osd.4 filesystem was not mounted, and 'ceph -s' continued to report HEALTH_OK for the cluster status.
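On the OSD host itself one can confirm this by hand; something like the following (assuming the default /var/lib/ceph/osd/ceph-N data path) shows whether the data directory is still a separate mount:
# mountpoint -q /var/lib/ceph/osd/ceph-4 || echo "osd.4 data dir is not mounted"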

b. When I started to impose some heavy I/O, the system started to complain about slow requests. Curiously, osd.4 never appears in those messages.
# ceph -s
    cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
     health HEALTH_WARN
            170 requests are blocked > 32 sec
     monmap e1: 3 mons at {rccephmon1=192.231.127.8:6789/0,rccephmon2=192.231.127.34:6789/0,rccephmon3=192.231.127.26:6789/0}
            election epoch 24, quorum 0,1,2 rccephmon1,rccephmon3,rccephmon2
     mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
     osdmap e1179: 32 osds: 32 up, 32 in
      pgmap v907325: 2176 pgs, 2 pools, 4928 GB data, 1843 kobjects
            14823 GB used, 74228 GB / 89051 GB avail
                2174 active+clean
                   2 active+clean+replay

# ceph -w
(...)         
2015-08-19 17:44:55.161731 osd.1 [WRN] 88 slow requests, 8 included below; oldest blocked for > 3156.325716 secs
2015-08-19 17:44:55.161940 osd.1 [WRN] slow request 1920.533342 seconds old, received at 2015-08-19 17:12:54.628258: osd_op(client.44544.1:2266980 1000000022a.00006aec [write 524288~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171) currently waiting for replay end
2015-08-19 17:44:55.161950 osd.1 [WRN] slow request 1920.511098 seconds old, received at 2015-08-19 17:12:54.650502: osd_op(client.44544.1:2266988 1000000022a.00006aec [write 1048576~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171) currently waiting for replay end
2015-08-19 17:44:55.161957 osd.1 [WRN] slow request 1920.510451 seconds old, received at 2015-08-19 17:12:54.651149: osd_op(client.44544.1:2266996 1000000022a.00006aec [write 1572864~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171) currently waiting for replay end
2015-08-19 17:44:55.161963 osd.1 [WRN] slow request 1920.488589 seconds old, received at 2015-08-19 17:12:54.673011: osd_op(client.44544.1:2267004 1000000022a.00006aec [write 2097152~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171) currently waiting for replay end
2015-08-19 17:44:55.161970 osd.1 [WRN] slow request 1920.482785 seconds old, received at 2015-08-19 17:12:54.678815: osd_op(client.44544.1:2267012 1000000022a.00006aec [write 2621440~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171) currently waiting for replay end
(...)                
# grep "slow requests" /tmp/osd_failed.txt  | awk '{print $3}' | sort | uniq
osd.1
osd.11
osd.17
osd.23
osd.24
osd.26
osd.27
osd.31
osd.7

c. None of the standard 'ceph osd' commands indicated that the problematic OSD was osd.4. Only by looking at ceph-osd.4.log did I find the write error messages:
2015-08-19 16:52:17.552512 7f6f69973700  0 -- 10.100.1.167:6809/23763 >> 10.100.1.169:6800/28352 pipe(0x175ca000 sd=169 :6809 s=0 pgs=0 cs=0 l=0 c=0x1f038000).accept connect_seq 180 vs existing 179 state standby
2015-08-19 16:52:17.566701 7f6f89d2a700 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
2015-08-19 16:52:17.567230 7f6f89d2a700  0 filestore(/var/lib/ceph/osd/ceph-4) write couldn't open 5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such file or directory
2015-08-19 16:52:17.567332 7f6f89d2a700 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
2015-08-19 16:52:17.567365 7f6f89d2a700 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
2015-08-19 16:52:17.570206 7f6f8a72b700 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
2015-08-19 16:52:17.570233 7f6f8a72b700  0 filestore(/var/lib/ceph/osd/ceph-4) write couldn't open 5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such file or directory
2015-08-19 16:52:17.570260 7f6f8a72b700 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
2015-08-19 16:52:17.570603 7f6f8a72b700 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
2015-08-19 16:52:17.572468 7f6f89d2a700 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
2015-08-19 16:52:17.572497 7f6f89d2a700  0 filestore(/var/lib/ceph/osd/ceph-4) write couldn't open 5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such file or directory
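
In a larger cluster, the only way I can see to track this down is to sweep the OSD logs on every server for these filestore errors, along the lines of (the host names below are just placeholders; the log path assumes the default /var/log/ceph location):
# for h in osd-server-1 osd-server-2 osd-server-3 osd-server-4; do ssh $h 'grep -l "could not find .* in index" /var/log/ceph/ceph-osd.*.log'; done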

d. Restarting osd.4 remounted the osd.4 filesystem, but the slow requests remained. I had to restart all the OSDs on the same server as osd.4. Once that was done, Ceph realized there were degraded PGs and started recovery I/O.
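
(For completeness: by 'restart' I mean the standard init scripts. Assuming sysvinit-style scripts, that would be roughly
# service ceph restart osd.4
# service ceph restart osd
the first for osd.4 alone, the second for all OSD daemons on that server; adjust accordingly for upstart/systemd deployments.)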

If I am doing my debugging correctly, I see some potential problems:

- Ceph was not capable of detecting the missing or problematic filesystem until a client tried to access it for I/O.
- The source of the problem was not obvious from the warning messages Ceph was throwing. If I were running hundreds of OSDs and did not know in advance which of them was failing, it would be difficult to identify the problematic one.

Maybe there is a more efficient way to perform this kind of debugging? If not, couldn't an additional check report that the daemon cannot properly access its filesystem? I can imagine a situation where the filesystem is mounted read-only, for example.
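
To make the proposal concrete, something as simple as the sketch below (an external check run on each OSD host, not an existing Ceph feature; paths assume the default /var/lib/ceph/osd/ceph-N layout) would already catch both the unmounted and the read-only cases:

#!/bin/bash
# Sketch of an external check: verify that every OSD data directory on this
# host is a real mountpoint and is writable; complain otherwise.
for d in /var/lib/ceph/osd/ceph-*; do
    id=${d##*-}            # e.g. /var/lib/ceph/osd/ceph-4 -> 4
    if ! mountpoint -q "$d"; then
        echo "osd.$id: $d is not a mountpoint"
    elif ! touch "$d/.rw_check" 2>/dev/null; then
        echo "osd.$id: $d is not writable (read-only filesystem?)"
    else
        rm -f "$d/.rw_check"
    fi
done

Ideally the OSD daemon itself would run an equivalent check periodically and surface the failure in 'ceph health detail'.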

TIA
Goncalo

-- 
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
