Re: Disk allocation

Well at this point your disk usage probably isn't dropping because 2 of your 4 OSDs have crashed somehow (this is probably also why you can't mount the fs -- you've lost access to too much metadata). If you have core files or debug logs from those crashed OSDs we'd like to see the backtrace to try and debug whatever happened. :)
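If you still have a core file around, a backtrace is usually enough to get us started; something like this should do it (a rough sketch only -- in 0.25 the OSD daemon is cosd, but your install path and core file location may differ):

  gdb /usr/bin/cosd /path/to/core    # load the crashed OSD binary together with its core dump
  (gdb) bt                           # print the backtrace

If you'd rather reproduce the crash with more logging first, you can turn up the OSD debug levels in ceph.conf before restarting the daemons, e.g.:

  [osd]
      debug osd = 20
      debug filestore = 20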
On Monday, March 21, 2011 at 2:24 PM, Martin Wilderoth wrote: 
> One was removed; the other one is still there. When I ran ls on the snapshot it stopped working. Now I get "can't read superblock" when trying to mount the Ceph file system. I have restarted all servers.
> 
> But it looked like one snapshot was not correctly removed.
> 
> ceph health is reporting:
> 2011-03-21 22:13:53.581270 7fa2db738720 -- :/1813 messenger.start
> 2011-03-21 22:13:53.582765 7fa2db738720 -- :/1813 --> mon0 10.0.6.10:6789/0 -- auth(proto 0 30 bytes) v1 -- ?+0 0x11b04c0
> 2011-03-21 22:13:53.583276 7fa2db737700 -- 10.0.6.11:0/1813 learned my addr 10.0.6.11:0/1813
> 2011-03-21 22:13:53.586034 7fa2d90c1700 -- 10.0.6.11:0/1813 <== mon0 10.0.6.10:6789/0 1 ==== auth_reply(proto 1 0 Success) v1 ==== 24+0+0 (3548204067 0 0) 0x11b04c0 con 0x11b2280
> 2011-03-21 22:13:53.586077 7fa2d90c1700 -- 10.0.6.11:0/1813 --> mon0 10.0.6.10:6789/0 -- mon_subscribe({monmap=0+}) v1 -- ?+0 0x11b25d0
> 2011-03-21 22:13:53.586490 7fa2d90c1700 -- 10.0.6.11:0/1813 <== mon0 10.0.6.10:6789/0 2 ==== mon_map v1 ==== 187+0+0 (4038329719 0 0) 0x11b04c0 con 0x11b2280
> 2011-03-21 22:13:53.586563 7fa2d90c1700 -- 10.0.6.11:0/1813 <== mon0 10.0.6.10:6789/0 3 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3131629013 0 0) 0x11b25d0 con 0x11b2280
> 2011-03-21 22:13:53.586558 mon <- [health]
> 2011-03-21 22:13:53.586626 7fa2db738720 -- 10.0.6.11:0/1813 --> mon0 10.0.6.10:6789/0 -- mon_command(health v 0) v1 -- ?+0 0x11b04c0
> 2011-03-21 22:13:53.587216 7fa2d90c1700 -- 10.0.6.11:0/1813 <== mon0 10.0.6.10:6789/0 4 ==== mon_command_ack([health]=0 HEALTH_WARN osdmonitor: num_osds = 4, num_up_osds = 2, num_in_osds = 4 Some PGs are: crashed,down,degraded,peering v1) v1 ==== 154+0+0 (2262019121 0 0) 0x11b04c0 con 0x11b2280
> 2011-03-21 22:13:53.587244 mon0 -> 'HEALTH_WARN osdmonitor: num_osds = 4, num_up_osds = 2, num_in_osds = 4 Some PGs are: crashed,down,degraded,peering' (0)
> 2011-03-21 22:13:53.587421 7fa2db738720 -- 10.0.6.11:0/1813 shutdown complete.
> 
> osd3 is not reducing its data any more; 24 GB is still left. I'm not sure which logs you would like to see?
> 
> I could try to create the problem again.
> I have been creating big files using dd if=/dev/zero of=test.iso bs=1024k count=10k (10 GB). This has created heavy load on the OSD daemons in my system.
> I have also copied some other big ISO images, and I have removed and added files like this.
> 
> The snapshot was just some text files to play with the snapshot functionality.
> 
> I have been using Ceph 0.25 and 0.25.1 on a Debian 6.0 system. The filesystem is mounted on an openSUSE 11.3 server, Linux linxen1 2.6.34.7-0.7-xen.
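> 
> For reference, I mount it from the openSUSE client roughly like this (the monitor address is the one from my cluster; the mount point is just what I happen to use), and this is the command that now fails with the superblock error: 
> 
>   mount -t ceph 10.0.6.10:6789:/ /mnt/ceph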
> 
> -Martin
> 
> Unfortunately we haven't developed our fsck tools yet, although they are coming. However, we'd like to work out what happened to break your cluster so that we can fix it! 
> Do you have any remaining logs from when your OSDs crashed? Have you confirmed that the snapshots are gone? Are the OSDs continuing to reduce their data used numbers? 
> -Greg 
> On Monday, March 21, 2011 at 12:51 PM, Martin Wilderoth wrote: 
> > The disks are on separate partitions and I'm using the btrfs file system. 
> > They are mounted under /data/osd0, /data/osd1, etc. 
> > 
> > I removed the snapshots and then the system was reporting HEALTH_WARN. 
> > Two of the OSDs went down. 
> > 
> > ceph osd stat reports: 
> > 2011-03-21 19:14:00.122945 7f8c1d83e720 -- :/26712 messenger.start 
> > 2011-03-21 19:14:00.123344 7f8c1d83e720 -- :/26712 --> mon0 10.0.6.10:6789/0 -- auth(proto 0 30 bytes) v1 -- ?+0 0x242d4c0 
> > 2011-03-21 19:14:00.123701 7f8c1d83d700 -- 10.0.6.10:0/26712 learned my addr 10.0.6.10:0/26712 
> > 2011-03-21 19:14:00.124305 7f8c1b1c7700 -- 10.0.6.10:0/26712 <== mon0 10.0.6.10:6789/0 1 ==== auth_reply(proto 1 0 Success) v1 ==== 24+0+0 (709083268 0 0) 0x242d4c0 con 0x242f280 
> > 2011-03-21 19:14:00.124349 7f8c1b1c7700 -- 10.0.6.10:0/26712 --> mon0 10.0.6.10:6789/0 -- mon_subscribe({monmap=0+}) v1 -- ?+0 0x242f5d0 
> > 2011-03-21 19:14:00.124667 7f8c1b1c7700 -- 10.0.6.10:0/26712 <== mon0 10.0.6.10:6789/0 2 ==== mon_map v1 ==== 187+0+0 (4038329719 0 0) 0x242d4c0 con 0x242f280 
> > 2011-03-21 19:14:00.124746 7f8c1b1c7700 -- 10.0.6.10:0/26712 <== mon0 10.0.6.10:6789/0 3 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3131629013 0 0) 0x242f5d0 con 0x242f280 
> > 2011-03-21 19:14:00.124744 mon <- [osd,stat] 
> > 2011-03-21 19:14:00.124824 7f8c1d83e720 -- 10.0.6.10:0/26712 --> mon0 10.0.6.10:6789/0 -- mon_command(osd stat v 0) v1 -- ?+0 0x242d4c0 
> > 2011-03-21 19:14:00.125131 7f8c1b1c7700 -- 10.0.6.10:0/26712 <== mon0 10.0.6.10:6789/0 4 ==== mon_command_ack([osd,stat]=0 e426: 4 osds: 2 up, 2 in v426) v1 ==== 69+0+0 (3071290324 0 0) 0x242d4c0 con 0x242f280 
> > 2011-03-21 19:14:00.125155 mon0 -> 'e426: 4 osds: 2 up, 2 in' (0) 
> > 2011-03-21 19:14:00.125559 7f8c1d83e720 -- 10.0.6.10:0/26712 shutdown complete. 
> > 
> > I restarted the cluster and it seemed OK again. The data is accessible. 
> > Now osd2 has also cleared some data. 
> > 
> > osd0 1.1GB 
> > osd1 1.1GB 
> > osd2 1.2GB 
> > osd3 24GB 
> > 
> > But du is reporting 110MB on the mounted filesystem. 
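> > 
> > (To be clear about how I am measuring this: du -sh on the client mount versus the OSD data directories on the servers, roughly as below; the client mount point here is just an example.) 
> > 
> >   du -sh /mnt/ceph                  # on the client 
> >   du -sh /data/osd0 /data/osd1      # on each OSD host, likewise osd2 and osd3 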
> > 
> > Is there a way to recover, as it seems something is corrupt in my system? 
> > It also seems some of my OSDs have difficulties staying up; I'm not sure what I have done wrong. 
> > Maybe the best option is to restart with a new file system :-) 
> > 
> > ----- Original Message ----- 
> > From: "Ben De Luca" <bdeluca@xxxxxxxxx> 
> > To: "Gregory Farnum" <gregory.farnum@xxxxxxxxxxxxx> 
> > Cc: "Martin Wilderoth" <martin.wilderoth@xxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx 
> > Sent: Monday, 21 Mar 2011 18:32:46 
> > Subject: Re: Disk allocation 
> > 
> > Sorry to jump into the conversation, but how slow can the deletion of 
> > files actually be? 
> > 
> > One of the tests I ran a few weeks ago had me generating files, 
> > deleting them, and then writing them again from a number of clients. I 
> > noticed that the space was never freed up again. I have my OSDs and 
> > their journals on dedicated partitions. 
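> > 
> > Roughly, the test looked like the loop below (the sizes, counts, and 
> > mount point are from memory and just illustrative): 
> > 
> >   for i in $(seq 1 100); do 
> >     dd if=/dev/zero of=/mnt/ceph/test$i bs=1M count=100 
> >   done 
> >   rm -f /mnt/ceph/test* 
> >   # ...then write the same files again and watch whether the OSD 
> >   # partitions ever shrink 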
> > 
> > I had planned on asking more on this once I had a stable system again. 
> > 
> > 
> > 
> > On Mon, Mar 21, 2011 at 3:17 PM, Gregory Farnum 
> > <gregory.farnum@xxxxxxxxxxxxx> wrote: 
> > > On Sat, Mar 19, 2011 at 11:43 PM, Martin Wilderoth 
> > > <martin.wilderoth@xxxxxxxxxx> wrote: 
> > > > I have a small Ceph cluster with 4 OSDs (2 disks on each of 2 hosts). 
> > > > 
> > > > I have been adding and removing files from the file system, which is mounted via Ceph on another host. 
> > > > 
> > > > Now I have removed most of the data on the file system, so I only have 300 MB left plus two snapshots. 
> > > > 
> > > > The problem is that, looking at the disks, they are allocating 88 GB of data 
> > > > for the Ceph filesystem. 
> > > There are a few possibilities: 
> > > 1) You've hosted your OSDs on a partition that's shared with the rest 
> > > of the computer. In that case the reported used space will include 
> > > whatever else is on the partition, not just the Ceph files. (This can 
> > > include Ceph debug logs -- even if nothing else used to be there, 
> > > logging on that partition can build up pretty quickly.) 
> > > 2) You deleted the files quickly and just haven't given enough time 
> > > for the file deletion to propagate to the OSDs. Because the POSIX 
> > > filesystem is layered over an object store, this can take some time. 
> > > 3) Your snapshots contain a lot of files, so nothing (or very little) 
> > > actually got deleted. Snapshots are pretty cool, but they don't 
> > > magically free up disk space! 
> > > Given the uneven distribution of disk space I suspect option #2, but I 
> > > could be mistaken. :) Let us know! 
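> > > If it is #2, you should be able to watch the space drain over time 
> > > with something along these lines (substitute wherever your OSD data 
> > > directories actually live): 
> > > 
> > >   watch du -sh /path/to/osd/data   # on each OSD host 
> > >   ceph -s                          # overall usage should tick down as deletions propagate 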
> > > -Greg 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

