Re: goofy results for df

[Re-adding the list because I failed to last time.]

Interesting! I don't think I've seen local nodes get their usage wrong
like that before, but there are a lot of storage systems I don't have
much experience with. The aggregate usage stats across a Ceph cluster
are derived from the local output of each OSD, so if that output is
wrong we can't do much with it.
You probably want to validate that the OSD disks really do have the
amount of data on them that we expect to see, just to make sure that's
all working right. Then you'd need to hunt through your system
configuration to identify which piece of software is wrong
about the data in use.
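For example, something like this on each OSD host (a rough sketch; the
paths assume the default /var/lib/ceph/osd/ceph-N data directories, so
adjust them to your layout):

df -h /var/lib/ceph/osd/ceph-0            # usage as reported by the local filesystem
du -sh /var/lib/ceph/osd/ceph-0/current   # what the OSD's object store actually holds (FileStore keeps objects under 'current')
du -sh /var/lib/ceph/osd/ceph-0           # the whole data directory, including the journal if it lives there

If du and df disagree badly on the same mount, the problem is below Ceph;
if they agree with each other but not with what the cluster reports, the
OSD statistics themselves are suspect.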
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Wed, Feb 26, 2014 at 12:30 AM, Markus Goldberg
<goldberg@xxxxxxxxxxxxxxxxx> wrote:
> Hi Gregory,
> Yes, you are right. The df output on the Ceph nodes is also wrong:
>
> root@bd-0:~# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        39G  5,3G   32G  15% /
> none            4,0K     0  4,0K   0% /sys/fs/cgroup
> udev             16G   12K   16G   1% /dev
> tmpfs            16G     0   16G   0% /tmp
> tmpfs           3,2G  1,3M  3,2G   1% /run
> none            5,0M  4,0K  5,0M   1% /run/lock
> none             16G     0   16G   0% /run/shm
> none            100M     0  100M   0% /run/user
> tmpfs            16G     0   16G   0% /var/tmp
> /dev/sdb1        20T   31G   20T   1% /var/lib/ceph/osd/ceph-0
>                        ^^^ wrong   ^^

<snip>

> On 25.02.2014 18:55, Gregory Farnum wrote:
>
> [Re-adding the list.]
>
> Yeah, that pg dump indicates that each OSD believes it is storing
> about 30GB (which could include a lot of stuff besides the raw RADOS
> usage) and I assume that you have 3x replication turned on? How large
> are those OSDs? Can you check their disk utilization locally on the
> OSD mounts and make certain it matches up there? Something strange is
> definitely happening.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
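To get the cluster-side view of the same numbers, for comparison with the
local checks, something like this should work (a sketch; the extra
arguments to 'ceph pg dump' may depend on your release):

ceph pg dump osds      # the used/available space each OSD reports to the monitors
ceph pg dump summary   # the cluster-wide totals those per-OSD numbers roll up into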
>
>
> On Tue, Feb 25, 2014 at 7:57 AM, Markus Goldberg
> <goldberg@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Gregory,
> thank you very much for spending so much time on my problem.
> Attached is the output of the command 'ceph pg dump'.
> I hope that I understood you correctly, because English is not my native
> language.
> The files really are that big; they are the virtual tapes of my Bacula backup.
>
> Markus
> On 25.02.2014 16:20, Gregory Farnum wrote:
>
> On Mon, Feb 24, 2014 at 11:48 PM, Markus Goldberg
> <goldberg@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Gregory,
> here we go:
>
> root@bd-a:/mnt/myceph#
> root@bd-a:/mnt/myceph# ls -la
> total 4
> drwxr-xr-x 1 root root 25928099891213 Feb 24 14:14 .
> drwxr-xr-x 4 root root           4096 Aug 30 10:34 ..
> drwx------ 1 root root 25920394954765 Feb  7 10:07 Backup
> drwxr-xr-x 1 root root    32826961870 Feb 24 14:51 temp
>
> I think the big numbers above are the bytes consumed within each
> directory
>
> Yep, those are the "recursive statistics" on directory size, and they
> agree with your du output.
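As a side note, those recursive statistics can also be read directly as
virtual extended attributes on the directories, which is a handy
cross-check (a sketch, assuming the attr tools are installed and your
CephFS client exposes these attributes):

getfattr -n ceph.dir.rbytes /mnt/myceph/Backup   # recursive byte count for the whole tree
getfattr -n ceph.dir.rfiles /mnt/myceph/Backup   # recursive file count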
>
> root@bd-a:/mnt/myceph#
> root@bd-a:/mnt/myceph# ceph osd dump
> epoch 146
> fsid ad1a4f5c-cc86-4fef-b8f6-xxxxxxxxxxxx
> created 2014-02-03 10:13:55.109549
> modified 2014-02-17 10:37:41.750786
> flags
>
> pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45
> pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
> pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
> pool 3 'markus' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 15 owner 0 flags hashpspool
> pool 4 'ecki' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 17 owner 0 flags hashpspool
> pool 5 'kevin' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 19 owner 0 flags hashpspool
> pool 6 'alfresco' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 21 owner 0 flags hashpspool
> pool 7 'bacula' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 23 owner 0 flags hashpspool
> pool 8 'bareos' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 25 owner 0 flags hashpspool
> pool 9 'bs3' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 27 owner 0 flags hashpspool
> pool 10 'Verw-vdc2' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 54 owner 0 flags hashpspool
>
> max_osd 3
> osd.0 up   in  weight 1 up_from 139 up_thru 143 down_at 138 last_clean_interval [134,135) xxx.xxx.xxx.xx0:6801/2105 192.168.1.20:6800/2105 192.168.1.20:6801/2105 xxx.xxx.xxx.xx0:6802/2105 exists,up b2b1a1bd-f6ba-47f2-8485-xxxxxxxxxx7e
> osd.1 up   in  weight 1 up_from 143 up_thru 143 down_at 142 last_clean_interval [120,135) xxx.xxx.xxx.xx1:6801/2129 192.168.1.21:6800/2129 192.168.1.21:6801/2129 xxx.xxx.xxx.xx1:6802/2129 exists,up 2dc1dd2c-ce99-4e7d-9672-xxx.xxx.xxx.xx1f
> osd.2 up   in  weight 1 up_from 139 up_thru 143 down_at 138 last_clean_interval [125,135) xxx.xxx.xxx.xx2:6801/2018 192.168.1.22:6800/2018 192.168.1.22:6801/2018 xxx.xxx.xxx.xx2:6802/2018 exists,up 83d293a1-5f34-4086-a3d6-xxx.xxx.xxx.xx7c
>
>
> root@bd-a:/mnt/myceph#
> root@bd-a:/mnt/myceph# ceph -s
>      cluster ad1a4f5c-cc86-4fef-b8f6-xxxxxxxxxxxx
>       health HEALTH_OK
>       monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.xx0:6789/0,bd-1=xxx.xxx.xxx.xx1:6789/0,bd-2=xxx.xxx.xxx.xx2:6789/0}, election epoch 506, quorum 0,1,2 bd-0,bd-1,bd-2
>       mdsmap e171: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
>       osdmap e146: 3 osds: 3 up, 3 in
>        pgmap v81525: 992 pgs, 11 pools, 31456 MB data, 8058 objects
>              94792 MB used, 61309 GB / 61408 GB avail
>                   992 active+clean
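For reference, those last two figures are at least internally consistent
with 3x replication:

    31456 MB data x 3 replicas = 94368 MB, which is close to the 94792 MB
    reported as used (the remainder is presumably journal and filesystem
    overhead).

The real discrepancy is that ~31 GB of RADOS data is orders of magnitude
short of what the directory rstats above claim.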
>
> But this indicates that raw RADOS indeed believes that it only has
> ~30GB of data total, which isn't enough to store 21TB of filesystem
> data! Are the available sizes correct? Can you dump the pgmap and
> pastebin it somewhere we can look at it (sorry, I meant that rather than
> the OSDMap to begin with; my bad!)? I'm wondering if the stats are
> corrupted or what.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> --
> Kind regards,
>   Markus Goldberg
>
> --------------------------------------------------------------------------
> Markus Goldberg       Universität Hildesheim
>                       Rechenzentrum
> Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany
> Fax +49 5121 88392823 email goldberg@xxxxxxxxxxxxxxxxx
> --------------------------------------------------------------------------
>
>
>
>
> --
> Kind regards,
>   Markus Goldberg
>
> --------------------------------------------------------------------------
> Markus Goldberg       Universität Hildesheim
>                       Rechenzentrum
> Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany
> Fax +49 5121 88392823 email goldberg@xxxxxxxxxxxxxxxxx
> --------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




