How are you generating these files? It sounds like maybe you're writing them concurrently on a bunch of clients? There are two separate issues here.

The first is that your clients are holding capabilities (caps) on the files, which I suspect is why the data use stays fairly high in step 3. When you truncate a file, the MDS is responsible for deleting the data off the OSDs, and that can take a while if you issue a bunch of truncates at once, or if the client holds enough capabilities on the file that it doesn't need to notify the MDS of the truncate right away. I suspect this is also why the recreate takes so long: if you're recreating the files on a different client, the clients may be fighting over capabilities or going into shared-write mode, which is significantly slower. (Depending on how you've set your test up, this second behavior may be a bug.)

The second issue is that you're seeing disk space reported as used even when the filesystem is empty. First, it's possible that objects were still being deleted off the OSDs, since you lost another 2.5 GB of data between your unmount and remount. Second, keep in mind that the reported "space used" is based on the usage reported by each individual disk in the cluster. Depending on how your configuration is set up, that figure can include OSD journals and debugging logs, and it will include the MDS journal.

:)
-Greg
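For what it's worth, here is a minimal sketch of how the truncate step could be rerun so the MDS hears about the truncates right away. It assumes coreutils truncate, a single client mounted at /mnt/ceph, and the same $flist as in the report below; none of it comes from Jim's actual test harness.

  for f in $flist; do
      truncate -s 0 /mnt/ceph/$f   # truncate in place instead of redirecting an empty echo
  done
  sync                             # flush any dirty client state
  umount /mnt/ceph                 # dropping the client's caps lets the MDS start purging objects
  ceph -w                          # "used" should fall as the OSDs delete the truncated data

Unmounting is heavy-handed, but it rules out the clients sitting on capabilities while you watch the space come back.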
On Fri, Dec 3, 2010 at 1:39 PM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> Hi,
>
> I'm seeing some odd behavior that suggests that space doesn't get released back to the storage pool unless a file is truncated - unlinking it doesn't seem to do it. This is based on the data use reported by ceph -w.
>
> Maybe I don't understand what I'm seeing below, or what is supposed to happen?
>
> Steps to reproduce (a scripted sketch of steps 2-5 appears after this message):
>
> 1) create/start/mount a new file system; ceph -w reports:
>
> 2010-12-03 13:14:34.027320 pg v52: 3432 pgs: 3432 active+clean; 162 KB data, 54252 KB used, 2986 GB / 3013 GB avail
>    (lots of scrub activity)
> 2010-12-03 13:19:38.379998 pg v164: 3432 pgs: 3432 active+clean; 162 KB data, 41728 KB used, 2986 GB / 3013 GB avail
>
> 2) create 64 files with a total of 4096 MiB data; ceph -w reports:
>
> 2010-12-03 13:21:42.645536 pg v262: 3432 pgs: 3432 active+clean; 4096 MB data, 6383 MB used, 2980 GB / 3013 GB avail
> 2010-12-03 13:21:47.615336 pg v263: 3432 pgs: 3432 active+clean; 4096 MB data, 6401 MB used, 2980 GB / 3013 GB avail
>    (lots of scrub activity)
> 2010-12-03 13:27:41.818799 pg v499: 3432 pgs: 3432 active+clean; 4096 MB data, 8259 MB used, 2978 GB / 3013 GB avail
>
> 3) truncate above files to zero length
>    (e.g. for f in $flist; do echo -n "" > /mnt/ceph/$f; done)
>    ceph -w reports:
>
> 2010-12-03 13:28:57.909734 pg v552: 3432 pgs: 3432 active+clean; 1018 KB data, 8280 MB used, 2978 GB / 3013 GB avail
>    (lots of scrub activity)
> 2010-12-03 13:34:05.856985 pg v602: 3432 pgs: 3432 active+clean; 1018 KB data, 3167 MB used, 2983 GB / 3013 GB avail
>
> Maybe I should have waited longer, for more scrubbing, to see the used space drop further?
>
> 4) recreate files, same size/name as step 2);
>
> Note that this step takes _much_ longer: 1448 sec vs. 41 sec. Maybe redirecting stdout onto a file from an echo of nothing is a really stupid way to truncate a file, but still... seems like something might not be right?
>
> At the end, ceph -w reports:
>
> 2010-12-03 13:59:18.031146 pg v3902: 3432 pgs: 3432 active+clean; 4097 MB data, 8574 MB used, 2978 GB / 3013 GB avail
>    (lots of scrub activity)
> 2010-12-03 14:05:33.016532 pg v3971: 3432 pgs: 3432 active+clean; 4097 MB data, 8595 MB used, 2978 GB / 3013 GB avail
>
> 5) rm all files; ceph -w reports:
>
> 2010-12-03 14:06:08.287086 pg v3993: 3432 pgs: 3432 active+clean; 4033 MB data, 8596 MB used, 2978 GB / 3013 GB avail
>    (lots of scrub activity)
> 2010-12-03 14:12:29.090263 pg v4139: 3432 pgs: 3432 active+clean; 4033 MB data, 8520 MB used, 2978 GB / 3013 GB avail
>
> Should the space reported as used here get returned to the available pool eventually? Should I have waited longer?
>
> 6) unmount file system on all clients; ceph -w reports:
>
>    (lots of scrub activity)
> 2010-12-03 14:15:04.119015 pg v4232: 3432 pgs: 3432 active+clean; 1730 KB data, 4213 MB used, 2982 GB / 3013 GB avail
>
> 7) remount file system on all clients; ceph -w reports:
>
> 2010-12-03 14:16:28.238805 pg v4271: 3432 pgs: 3432 active+clean; 1754 KB data, 1693 MB used, 2985 GB / 3013 GB avail
>
> Hopefully the above is useful. These numbers were generated on the unstable (63fab458f625) + rc (378d13df9505) + testing (1a4ad835de66) branches.
>
> -- Jim
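For reference, here is a minimal sketch of the kind of script that reproduces steps 2-5 of the report above. The file count, total size, and mount point are taken from the report; the file names, the dd invocation, and the loop structure are assumptions rather than Jim's actual test harness.

  #!/bin/sh
  # Reproduction sketch: 64 files of 64 MiB each (4096 MiB total) on a Ceph mount.
  mnt=/mnt/ceph
  flist=$(seq 1 64)

  # Step 2: create the files.
  for f in $flist; do
      dd if=/dev/zero of=$mnt/file.$f bs=1M count=64 2>/dev/null
  done

  # Step 3: truncate every file to zero length.
  for f in $flist; do
      truncate -s 0 $mnt/file.$f
  done

  # Step 4: recreate the files with the same names and sizes.
  for f in $flist; do
      dd if=/dev/zero of=$mnt/file.$f bs=1M count=64 2>/dev/null
  done

  # Step 5: remove everything, then watch "used" in ceph -w.
  rm -f $mnt/file.*

Running it against an otherwise idle mount, with ceph -w open in another terminal, shows how quickly (or slowly) the OSDs return the space after each step.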