Re: Crash and strange things on MDS

On Tue, Feb 26, 2013 at 10:10:06AM -0800, Gregory Farnum wrote:
> On Tue, Feb 26, 2013 at 9:57 AM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
> > On Tue, Feb 19, 2013 at 05:09:30PM -0800, Gregory Farnum wrote:
> >> On Tue, Feb 19, 2013 at 5:00 PM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
> >> > On Tue, Feb 19, 2013 at 10:15:48AM -0800, Gregory Farnum wrote:
> >> >> Looks like you've got ~424k dentries pinned, and it's trying to keep
> >> >> 400k inodes in cache. So you're still a bit oversubscribed, yes. This
> >> >> might just be the issue where your clients are keeping a bunch of
> >> >> inodes cached for the VFS (http://tracker.ceph.com/issues/3289).
> >> >
> >> > Thanks for the analysis. We use only one ceph-fuse client at this time,
> >> > which runs all the "high-load" commands like rsync, tar and cp on a huge
> >> > number of files. Well, I will replace it with the kernel client.
> >>
> >> Oh, that bug is just an explanation of what's happening; I believe it
> >> exists in the kernel client as well.
> >
> > After setting the mds cache size to 900k, storms are gone.
> > However, we continue to observe high latency on some clients (always the
> > same clients): each IO takes between 40 and 90 ms (for example, with
> > WordPress it takes ~20 seconds to load all the needed files...).
> > With a non-laggy client, IO requests take less than 1ms.
> 
> I can't be sure from that description, but it sounds like you've got
> one client which is generally holding the RW "caps" on the files, and
> then another client which comes in occasionally to read those same
> files. That requires the first client to drop its caps, and involves a
> couple round-trip messages and is going to take some time — this is an
> unavoidable consequence if you have clients sharing files, although
> there's probably still room for us to optimize.
> 
> Can you describe your client workload in a bit more detail?

We have one folder per application (php, java, ruby). Every application has
small files (<1 MB). By default, the folder is mounted by only one client.

In case of overload, additional clients spawn to mount the same folder and
access the same files.

In the following test, only one client was used to serve the
application (a website running WordPress).

I ran the test under strace to time each IO request (strace -T
-e trace=file) and noticed the same pattern:

...
[pid  4378] stat("/data/wp-includes/user.php", {st_mode=S_IFREG|0750, st_size=28622, ...}) = 0 <0.033409>
[pid  4378] lstat("/data/wp-includes/user.php", {st_mode=S_IFREG|0750, st_size=28622, ...}) = 0 <0.081642>
[pid  4378] open("/data/wp-includes/user.php", O_RDONLY) = 5 <0.041138>
[pid  4378] stat("/data/wp-includes/meta.php", {st_mode=S_IFREG|0750, st_size=10896, ...}) = 0 <0.082303>
[pid  4378] lstat("/data/wp-includes/meta.php", {st_mode=S_IFREG|0750, st_size=10896, ...}) = 0 <0.004090>
[pid  4378] open("/data/wp-includes/meta.php", O_RDONLY) = 5 <0.081929>
...

~250 files were accessed for a single request (thanks, WordPress).
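
For reference, a trace like the one above can be captured roughly as follows;
the php-fpm PID and the output path are illustrative placeholders, not the
exact command I ran:

    # -f follows forked workers, -T appends the time spent in each syscall,
    # -e trace=file restricts the output to path-related calls (stat, open, ...)
    strace -f -T -e trace=file -p <php-fpm-pid> -o /tmp/wp-trace.log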

The fs is mounted with these options: rw,noatime,name=<hidden>,secret=<hidden>,nodcache.
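
For completeness, the full mount invocation looks roughly like this (kernel
client assumed; the monitor address is a placeholder, and name/secret stay
hidden as above):

    mount -t ceph <mon-host>:6789:/ /data \
        -o rw,noatime,name=<hidden>,secret=<hidden>,nodcache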

I have a debug log (debug_mds=20) of the active MDS from this test if you want it.
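
In case it helps to reproduce, that debug level was set along these lines in
ceph.conf on the MDS host (a sketch of the relevant section; it can also be
injected at runtime without a restart):

    [mds]
        debug mds = 20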
-- 
Kevin Decherf - @Kdecherf
GPG C610 FE73 E706 F968 612B E4B2 108A BD75 A81E 6E2F
http://kdecherf.com

