Re: Ceph-fuse timeout?


 



Files are subdivided into folders of 1000 files each (/img/10/21/10213456.jpg etc.) to improve performance.
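For illustration, the layout in the example path could be produced by a small helper like the one below. This is only a sketch: the function name `image_path`, the `/img` root, and the rule "first two digit pairs become the two directory levels" are assumptions read off the single example path, not a description of the actual code in use.

```python
def image_path(image_id, root="/img"):
    """Map a numeric image id to a two-level sharded path.

    Assumption: the first two digits name the top-level folder and the
    next two digits name the subfolder, as in /img/10/21/10213456.jpg.
    """
    name = str(image_id)
    if not name.isdigit() or len(name) < 5:
        raise ValueError(f"unexpected image id: {image_id!r}")
    return f"{root}/{name[:2]}/{name[2:4]}/{name}.jpg"


print(image_path(10213456))  # /img/10/21/10213456.jpg
```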

 

I just discovered that it would make much more sense to read the images on the webserver using radosgw (S3 API?). If there is a problem on the ceph server, it shouldn't affect the connected client, because it is a simple HTTP request (timeout, cache, ...). I don't have any experience with this approach, but it looks like what I need. However, I don't understand from the documentation whether "ceph filesystem" and "ceph object gateway" are just two ways of accessing the same data, or whether they are completely different and I can't read a file stored with the "filesystem" through the "object gateway".
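A minimal sketch of the "simple HTTP request with a timeout" idea, using only the Python standard library. The function name `fetch_image`, the 2-second default, and the gateway URL in the usage comment are hypothetical; it also assumes the object is readable with a plain unauthenticated GET, which would depend on how the gateway and bucket are configured.

```python
import http.client
import urllib.request


def fetch_image(url, timeout_s=2.0):
    """Return the response body as bytes, or None on any failure.

    The timeout bounds each blocking socket operation, so a stalled
    backend produces a quick failure instead of a hung web worker.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


# Hypothetical usage; the host and bucket names are made up:
# data = fetch_image("http://rgw.example.internal/images/10213456.jpg")
# if data is None:
#     ...serve a placeholder image or return an HTTP error...
```

The caller decides what a failure means (placeholder image, cached copy, error page), which is the behaviour asked for above: one failed image rather than a stalled server.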

 

I will also try upgrading to dumpling. But first I want to fix how the clients deal with a failed ceph server; after that, I will try to make my ceph cluster more stable.

 

From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
Sent: Wednesday, August 21, 2013 6:27 PM
To: Petr Soukup
Cc: Gregory Farnum; ceph-users@xxxxxxxx
Subject: Re: Ceph-fuse timeout?

 

On Wednesday, August 21, 2013, Petr Soukup wrote:

Thank you Greg. It makes sense that ceph-fuse works this way.
I am experiencing this problem, for example, when I run rsync on a folder with a lot of small files in ceph (~300GB in 10kB files). Ceph is very slow during that, and everything crashes like dominoes.

 

Ah. That's 31 million files in a single folder? Or is it subdivided into smaller folders?

 

With the default settings Ceph's not going to do well with more than ~100k files in a folder; we can improve that somewhat but probably not to 31 million.

You could try enabling directory fragmentation and that would improve things a great deal more, but it will also be less stable long term as it needs more QA.

 

 

Alternatively, if you're seeing issues on the OSDs then upgrading to dumpling will make it better -- it adds some IO throttling that's probably applicable in this context.

-Greg

 

 


I will look into other ways of reading files from ceph. Most of the traffic comes from the webserver loading images; I could load these images with a script using some ceph library and implement a simple timeout.
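One way such a timeout could look when reading from the mounted filesystem, sketched with Python's standard thread pool. The names `read_with_timeout` and `_read_file`, the pool size, and the default timeout are assumptions for illustration. Note the limitation: the caller gets an error after the timeout, but the worker thread stays blocked until the underlying read returns, so a long outage can still exhaust the pool.

```python
import concurrent.futures

# Shared pool; the size of 8 is an arbitrary choice for this sketch.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def _read_file(path):
    with open(path, "rb") as f:
        return f.read()


def read_with_timeout(path, timeout_s=2.0, reader=_read_file):
    """Read a file, raising TimeoutError if it takes longer than timeout_s.

    The read runs in a worker thread; only the wait is bounded. A read
    that is already stuck cannot be interrupted from here.
    """
    future = _POOL.submit(reader, path)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the read has not started yet
        raise TimeoutError(f"reading {path} took longer than {timeout_s}s")
```

The `reader` parameter exists only so the slow path can be exercised without a real stalled mount.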


-----Original Message-----
From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
Sent: Wednesday, August 21, 2013 2:03 AM
To: Petr Soukup
Cc: ceph-users@xxxxxxxx
Subject: Re: Ceph-fuse timeout?

On Tue, Aug 20, 2013 at 4:56 PM, Petr Soukup <soukup@xxxxxxxxxx> wrote:
> I am using the ceph filesystem through ceph-fuse to store product photos, and most of the time it works great. But if there is a problem on the ceph server, my connected clients start acting crazy. Load on all servers with ceph mounted jumps very high, and the webserver and other services start to crash.
> I think that if the ceph server is unresponsive, the system builds up a queue and as a result everything starts to fail. A simple workaround is this command:
> umount -fl /media/ceph && ceph-fuse /media/ceph
> After that everything gets back to normal in a few minutes.
>
> Is it possible to set a timeout for ceph-fuse or something similar? It is much better for reading a photo from ceph to cause an error than for everything to fail at once.
>
> I am using ceph 0.61.7 (upgraded to 0.61.8 today), 2x OSD, 1x MDS and 4x mon on different servers with CentOS 6.4.
> I am going to try 0.67, but I think that my main problem is the configuration of the ceph-fuse mount. I also tried a newer kernel with support for mounting ceph, but the kernel itself wasn't very stable.
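As a rough sketch of how a hung mount might be detected before services pile up on it, the check below runs a probe command against the mount point in a child process and gives up after a timeout. The function name `mount_is_responsive`, the use of `stat` as the probe, and the 5-second default are assumptions; it also assumes the probe process can still be killed while it waits on the mount.

```python
import subprocess


def mount_is_responsive(path, timeout_s=5.0, probe=("stat",)):
    """Return True if `probe path` exits successfully within timeout_s.

    Running the probe in a separate process keeps the caller from
    blocking on the mount itself.
    """
    try:
        subprocess.run(
            [*probe, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=True,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return False
    return True


# Hypothetical usage from a periodic job:
# if not mount_is_responsive("/media/ceph"):
#     ...alert, or run the remount command quoted above...
```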

What kind of problems are you seeing on the ceph server?
In general what you're seeing is the result of ceph-fuse behaving optimistically: it expects that, if there is a problem on the cluster, then the problem will be dealt with shortly. So while it has lots of internal timeouts, it doesn't issue an error to clients unless it gets one back from a server. Doing otherwise in a data-safe fashion would be more or less impossible. If you want to add timeouts I bet you could find a library that will intercept system calls and let you put a timeout around them.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
