Some questions about radosgw

Hi, list,

Over the past several days I set up radosgw in a testing environment to
see whether it is stable and mature enough for production use. In the
meantime, I read the radosgw source code to understand how it actually
manages the underlying storage.

The test results show that write performance to a bucket is not good.
As far as I understand from the code, this is because there is only
*one* bucket index object per bucket, which is not a good design in
principle. Moreover, requests to the whole bucket can be blocked if the
corresponding bucket index object happens to be recovering or
backfilling. This is not acceptable for production use. Guang Yang has
done some work (the prototype patches [1]) to address the problem with
bucket index sharding, but I'm not confident that it solves the problem
at the root: radosgw appears to manage millions or billions of objects
in one bucket through the index, and that worries me even with index
sharding supported.
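
To make the concern concrete, here is a minimal sketch of how hash-based
index sharding spreads index updates. It is only an illustration under
stated assumptions: shard_for_key, index_load, and the CRC32-modulo
scheme are hypothetical and are not taken from the radosgw source or
from the prototype patches.

```python
import zlib
from collections import Counter


def shard_for_key(object_name: str, num_shards: int) -> int:
    """Map an object name to an index shard.

    Hypothetical scheme (CRC32 modulo shard count), not radosgw's own.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return zlib.crc32(object_name.encode("utf-8")) % num_shards


def index_load(object_names, num_shards):
    """Count how many index updates would land on each shard."""
    return Counter(shard_for_key(name, num_shards) for name in object_names)


if __name__ == "__main__":
    names = [f"photo-{i:06d}.jpg" for i in range(10000)]
    # One shard: every write updates the same index object.
    print(index_load(names, 1))
    # Several shards: updates are spread, but each shard still grows
    # with the bucket, which is the remaining worry.
    print(index_load(names, 8))
```

With one shard, all 10000 updates hit shard 0, which matches the
single-index-object bottleneck described above. With more shards the
per-shard load drops, yet a shard that is recovering or backfilling
would presumably still block the writes that map to it.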

Another problem I encountered: after I upgraded radosgw to the latest
version (Firefly), radosgw-admin works well and read requests work well
too, but all write requests fail. Note that I didn't change the config
files, so this points to a compatibility problem (a client on the new
version fails to talk to a ceph cluster on the old version). The errors
look like:

2014-07-31 10:13:10.045921 7fdb40ddd700  0 ERROR: can't read user header: ret=-95
2014-07-31 10:13:10.045930 7fdb40ddd700  0 ERROR: sync_user() failed, user=osier ret=-95
2014-07-31 17:00:56.075066 7fe514fe6780  0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 19974
2014-07-31 17:00:56.197659 7fe514fe6780  0 framework: fastcgi
2014-07-31 17:00:56.197666 7fe514fe6780  0 starting handler: fastcgi
2014-07-31 17:00:56.198941 7fe4f8ff9700  0 ERROR: FCGX_Accept_r returned -9
2014-07-31 17:00:56.211176 7fe4f9ffb700  0 ERROR: can't read user header: ret=-95
2014-07-31 17:00:56.211197 7fe4f9ffb700  0 ERROR: sync_user() failed, user=Bob Dylon ret=-95
2014-07-31 17:00:56.212306 7fe4f9ffb700  0 ERROR: can't read user header: ret=-95
2014-07-31 17:00:56.212325 7fe4f9ffb700  0 ERROR: sync_user() failed, user=osier ret=-95
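
For anyone triaging a log like this, the following is a small sketch
that pulls the failing users and return codes out of the sync_user()
lines and translates the code into an OS error message. It assumes each
log entry sits on a single line; SYNC_FAIL, failed_users, and describe
are hypothetical helper names, not part of radosgw. On Linux, errno 95
is EOPNOTSUPP ("Operation not supported"), which would be consistent
with the old cluster not supporting an operation the new gateway
requests, though the log itself does not confirm that.

```python
import os
import re

# Matches entries such as:
#   "... 0 ERROR: sync_user() failed, user=osier ret=-95"
# The user name may contain spaces, so it is matched lazily up to " ret=".
SYNC_FAIL = re.compile(
    r"ERROR: sync_user\(\) failed, user=(?P<user>.+?) ret=(?P<ret>-?\d+)\s*$"
)


def failed_users(log_lines):
    """Return (user, ret) pairs for every sync_user() failure in the log."""
    results = []
    for line in log_lines:
        match = SYNC_FAIL.search(line)
        if match:
            results.append((match.group("user"), int(match.group("ret"))))
    return results


def describe(ret):
    """Translate a negative errno-style return code into the local OS message."""
    return os.strerror(-ret) if ret < 0 else "success"


if __name__ == "__main__":
    sample = [
        "2014-07-31 17:00:56.211197 7fe4f9ffb700  0 ERROR: sync_user() failed, user=Bob Dylon ret=-95",
        "2014-07-31 17:00:56.197659 7fe514fe6780  0 framework: fastcgi",
    ]
    for user, ret in failed_users(sample):
        # The message text depends on the platform's errno table.
        print(user, ret, describe(ret))
```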

After these two experiences, I started to wonder whether radosgw is
stable and mature enough yet. It seems that DreamHost is the only one
running radosgw as a service, though, according to Google, there seem
to be use cases in private environments. I have no way to demonstrate
whether it's stable and mature enough for production use other than
trying to understand how it works. However, I guess everybody knows how
hard it is to go back once a distributed system is already in
production use. So I'm asking here for advice, thoughts, and
suggestions from anyone who has already set up radosgw for production
use.

In case the mail is too long or boring, I'm summarizing my questions
here:

1) Is radosgw stable/mature enough for production use?

2) How does it perform (especially for writes) in practice?

3) Could any problems be caused by addressing millions or billions of
objects through index objects (even with sharding supported)?

4) As far as I understand, it's better not to enable the cache with a
multiple-radosgw deployment, but is there any other way to work around
this?

5) Are there any other potential traps?

Many thanks in advance.

[1] http://news.gmane.org/gmane.comp.file-systems.ceph.devel

Regards,
Osier

