[ correct the URL ]

On 2014-08-02 00:42, Osier Yang wrote:
> Hi, list,
>
> Over the past several days I set up radosgw in a testing environment to
> see whether it is stable/mature enough for production use. In the
> meantime, I read the radosgw source code to understand how it actually
> manages the underlying storage.
>
> The test results show that write performance to a bucket is not good. As
> far as I understand from the code, this is because there is only *one*
> bucket index object for a single bucket, which is not nice in principle.
> Moreover, requests to the whole bucket could be blocked if the
> corresponding bucket index object happens to be recovering or
> backfilling. This is not acceptable in production use. Although I saw
> that Guang Yang did some work (the prototype patches [1]) to try to
> resolve the problem with bucket index sharding, I'm not confident that
> it solves the problem at the root: it looks like radosgw is trying to
> manage millions or billions of objects in one bucket with the index, and
> that worries me even if index sharding is supported.
>
> Another problem I encountered: after I upgraded radosgw to the latest
> version (Firefly), radosgw-admin works well and read requests work well
> too, but all write requests fail. Note that I didn't make any changes to
> the config files, which suggests a compatibility problem (the
> new-version client fails to talk to the old-version Ceph cluster).
> The error looks like:
>
> 2014-07-31 10:13:10.045921 7fdb40ddd700 0 ERROR: can't read user header: ret=-95
> 2014-07-31 10:13:10.045930 7fdb40ddd700 0 ERROR: sync_user() failed, user=osier ret=-95
> 2014-07-31 17:00:56.075066 7fe514fe6780 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 19974
> 2014-07-31 17:00:56.197659 7fe514fe6780 0 framework: fastcgi
> 2014-07-31 17:00:56.197666 7fe514fe6780 0 starting handler: fastcgi
> 2014-07-31 17:00:56.198941 7fe4f8ff9700 0 ERROR: FCGX_Accept_r returned -9
> 2014-07-31 17:00:56.211176 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
> 2014-07-31 17:00:56.211197 7fe4f9ffb700 0 ERROR: sync_user() failed, user=Bob Dylon ret=-95
> 2014-07-31 17:00:56.212306 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
> 2014-07-31 17:00:56.212325 7fe4f9ffb700 0 ERROR: sync_user() failed, user=osier ret=-95
>
> After these two experiences, I started to wonder whether radosgw is
> stable/mature enough yet. It seems that DreamHost is the only one using
> radosgw for a service, though from what I found on Google there seem to
> be use cases in private environments. I have no way to demonstrate
> whether it's stable and mature enough for production use other than
> trying to understand how it works. However, I guess everybody knows it
> is very hard to go back once a distributed system is already in
> production use. So I'm asking here to see if I can get some advice,
> thoughts, or suggestions from those who have already set up radosgw for
> production use.
>
> In case the mail is too long or boring, I'm summarizing my questions
> here:
>
> 1) Is radosgw stable/mature enough for production use?
>
> 2) How does it perform (especially on writes) in practice?
>
> 3) Could any problems be caused by addressing millions or billions of
> objects with index objects (even if sharding is supported)?
>
> 4) As far as I understand, it's better not to enable the cache in a
> deployment with multiple radosgw instances, but is there any other way
> to work around this?
>
> 5) Are there any other potential traps?
>
> Much appreciated in advance.
>
> [1] http://news.gmane.org/gmane.comp.file-systems.ceph.devel

Never mind, it's
http://article.gmane.org/gmane.comp.file-systems.ceph.devel/20428

Regards,
Osier
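
To make the bucket index sharding idea from the quoted message concrete, here is a minimal sketch of how object names could be spread over several index objects instead of one. It is only an illustration and not radosgw's actual implementation: the hash function (SHA-256), the shard count, and the `index.<bucket>.<shard>` naming scheme are all assumptions made for this example.

```python
import hashlib
from collections import Counter


def shard_for(object_name: str, num_shards: int) -> int:
    """Map an object name to a shard number in [0, num_shards).

    Assumption: SHA-256 is used only as a convenient stable hash here;
    it is not claimed to be what radosgw uses.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def index_object_name(bucket_id: str, object_name: str, num_shards: int) -> str:
    """Return the (hypothetical) index object that would hold this entry."""
    return f"index.{bucket_id}.{shard_for(object_name, num_shards)}"


# With a single shard, every write to the bucket updates the same index object.
single = {index_object_name("b1", f"photo-{i}.jpg", 1) for i in range(10000)}

# With 8 shards, the same writes are spread over 8 index objects.
spread = Counter(shard_for(f"photo-{i}.jpg", 8) for i in range(10000))
```

With one shard, all 10000 entries land in a single index object, which is the contention point described above. With several shards the updates are distributed, but listing the bucket in order then requires merging entries from every shard, so sharding trades write contention for more expensive listings.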