Re: KeyValueStore: Some bug fixing in KeyValueStore prevent osd runtime crash #2875


Hi Chendi,

It seems that you found two bugs in KeyValueStore:

1. Potential race on strip_header->buffers:

strip_header is owned by whichever thread wants to access the header.
To avoid a lock bottleneck, KeyValueStore currently relies only on
Sequencer-level ordering (much like a PG) to serialize concurrent ops.
But every write op also updates the "meta" collection, so ops from
different Sequencers can touch the same header at the same time. It
seems that I made a mistake at line 126 of this commit
(https://github.com/yuyuyu101/ceph/commit/bb49547d0fa3f65aabb3b20a96fb8dabd8dc81c0).

Originally, the cache was disabled for the "meta" collection to avoid
concurrent modification. I forget why I deleted this line, so I think
we still need to add it back:

if (cid != coll_t()) {
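
To illustrate what that check buys us, here is a minimal sketch, not the
actual KeyValueStore code. HeaderCache, Header, Key and META_COLL are
hypothetical names, and I assume (as the check above implies) that a
default-constructed coll_t() denotes the "meta" collection. Headers for
normal collections are shared through the cache; headers for "meta" are
handed out fresh on each lookup, so no two callers share their buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; not the real Ceph types.
struct Header {
  std::map<std::string, std::string> buffers;
};
using HeaderRef = std::shared_ptr<Header>;
using Key = std::pair<std::string, std::string>;  // (collection, object)

// Assumption: this plays the role of a default-constructed coll_t().
const std::string META_COLL = "meta";

class HeaderCache {
  std::map<Key, HeaderRef> cache_;

public:
  // Return a header for (cid, oid). Only non-"meta" collections go
  // through the cache; "meta" objects get a private header per call.
  HeaderRef lookup(const std::string& cid, const std::string& oid) {
    if (cid != META_COLL) {  // mirrors: if (cid != coll_t()) {
      Key k(cid, oid);
      auto it = cache_.find(k);
      if (it != cache_.end())
        return it->second;
      HeaderRef h = std::make_shared<Header>();
      cache_[k] = h;
      return h;
    }
    return std::make_shared<Header>();
  }

  std::size_t size() const { return cache_.size(); }
};
```

With this shape, two lookups of the same object in a normal collection
return the same header, while two lookups of a "meta" object return
distinct headers and leave nothing behind in the cache.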

2. OOM for KeyValueStore

strip_header->buffers is necessary in KeyValueStore: it is used by
later ops in the same transaction, because a write op may need to read
the newest data, which may have been changed by an earlier write op in
the same transaction.
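
A rough sketch of why the buffers matter, under simplified assumptions:
StripHeader here is a toy struct, and Store, read_strip and write_strip
are hypothetical helpers, not KeyValueStore functions. Each partial
write does a read-modify-write of a strip, and the read has to see the
uncommitted result of the previous write in the same transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Toy model: strip index -> data that is written but not yet committed.
struct StripHeader {
  std::map<uint64_t, std::string> buffers;
};

// Committed strips in the backing store (hypothetical simplification).
using Store = std::map<uint64_t, std::string>;

// Prefer the uncommitted copy in buffers; fall back to the store.
std::string read_strip(const Store& committed, const StripHeader& h,
                       uint64_t strip) {
  auto b = h.buffers.find(strip);
  if (b != h.buffers.end())
    return b->second;
  auto c = committed.find(strip);
  return c == committed.end() ? std::string() : c->second;
}

// Partial overwrite: read the newest strip, patch it, keep it buffered.
void write_strip(const Store& committed, StripHeader& h, uint64_t strip,
                 std::size_t off, const std::string& data) {
  std::string cur = read_strip(committed, h, strip);
  if (cur.size() < off + data.size())
    cur.resize(off + data.size(), '\0');
  cur.replace(off, data.size(), data);
  h.buffers[strip] = cur;
}
```

If read_strip ignored buffers, a second partial write to the same strip
would start from the committed data and silently drop the first write.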

As for the OOM, I think the root cause is also the mistaken commit
above. The "meta" collection is updated in every transaction, and
because of the cache strategy StripObjectHeader::buffers is always kept
in memory, so this object's buffers keep growing all the time. I think
it will be fine if we simply avoid caching objects in the "meta"
collection. Although we did not observe OOM in previous releases, only
with this mistaken commit, I would still prefer to add code that
discards "buffers" each time a transaction is submitted, to avoid
potential unpredictable memory growth.
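
Something along these lines is what I mean by discarding at submit
time. This is only a sketch: Transaction, buffer_write and submit are
hypothetical names, and the store is reduced to a single map for
brevity. The point is that buffers live only as long as the transaction
that filled them.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: strip index -> data that is written but not yet committed.
struct StripHeader {
  std::map<uint64_t, std::string> buffers;
};

class Transaction {
  std::vector<StripHeader*> touched_;  // headers written by this transaction

public:
  // Buffer a write and remember which header holds it.
  void buffer_write(StripHeader& h, uint64_t strip, std::string data) {
    h.buffers[strip] = std::move(data);
    touched_.push_back(&h);
  }

  // Flush buffered strips to the store, then drop the buffers so that a
  // long-lived (cached) header cannot accumulate memory across submits.
  void submit(std::map<uint64_t, std::string>& store) {
    for (StripHeader* h : touched_) {
      for (auto& kv : h->buffers)
        store[kv.first] = std::move(kv.second);
      h->buffers.clear();
    }
    touched_.clear();
  }
};
```

After submit() the header may stay in the cache, but its buffers are
empty again, so memory use is bounded by the size of one transaction.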

Do you have a clearer implementation in mind? I'm also thinking about a
better way to solve the performance bottleneck for the "meta"
collection.


On Fri, Nov 7, 2014 at 4:17 AM, Xue, Chendi <chendi.xue@xxxxxxxxx> wrote:
> Hi, all
>
> There is a small bug fix in KeyValueStore to prevent an osd crash
>
> When running a 4k random write test on rbd with the KeyValueStore backend, random osds crash, and ceph-osd.log shows a segmentation fault caused by multiple threads updating strip_header->buffers, so I added a mutex here
>
> Another problem is that after running for a fairly long time, the osd gets killed by the OS due to OOM, which is caused by the lack of an eviction mechanism in strip_header->buffers, so I added an option in config_opts.h to turn strip_header->buffers off by default
>
> I saw a slight performance drop when turning off strip_header->buffers in a short test (I can't run a long test because the osd will crash), so is strip_header->buffers necessary? If so, can I add a random cache mechanism on that?
>
> ===========================
> KeyValueStore:
> Add mutex when update strip_header->buffers to avoid segmentation fault
> Add option in config_opts.h to turn on/off strip_header->buffers to prevent OOM
>
> The pull request : https://github.com/ceph/ceph/pull/2875#issue-48042745
>
>
>
> Best Regards,
> -Chendi
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards,

Wheat



