Re: Deadline of Github pull request for Hammer release (question)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 22/01/2015 17:34, Miyamae, Takeshi wrote:
> Hi Loic,
> 
> Thank you for the code reference. We will use it to improve our codes.
> 
>> However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ?
> 
> Sure. The purpose of using threads is just to find lace bugs and ensure thread safety.

I'm not familiar with the expression "lace bug" and all I could find is http://en.wiktionary.org/wiki/lace which does not help me (I'm french, english is not my native language). Could you give me the definition of a "lace bug" ? 

> However, I suppose those tests may be a little adhoc and be far from completeness.
> Actually, we couldn't detect some init lace bugs including #7914 (We will fix those bugs at the next release).
> Do you have any better ideas to combat these kind of bugs systematically?
> 

Race conditions such as http://tracker.ceph.com/issues/7914 are quite difficult detect with tests and I don't know of a sure method to do so. Carefully proofreading the code and keeping the logic simple is probably the most effective way ;-) Once a race condition has been fixed, however, it should be possible to recreate the conditions for it to appear in a functional test that can be run with make check.

Note that there are two kind of tests in Ceph : the unit / functional tests that are run with make check, locally on the developer machine or by a bot on each pull request. These tests run quickly and do not include any benchmarking or stress tests, nor integration tests. The second kind is what you find in teuthology : these tests include stress tests, error injections etc. Most race conditions are discovered by running these teuthology tests and analyzing their output.

>> What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ?
> 
> I don't think current minimum_to_decode_2 is adequate, as you mentioned.
> The verification of return codes is required and we will compare it with expectations (EINVAL or EIO).
> However, as for the size of minimum_chunks, as it is hard to pre-estimate without knowing
> parity layout because of its random aspect, we won't improve it any more.

Do you mean that two consecutive runs of minimum_to_decode will provide different results ? 

> Then, as for minimum_to_decode_3, we found it meaningless to verify minimum_chunks because
> that is an error case when array size of want_to_decode and available_chunks are illegally large.
> We will verify only return code of minimum_to_decode and remove the current check of
> minimum_chunks in minimum_to_decode_3.

Ok. I'll take a closer look as soon as I can run them and try things.

Thanks for explaining !

> 
> Best regards,
> Takeshi Miyamae
> 
> -----Original Message-----
> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] 
> Sent: Thursday, January 22, 2015 6:28 PM
> To: Miyamae, Takeshi/宮前 剛
> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx)
> Subject: Re: Deadline of Github pull request for Hammer release (question)
> 
> Hi,
> 
> On 22/01/2015 09:42, Miyamae, Takeshi wrote:
>> Hi Loic,
>>
>> Thanks for your additional comments.
>>
>>> Would you be so kind as to update the 
>>> https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/M
>>> akefile.am
>>
>> Sure. We'll do that.
>>
>>> This is likely to fail on a slow machine. Instead of waiting you should use a lock.
>>
>> The purpose of sleep(1) is to ensure sequentiality between threads, 
>> but it's not enough on slow machines as you mentioned. So, we'd like 
>> another example that uses a mutex lock. What files should we refer to?
> 
> src/test/common/test_sharedptr_registry.cc contains an example. The classes that are helpful in implementing thread safe are in src/common/Cond.h and src/common/Mutex.h
> 
> However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ?
> 
>>
>>> They however all have the same assert ( EXPECT_TRUE(shec->matrix == 
>>> NULL); )
>>
>> We agree that EINVAL is better than EIO on verifying input variables. 
>> We'll review our whole sources and fix both our plugin and test codes.
> 
> Thanks :-) What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ?
> 
> Cheers
> 
>>
>> P.S. We are debugging thread safe mSHEC to release it tomorrow.
>>
>> Best regards,
>> Takeshi Miyamae
>>
>> -----Original Message-----
>> From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
>> Sent: Wednesday, January 21, 2015 11:13 PM
>> To: Miyamae, Takeshi/宮前 剛
>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; 
>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx)
>> Subject: Re: Deadline of Github pull request for Hammer release 
>> (question)
>>
>> Hi,
>>
>> I see them at https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/TestErasureCodeShec.cc etc. thanks ! Would you be so kind as to update the https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/Makefile.am to add the compilation instructions for these files ?
>>
>> I see the tests includes a few threads (thread1 to thread5). It looks like they are used for measuring performance, am I right ? 
>>
>> 	pthread_t tid;
>> 	pthread_create(&tid,NULL,thread1,shec);
>> 	sleep(1);
>>
>> This is likely to fail on a slow machine. Instead of waiting you should use a lock. You will find examples in other tests, using various techniques depending on the context. If you'd like I could point to one. I'd have to better understand the intent of this test before I do.
>>
>> From init_5 to init_22 there are various combinations of parameters which suggest checking for all kinds of errors (m being negative, or a string instead of a number etc.). They however all have the same assert ( EXPECT_TRUE(shec->matrix == NULL); ). It would be better to have an expectation that verifies the error has been caught as expected. Is it part of what your completing at the moment ?
>>
>> minimum_to_decode_2 expectation for when it succeeds should be more 
>> precise that
>>
>> 	EXPECT_TRUE(minimum_chunks.size());
>>
>> since minimum_chunks is expecting to have a size that does not vary. And if it does that would be a problem.
>>
>> In minimum_to_decode_3 after
>>
>> 	EXPECT_NE(want_to_decode,minimum_chunks);
>>
>> you could check the expected content of minimum_chunks also. Unless there is a random aspect to it ? 
>>
>> Cheers
>>
>> On 20/01/2015 10:07, Miyamae, Takeshi wrote:
>>> Hi Loic,
>>>
>>> I'd appreciated your help very much.
>>> I have just uploaded 2 test files into the fork repository.
>>>
>>> https://github.com/t-miyamae/ceph
>>> files:
>>>   src/test/erasure-code/TestErasureCodeShec.cc
>>>   src/test/erasure-code/TestErasureCodeShec364.cc
>>>
>>> I'm sorry that tests for the thread safety codes has not been included yet.
>>>
>>> Thank you in advance,
>>> Takeshi Miyamae
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx 
>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary
>>> Sent: Tuesday, January 20, 2015 4:23 PM
>>> To: Miyamae, Takeshi/宮前 剛
>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔;
>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx)
>>> Subject: Re: Deadline of Github pull request for Hammer release
>>> (question)
>>>
>>> Hi,
>>>
>>> Thanks for the update. To speed up things, I could review what's already published. Did you add the tests already ?
>>>
>>> Cheers
>>>
>>> On 20/01/2015 01:48, Miyamae, Takeshi wrote:
>>>> Hi Loic,
>>>>
>>>> Thank you for your advice which suggested providing SHEC as an extra-package.
>>>> We believe it's generally a good idea, but we are hoping SHEC would 
>>>> be merged into Hammer because that would have an apparent advantage 
>>>> from a viewpoint of maintenance support.
>>>>
>>>> As for the thread safety issue, we totally agree with you and we are trying to solve it.
>>>> We will complete it in a few days and we'd like to ask you to review 
>>>> it again so that it might be in time for Hammer's feature freeze (v0.93).
>>>>
>>>> Best regards,
>>>> Takeshi Miyamae
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx 
>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary
>>>> Sent: Wednesday, January 14, 2015 3:03 AM
>>>> To: Miyamae, Takeshi/宮前 剛
>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 
>>>> 鷹詔;
>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx)
>>>> Subject: Re: Deadline of Github pull request for Hammer release
>>>> (question)
>>>>
>>>> Hi,
>>>>
>>>> On 13/01/2015 18:05, Miyamae, Takeshi wrote:
>>>>> Hi Loic,
>>>>>
>>>>> Thank you for your quick review.
>>>>>
>>>>>> Could you reference the jerasure files (they are in the jerasure 
>>>>>> plugin already) instead of including them in your patch ?
>>>>>
>>>>> We had used the reference of jerasure v2.0 files before, but 
>>>>> changed to including v1.2 files after the patent issue.
>>>>
>>>> If you are refering to http://erasure-code-patents.xyz/, it can safely be ignored.
>>>>
>>>>> However, we can restore the reference right away if we are needed.
>>>>>
>>>>>> In ::prepare you reuse the matrix, if possible. If your intention 
>>>>>> is to improve performances, you should consider using the same approach as the isa plugin.
>>>>>
>>>>> We have found a performance issue caused by redundant 
>>>>> initializations of the module, and solved it by caching the initialized variables in memory.
>>>>> If that is the same approach as isa-plugin you mentioned, we also 
>>>>> do not have the same issue any more.
>>>>
>>>> The ISA plugin approach is thread safe, that's the important part.
>>>>
>>>>>> I'll have more comments once you have unit / functional tests
>>>>>
>>>>> We do have the those unit/functional tests, but the server is unreachable at this moment.
>>>>> Please let us upload them to the repository tomorrow.
>>>>
>>>> Great.
>>>>
>>>>>> Regarding your initial question: I think it is too late for the Hammer release.
>>>>>
>>>>> We are so disappointed to hear that, but we must apologize for the late request.
>>>>> If there would be any chance that SHEC be pulled into hammer, we 
>>>>> are very happy to conduct all the necessary tests by ourselves.
>>>>> Please let us know the detailed schedule if this is a feasible option.
>>>>
>>>> Note that since this is a plugin it will be easy for someone to install it on a Hammer cluster, simply by copying the plugin files to each OSD / MON and restarting them. If there is some kind of urgency for you to be able to install this new plugin, I would advise making a package that contains just this file, restart the OSD once installed and recommend that the installation is done on all machines of the cluster (because every OSD / MON need to know about the plugin). 
>>>>
>>>> Cheers
>>>>
>>>>> Best regards,
>>>>> Takeshi Miyamae
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx 
>>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary
>>>>> Sent: Tuesday, January 13, 2015 9:46 PM
>>>>> To: Miyamae, Takeshi/宮前 剛
>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾
>>>>> 鷹詔;
>>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx)
>>>>> Subject: Re: Deadline of Github pull request for Hammer release
>>>>> (question)
>>>>>
>>>>> Hi,
>>>>>
>>>>> It's a great start :-) A few comments:
>>>>>
>>>>> Could you reference the jerasure files (they are in the jerasure plugin already) instead of including them in your patch ?
>>>>>
>>>>> In ::prepare you reuse the matrix, if possible. If your intention 
>>>>> is to improve performances, you should consider using the same 
>>>>> approach as the isa plugin. See 
>>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code/
>>>>> i
>>>>> s
>>>>> a
>>>>> /ErasureCodePluginIsa.cc#L36 which is and independent type 
>>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code/
>>>>> i s a /ErasureCodeIsaTableCache.h with only one instance and is 
>>>>> populated as needed and protected by a lock to make it thread safe :
>>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code/
>>>>> i
>>>>> s
>>>>> a
>>>>> /ErasureCodeIsaTableCache.h#L101
>>>>>
>>>>> I'll have more comments once you have unit / functional tests ( similar to http://workbench.dachary.org/ceph/ceph/blob/giant/src/test/erasure-code/TestErasureCodeJerasure.cc ). 
>>>>>
>>>>> Regarding your initial question: I think it is too late for the Hammer release. But from what I read it looks like we'll be able to merge in the early stages of the next release and that will give us time to properly test it.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On 13/01/2015 11:34, Miyamae, Takeshi wrote:
>>>>>> Hi Loic,
>>>>>>
>>>>>> I'm so sorry. The following is the correct repository.
>>>>>>
>>>>>> https://github.com/t-miyamae/ceph
>>>>>>
>>>>>> Best regards,
>>>>>> Takeshi Miyamae
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
>>>>>> Sent: Tuesday, January 13, 2015 7:26 PM
>>>>>> To: Miyamae, Takeshi/宮前 剛
>>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾
>>>>>> 鷹詔;
>>>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx)
>>>>>> Subject: Re: Deadline of Github pull request for Hammer release
>>>>>> (question)
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 13/01/2015 11:24, Miyamae, Takeshi wrote:
>>>>>>> Hi Loic,
>>>>>>>
>>>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin.
>>>>>>>
>>>>>>> Thank you for your advices.
>>>>>>> We have uploaded our latest codes to the following folk repository for an early review.
>>>>>>>
>>>>>>> https://github.com/miyamae-takeshi/multiple-shec
>>>>>>
>>>>>> It's 404 ? Is it a private repository maybe ?
>>>>>>
>>>>>>>
>>>>>>> SHEC is located in src/erasure-code/shec directory.
>>>>>>>
>>>>>>> We are verifying SHEC's advantages on our Ceph cluster. It takes a little bit more.
>>>>>>> Would you please start the review before that?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Takeshi Miyamae
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx 
>>>>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic 
>>>>>>> Dachary
>>>>>>> Sent: Wednesday, January 7, 2015 12:52 AM
>>>>>>> To: Miyamae, Takeshi/宮前 剛
>>>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾
>>>>>>> 鷹詔
>>>>>>> Subject: Re: Deadline of Github pull request for Hammer release
>>>>>>> (question)
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 06/01/2015 12:49, Miyamae, Takeshi wrote:
>>>>>>>> Dear Loic,
>>>>>>>>
>>>>>>>> I'm Takeshi Miyamae, one of the authors of SHEC's blueprint.
>>>>>>>>
>>>>>>>> Shingled Erasure Code (SHEC)
>>>>>>>> https://wiki.ceph.com/Planning/Blueprints/Hammer/Shingled_Erasur
>>>>>>>> e
>>>>>>>> _
>>>>>>>> C
>>>>>>>> o
>>>>>>>> d
>>>>>>>> e
>>>>>>>> _(SHEC)
>>>>>>>
>>>>>>> The work you have done is quite impressive :-)
>>>>>>>
>>>>>>>> We have revised our blueprint shown in the last CDS to extend 
>>>>>>>> our erasure code layouts and describe the guideline for choosing SHEC among various EC plugins.
>>>>>>>> We believe the blueprint now answers all the comments given at the CDS.
>>>>>>>
>>>>>>> Great.
>>>>>>>
>>>>>>>> In addition, we would like to ask for your advice on the 
>>>>>>>> schedule of our github pull request. More specifically, we would 
>>>>>>>> like to know its deadline for Hammer release.
>>>>>>>> (As we have not really completed our verification of SHEC, we 
>>>>>>>> are wondering if we should make it open for early preview.)
>>>>>>>
>>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> Takeshi Miyamae
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
>>>>>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More 
>>>>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>
>>>>>
>>>>> --
>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay ʇڙ ,j   f   h   z  w   
>>    j:+v   w j m         zZ+     ݢj"  !tml=
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay ʇڙ ,j   f   h   z  w   
>    j:+v   w j m         zZ+     ݢj"  !tml=
>>
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux