Hi, On 22/01/2015 17:34, Miyamae, Takeshi wrote: > Hi Loic, > > Thank you for the code reference. We will use it to improve our codes. > >> However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ? > > Sure. The purpose of using threads is just to find lace bugs and ensure thread safety. I'm not familiar with the expression "lace bug" and all I could find is http://en.wiktionary.org/wiki/lace which does not help me (I'm french, english is not my native language). Could you give me the definition of a "lace bug" ? > However, I suppose those tests may be a little adhoc and be far from completeness. > Actually, we couldn't detect some init lace bugs including #7914 (We will fix those bugs at the next release). > Do you have any better ideas to combat these kind of bugs systematically? > Race conditions such as http://tracker.ceph.com/issues/7914 are quite difficult detect with tests and I don't know of a sure method to do so. Carefully proofreading the code and keeping the logic simple is probably the most effective way ;-) Once a race condition has been fixed, however, it should be possible to recreate the conditions for it to appear in a functional test that can be run with make check. Note that there are two kind of tests in Ceph : the unit / functional tests that are run with make check, locally on the developer machine or by a bot on each pull request. These tests run quickly and do not include any benchmarking or stress tests, nor integration tests. The second kind is what you find in teuthology : these tests include stress tests, error injections etc. Most race conditions are discovered by running these teuthology tests and analyzing their output. >> What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ? > > I don't think current minimum_to_decode_2 is adequate, as you mentioned. > The verification of return codes is required and we will compare it with expectations (EINVAL or EIO). > However, as for the size of minimum_chunks, as it is hard to pre-estimate without knowing > parity layout because of its random aspect, we won't improve it any more. Do you mean that two consecutive runs of minimum_to_decode will provide different results ? > Then, as for minimum_to_decode_3, we found it meaningless to verify minimum_chunks because > that is an error case when array size of want_to_decode and available_chunks are illegally large. > We will verify only return code of minimum_to_decode and remove the current check of > minimum_chunks in minimum_to_decode_3. Ok. I'll take a closer look as soon as I can run them and try things. Thanks for explaining ! > > Best regards, > Takeshi Miyamae > > -----Original Message----- > From: Loic Dachary [mailto:loic@xxxxxxxxxxx] > Sent: Thursday, January 22, 2015 6:28 PM > To: Miyamae, Takeshi/宮前 剛 > Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) > Subject: Re: Deadline of Github pull request for Hammer release (question) > > Hi, > > On 22/01/2015 09:42, Miyamae, Takeshi wrote: >> Hi Loic, >> >> Thanks for your additional comments. >> >>> Would you be so kind as to update the >>> https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/M >>> akefile.am >> >> Sure. We'll do that. >> >>> This is likely to fail on a slow machine. Instead of waiting you should use a lock. >> >> The purpose of sleep(1) is to ensure sequentiality between threads, >> but it's not enough on slow machines as you mentioned. So, we'd like >> another example that uses a mutex lock. What files should we refer to? > > src/test/common/test_sharedptr_registry.cc contains an example. The classes that are helpful in implementing thread safe are in src/common/Cond.h and src/common/Mutex.h > > However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ? > >> >>> They however all have the same assert ( EXPECT_TRUE(shec->matrix == >>> NULL); ) >> >> We agree that EINVAL is better than EIO on verifying input variables. >> We'll review our whole sources and fix both our plugin and test codes. > > Thanks :-) What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ? > > Cheers > >> >> P.S. We are debugging thread safe mSHEC to release it tomorrow. >> >> Best regards, >> Takeshi Miyamae >> >> -----Original Message----- >> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] >> Sent: Wednesday, January 21, 2015 11:13 PM >> To: Miyamae, Takeshi/宮前 剛 >> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; >> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >> Subject: Re: Deadline of Github pull request for Hammer release >> (question) >> >> Hi, >> >> I see them at https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/TestErasureCodeShec.cc etc. thanks ! Would you be so kind as to update the https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/Makefile.am to add the compilation instructions for these files ? >> >> I see the tests includes a few threads (thread1 to thread5). It looks like they are used for measuring performance, am I right ? >> >> pthread_t tid; >> pthread_create(&tid,NULL,thread1,shec); >> sleep(1); >> >> This is likely to fail on a slow machine. Instead of waiting you should use a lock. You will find examples in other tests, using various techniques depending on the context. If you'd like I could point to one. I'd have to better understand the intent of this test before I do. >> >> From init_5 to init_22 there are various combinations of parameters which suggest checking for all kinds of errors (m being negative, or a string instead of a number etc.). They however all have the same assert ( EXPECT_TRUE(shec->matrix == NULL); ). It would be better to have an expectation that verifies the error has been caught as expected. Is it part of what your completing at the moment ? >> >> minimum_to_decode_2 expectation for when it succeeds should be more >> precise that >> >> EXPECT_TRUE(minimum_chunks.size()); >> >> since minimum_chunks is expecting to have a size that does not vary. And if it does that would be a problem. >> >> In minimum_to_decode_3 after >> >> EXPECT_NE(want_to_decode,minimum_chunks); >> >> you could check the expected content of minimum_chunks also. Unless there is a random aspect to it ? >> >> Cheers >> >> On 20/01/2015 10:07, Miyamae, Takeshi wrote: >>> Hi Loic, >>> >>> I'd appreciated your help very much. >>> I have just uploaded 2 test files into the fork repository. >>> >>> https://github.com/t-miyamae/ceph >>> files: >>> src/test/erasure-code/TestErasureCodeShec.cc >>> src/test/erasure-code/TestErasureCodeShec364.cc >>> >>> I'm sorry that tests for the thread safety codes has not been included yet. >>> >>> Thank you in advance, >>> Takeshi Miyamae >>> >>> -----Original Message----- >>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >>> Sent: Tuesday, January 20, 2015 4:23 PM >>> To: Miyamae, Takeshi/宮前 剛 >>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; >>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>> Subject: Re: Deadline of Github pull request for Hammer release >>> (question) >>> >>> Hi, >>> >>> Thanks for the update. To speed up things, I could review what's already published. Did you add the tests already ? >>> >>> Cheers >>> >>> On 20/01/2015 01:48, Miyamae, Takeshi wrote: >>>> Hi Loic, >>>> >>>> Thank you for your advice which suggested providing SHEC as an extra-package. >>>> We believe it's generally a good idea, but we are hoping SHEC would >>>> be merged into Hammer because that would have an apparent advantage >>>> from a viewpoint of maintenance support. >>>> >>>> As for the thread safety issue, we totally agree with you and we are trying to solve it. >>>> We will complete it in a few days and we'd like to ask you to review >>>> it again so that it might be in time for Hammer's feature freeze (v0.93). >>>> >>>> Best regards, >>>> Takeshi Miyamae >>>> >>>> -----Original Message----- >>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >>>> Sent: Wednesday, January 14, 2015 3:03 AM >>>> To: Miyamae, Takeshi/宮前 剛 >>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>> 鷹詔; >>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>> Subject: Re: Deadline of Github pull request for Hammer release >>>> (question) >>>> >>>> Hi, >>>> >>>> On 13/01/2015 18:05, Miyamae, Takeshi wrote: >>>>> Hi Loic, >>>>> >>>>> Thank you for your quick review. >>>>> >>>>>> Could you reference the jerasure files (they are in the jerasure >>>>>> plugin already) instead of including them in your patch ? >>>>> >>>>> We had used the reference of jerasure v2.0 files before, but >>>>> changed to including v1.2 files after the patent issue. >>>> >>>> If you are refering to http://erasure-code-patents.xyz/, it can safely be ignored. >>>> >>>>> However, we can restore the reference right away if we are needed. >>>>> >>>>>> In ::prepare you reuse the matrix, if possible. If your intention >>>>>> is to improve performances, you should consider using the same approach as the isa plugin. >>>>> >>>>> We have found a performance issue caused by redundant >>>>> initializations of the module, and solved it by caching the initialized variables in memory. >>>>> If that is the same approach as isa-plugin you mentioned, we also >>>>> do not have the same issue any more. >>>> >>>> The ISA plugin approach is thread safe, that's the important part. >>>> >>>>>> I'll have more comments once you have unit / functional tests >>>>> >>>>> We do have the those unit/functional tests, but the server is unreachable at this moment. >>>>> Please let us upload them to the repository tomorrow. >>>> >>>> Great. >>>> >>>>>> Regarding your initial question: I think it is too late for the Hammer release. >>>>> >>>>> We are so disappointed to hear that, but we must apologize for the late request. >>>>> If there would be any chance that SHEC be pulled into hammer, we >>>>> are very happy to conduct all the necessary tests by ourselves. >>>>> Please let us know the detailed schedule if this is a feasible option. >>>> >>>> Note that since this is a plugin it will be easy for someone to install it on a Hammer cluster, simply by copying the plugin files to each OSD / MON and restarting them. If there is some kind of urgency for you to be able to install this new plugin, I would advise making a package that contains just this file, restart the OSD once installed and recommend that the installation is done on all machines of the cluster (because every OSD / MON need to know about the plugin). >>>> >>>> Cheers >>>> >>>>> Best regards, >>>>> Takeshi Miyamae >>>>> >>>>> -----Original Message----- >>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >>>>> Sent: Tuesday, January 13, 2015 9:46 PM >>>>> To: Miyamae, Takeshi/宮前 剛 >>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>>> 鷹詔; >>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>> (question) >>>>> >>>>> Hi, >>>>> >>>>> It's a great start :-) A few comments: >>>>> >>>>> Could you reference the jerasure files (they are in the jerasure plugin already) instead of including them in your patch ? >>>>> >>>>> In ::prepare you reuse the matrix, if possible. If your intention >>>>> is to improve performances, you should consider using the same >>>>> approach as the isa plugin. See >>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code/ >>>>> i >>>>> s >>>>> a >>>>> /ErasureCodePluginIsa.cc#L36 which is and independent type >>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code/ >>>>> i s a /ErasureCodeIsaTableCache.h with only one instance and is >>>>> populated as needed and protected by a lock to make it thread safe : >>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code/ >>>>> i >>>>> s >>>>> a >>>>> /ErasureCodeIsaTableCache.h#L101 >>>>> >>>>> I'll have more comments once you have unit / functional tests ( similar to http://workbench.dachary.org/ceph/ceph/blob/giant/src/test/erasure-code/TestErasureCodeJerasure.cc ). >>>>> >>>>> Regarding your initial question: I think it is too late for the Hammer release. But from what I read it looks like we'll be able to merge in the early stages of the next release and that will give us time to properly test it. >>>>> >>>>> Cheers >>>>> >>>>> On 13/01/2015 11:34, Miyamae, Takeshi wrote: >>>>>> Hi Loic, >>>>>> >>>>>> I'm so sorry. The following is the correct repository. >>>>>> >>>>>> https://github.com/t-miyamae/ceph >>>>>> >>>>>> Best regards, >>>>>> Takeshi Miyamae >>>>>> >>>>>> -----Original Message----- >>>>>> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] >>>>>> Sent: Tuesday, January 13, 2015 7:26 PM >>>>>> To: Miyamae, Takeshi/宮前 剛 >>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>>>> 鷹詔; >>>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>>> (question) >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 13/01/2015 11:24, Miyamae, Takeshi wrote: >>>>>>> Hi Loic, >>>>>>> >>>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin. >>>>>>> >>>>>>> Thank you for your advices. >>>>>>> We have uploaded our latest codes to the following folk repository for an early review. >>>>>>> >>>>>>> https://github.com/miyamae-takeshi/multiple-shec >>>>>> >>>>>> It's 404 ? Is it a private repository maybe ? >>>>>> >>>>>>> >>>>>>> SHEC is located in src/erasure-code/shec directory. >>>>>>> >>>>>>> We are verifying SHEC's advantages on our Ceph cluster. It takes a little bit more. >>>>>>> Would you please start the review before that? >>>>>>> >>>>>>> Best regards, >>>>>>> Takeshi Miyamae >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic >>>>>>> Dachary >>>>>>> Sent: Wednesday, January 7, 2015 12:52 AM >>>>>>> To: Miyamae, Takeshi/宮前 剛 >>>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>>>>> 鷹詔 >>>>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>>>> (question) >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On 06/01/2015 12:49, Miyamae, Takeshi wrote: >>>>>>>> Dear Loic, >>>>>>>> >>>>>>>> I'm Takeshi Miyamae, one of the authors of SHEC's blueprint. >>>>>>>> >>>>>>>> Shingled Erasure Code (SHEC) >>>>>>>> https://wiki.ceph.com/Planning/Blueprints/Hammer/Shingled_Erasur >>>>>>>> e >>>>>>>> _ >>>>>>>> C >>>>>>>> o >>>>>>>> d >>>>>>>> e >>>>>>>> _(SHEC) >>>>>>> >>>>>>> The work you have done is quite impressive :-) >>>>>>> >>>>>>>> We have revised our blueprint shown in the last CDS to extend >>>>>>>> our erasure code layouts and describe the guideline for choosing SHEC among various EC plugins. >>>>>>>> We believe the blueprint now answers all the comments given at the CDS. >>>>>>> >>>>>>> Great. >>>>>>> >>>>>>>> In addition, we would like to ask for your advice on the >>>>>>>> schedule of our github pull request. More specifically, we would >>>>>>>> like to know its deadline for Hammer release. >>>>>>>> (As we have not really completed our verification of SHEC, we >>>>>>>> are wondering if we should make it open for early preview.) >>>>>>> >>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin. >>>>>>> >>>>>>> Cheers >>>>>>> >>>>>>>> Thank you in advance, >>>>>>>> Takeshi Miyamae >>>>>>>> >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" >>>>>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More >>>>>>>> majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Loïc Dachary, Artisan Logiciel Libre >>>>>>> >>>>>> >>>>>> -- >>>>>> Loïc Dachary, Artisan Logiciel Libre >>>>>> >>>>> >>>>> -- >>>>> Loïc Dachary, Artisan Logiciel Libre >>>>> >>>> >>>> -- >>>> Loïc Dachary, Artisan Logiciel Libre >>>> >>> >>> -- >>> Loïc Dachary, Artisan Logiciel Libre >>> >>> N r y b X ǧv ^ ){.n + z ]z {ay ʇڙ ,j f h z w >> j:+v w j m zZ+ ݢj" !tml= >>> >> >> -- >> Loïc Dachary, Artisan Logiciel Libre >> >> N r y b X ǧv ^ ){.n + z ]z {ay ʇڙ ,j f h z w > j:+v w j m zZ+ ݢj" !tml= >> > > -- > Loïc Dachary, Artisan Logiciel Libre > -- Loïc Dachary, Artisan Logiciel Libre
Attachment:
signature.asc
Description: OpenPGP digital signature