Re: Database designpattern - product feature

Roxanne Reid-Bennett <rox@xxxxxxxxxxx> · Wed, 03 Jun 2015 14:04:28 -0400



    On 6/3/2015 2:50 AM, Adrian Stern
      wrote:

    
        Hi William, thanks for joining the conversation.
        

        1) We do hope for constraints since a connection to an ERP
          system is possible in the future. We want to plan ahead. 
        

        2) As for the subclass approach: I would need about 30
          subclasses and it will get really hard to add new products
          since a change in the database will be necessary each time.
          That's why we want a more generic approach.
      
    
    At the root of your description it appears to me that you are
    choosing essentially an EAV design pattern.  Constraints and case
    specific validation become difficult (or impossible) to consistently
    enforce.  I have personal experience in cleaning up a system that
    used this *exact* pattern (product -> product attribute).
    Different developers approached updating information in different
    ways.  They also chose to store the attributes as text to avoid the
    text vs number issue.  However, they failed to force any validation
    and any hint of attempting to prevent duplicates.  That destroyed
    hope of supporting an underlying business need to search for
    specific values during data analysis.  (think of how many different
    ways you can misspell "poughkeepsie" - but hey... it's "just a
    name")

    
    I inherited the results of poor development controls and poor
    control over the weakness of the design - e.g. validation... and the
    cleanup has been long, and painful.

    
    I think you should evaluate your unease with having to update the
    database on release (potentially many times) carefully for what it
    is and why you have it.  [I'm not saying it is invalid - just know
    why you have it]  Because no matter how well you design your system
    - databases evolve.  Manage that.  

    
    Anybody can muck up part of a project and cause garbage - but
    speaking from experience... this design pattern really encourages
    it.  If you choose to use it - then you HAVE to control where and
    when inserts/updates are done and be very careful with specifying
    how validation is to be done to the entire development team (and
    potentially the users)... and then review the code (and/or data)
    regularly.

    
    wide tables/sparse fill on the columns ... I haven't tried to
    calculate the overhead on this... but disk space is generally
    considered to be cheap.  [that doesn't mean your case wouldn't be
    criminally wasteful]  Choosing 1 wide table or 30 sub-tables to deal
    with detail data. I don't know how that directly effects Postgres'
    performance.... if you choose 30 tables ... meh... at least you'll
    know exactly where your data is - and exactly what is allowed for
    each and every variant.  Remember to enforce a consistent guessable
    naming convention.

    
    All that said - there are reasons to use this pattern to gain the
    power of the dynamically expandable configuration of allowed
    values.  I just see the use cases where the gain (flexibility) is
    worth the pain (validation control) to be few and far between.

    
    just my $0.01  

    
    Roxanne

    
        Maybe I don't understand you right, because of the language
          barrier. Can you provide me a link to a subclassing example?
        -> https://docs.djangoproject.com/en/1.8/topics/db/models/#multi-table-inheritance
          ?
        

        ORM is a given in my case. This is not a high performance
          application. 
      
      
            Freundliche Grüsse
              

              Adrian Stern
              unchained - web solutions
              

              adrian.stern@xxxxxxxxxxxx
              +41 79 292 83 47
            
          
        On Tue, Jun 2, 2015 at 5:35 PM, William
          Dunn <dunnwjr@xxxxxxxxx>
          wrote:

          
            Hello Adrian,
              

              May I ask why you need a non-standard model? By
                standard models I mean the following:
              

              1) When you don't need to have subclass specific
                database constraints: All subclasses in the same table,
                subclasses that do not have an attribute have that
                column null. This has the best performance because no
                joins are required when querying both superclass and
                subclass attributes, and all the data for an object will
                be in the same block on disk. The disadvantage of this
                is that you cannot enforce constraints, such as
                not-null, on subclass specific attributes columns
                because the constraints would also be applied to the
                superclasses. If you can ensure that your application is
                the only way data can be inserted those constraints will
                naturally be enforced there.
              

              2) When you need subclass specific database
                constraints: Use an ORM such as Django's ORM or
                SQLAlchemy which has one table with the superclass
                attributes and a table for each subclass with their
                subclass specific attributes. This is slower because
                joins will be needed and the data for an object will be
                in 2 different blocks of disk but it allows you to
                enforce constraints within the database which will be
                checked whenever any application tries to insert values.
                There is a lot of complexity added because there will be
                so many small tables and indexes but the ORM takes care
                of that for you.
            
            
                          Will J. Dunn
                        
                        willjdunn.com
                      
                    
                  On Mon, Jun 1, 2015 at 10:35
                    AM, Adrian Stern <adrian.stern@xxxxxxxxxxxx>
                    wrote:

                    
                        Hi, I'm new

                        
                        I've been working as the sole administrator
                          of various postgresql projects for a while
                          now. All of which where django projects. 
                        Since a new project is starting and we've
                          found the need for a more generic approach I
                          would like to ask a few questions. 
                        

                        I would like to implement a pattern similar
                          to the product feature pattern explained in
                          the silverstone book - the data model resource
                          book vol 1. It is simply explained. There is a
                          Table PRODUCT holding the fields all the
                          products share, then there is the table
                          PRODUCT_FEATURE, both of them in a “many to
                          many“ relationship.
                        

                        PRODUCT <--- m -------- n --->
                          PRODUCT_FEATURE (a table in between of course)
                        

                        PRODUCT_FEATURE --> PF
                        PRODUCT --> P
                        TABLE IN BETWEEN --> TIB
                        

                        PF defines the feature Type while P stands
                          for the product the feature is applied to.
                          Some of these PF can have values of different
                          types (text, numbers, floating, blob, ...)
                          which would be applied to TIB. 
                        

                        I don't like the idea of having numerous
                          empty fields prepared in TIB, just to store
                          occasional values of different types,
                          therefore I need to specialize those TIB
                          Values. 
                        

                        Now how would I do That?  
                        

                        I could create some tables solely for the
                          means of holding [NUM], [TEXT], [BLOB], [ETC]
                          and reference them with the TIB PK. When using
                          them I could create a view TIBV containing all
                          of [NUM, TEXT, BLOB, ETC] in the same column
                          called Value, and join it with TIB to get the
                          value of a PF. 
                        

                        But is this a good idea?
                        Is there a better way?
                        

                        Also, I would have to create a pivot table
                          in order to list all the products with all the
                          features. As this is implemented in C (afaik)
                          I suppose it is rather fast or at least fast
                          enough, but I do not actually know. What I
                          know is, there are about 30 Product Types and
                          around 50 possible product features. One
                          product can have up to approximately 25 PF but
                          are mostly around 5 to 10. 
                        

                        Do you think a pivot table is a good idea? 
                        What alternative do i have?
                        

                        There is room for caching since the dataset
                          is not updated too often. 
                        

                        regards, adrian
                      
                    
    -- 
[At other schools] I think the most common fault in general is to teach students how to pass exams instead of teaching them the science.
Donald Knuth