Monday, March 19, 2012

Advanced schema design question for Experts. Please help. Thanks

QUESTIONs:What schema is the best for high speed search for a classified web application?

Is our schema design looks OK? It is a STAR schema and will be used for OLTP type app. Is this OK? or Are we missing something?Please let me know if you need more diagramatic description.

BACKGROUND:Our group is making a classified website (like classified.yahoo.com) where people can place online ad to sell items.like cars, computers, electronics etc. Users will fill out webform for each category(car or computers) with allattributes of the item forsell to post an add.

Main 3 operations the web users will perform:Quick Searching(most frequent): category=car, subcategory=sedan, country=USA city=LosAngeles Zip=empty

Advanced Search(less frequent): User can include all fields\attributes or then can user a subset to query to do advanced search like :

category=car, subcategory=sedan, country=USA, city=Los Angeles, Year= in(00, 03), transmission=Auto, engine=V6, Maker=Honda, Model=Accord, color=Red and Price < 10000

category=car, subcategory=SUV, country=USA, city=NY, Year= in(99, 00, 03), transmission=Auto, Maker=Toyota, and Price between 11000 and 14500

Insert(least frequent): By filling out web form. For Car, the form will have different dimensions\attributes (year, make, model, transmission, mileage, color, price etc)

Current Schema design:Set of core dimension\lookup tables: stores core\common attributes for lookups eg. status={open, new, expired}, country={USA, Canada, India,...}

One custom Lookup table: for all custom attribute lookup: transmission{auto, manual}, engine type{V4, V6} for car, processor{PII, PIII, PIV), RAM(512MB, 1GB, 2GB} for Computer & so

3 Fact Tables:Main fact\Pivot table that stores all sell common attributes eg. price, title, year_made, post_date, expire_date, user_name, description etc.One fact table to store custom string, int, float, date field values of ads.One fact table to store custom dropdown field selection values

Concerns and issues:Looking at the schema, it seems to be a STAR schema with multiple fact tables where all core lookup tables connected to the main Pivot table andcustom lookup table connected to the 2nd and 3rd fact table.

Quick search only queries the Pivot fact table. While Advanced search query requires to join 3 fact tables. Both query requires to join 3 fact tables with alldimension tables(15 to 20 each having avg of 20 values) to get the look up names so that users sees text instead of ids. Search speed is the Main concerns. Insertion\update speed doesn't matter that much as that is less frequently done.

jennifer7290 wrote:

QUESTIONs:What schema is the best for high speed search for a classified web application?


Is our schema design looks OK? It is a STAR schema and will be used for OLTP type app. Is this OK? or Are we missing something?Please let me know if you need more diagramatic description.



Your model ("3 fact tables") is quite possibly the worst way to do this.
Let's think about it:
- You know what you're selling - Cars, Computers, etc.
- You know what attributes those entities have (Engine Type, Processor Type, etc).
- Users search for entities, not attributes ("give me everything witha V6 and a P4").
So why go with such non-orthagonal design? Subclass your entities:
Ads := ( Ad_Num, Ad_Type, Price, Start_Date, ...)
Ad_Type IN ('Auto', 'Computer', 'Boat', 'Service', ...)
Auto_Ad_Attributes := ( Ad_Num, Engine_Size, Color, ...)
Computer_Ad_Attributes := ( Ad_Num, Processor_Speed, ...)
...
Go with EAV and flush your data integrity down the drain and watch how absurdly slow your clasified system runs when people start using it.

|||Alex,
Thanks for reading it. What is orthogonal design concept? Is it database term or OOP term?
What is EAV stands for? How can I see some example design of your proposed approach or some tutorials, books will be really helpful.
Again, our emphasis is to track all classified information in a webbased tool with requirement of complex, dynamic query with HIGH speed in search.
Hope to hear from you.
Regards|||

I did google on it. It sounds like a bad design approach to me. The reasons are too many entities, so too many joins can't be avoided like the STAR schema.
How does your EAV performs better than STAR for classified system for a system with 20 metadata\entity\lookup tables??
HOW DOES YOUR MODEL PROVIDE HIGH SPPED IN SEARCH? Query speed depends on number of joins which can be done through denormalization.
In my case I actually have 1 fact table and 2 mapping intersect table (I mentioned as fact table).
Please see the link below:
Regards

http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm
Drawbacks of EAV/CR

  • Considerable up-front programming (wheel reinvention) is needed to do the tasks that a conventional architecture would do automatically. However, such programming needs to be done only once, and availability of generic EAV tools could remove this limitation.
  • EAV design is less efficient than a conventional structure for retrieving data in bulk on numerous objects at a time. (For object-at-a-time retrieval, such as through a Web-based browsing interface, the volume of data is small enough that the difference is not noticeable.)
  • Performing complex attribute-centric queries is both significantly less efficient as well as technically more difficult. This needs a query generator. However, most queries on scientific databases are relatively straightforward, and directed toward specific objects of interest.
  • For schemas that are relatively static and/or simple (e.g., databases for business applications, such as inventory or accounting), the overhead of EAV design exceeds its advantages.
  • If particular classes of data will be used across multiple databases in different domains, and/or there is a large amount of existing code that manipulates them, it should probably be left in its original conventional form. This is the reason why we have chosen to store bibliographic citations (that are not in external databases) conventionally. We similarly store certain genetics-related data conventionally because of a large body of code (from other databases previously created by our group, such as PhenoDB ) that performs computations such as Hardy-Weinberg frequencies and tests of significance.|||


    jennifer7290 wrote:

    Alex,
    Thanks for reading it. What is orthogonal design concept? Is it database term or OOP term?
    What is EAV stands for? How can I see some example design of your proposed approach or some tutorials, books will be really helpful.

    Again, our emphasis is to track all classified information in a webbased tool with requirement of complex, dynamic query with HIGH speed in search.
    Hope to hear from you.
    Regards

    EAV = Entity-Attribute Value, a totally unstructured design that involves Entities defined with Attribute/Value pairs:
    EAV := (Entity, Attribute, Value)
    { 'CAR#342', 'Engine', 'V6' }
    { 'CAR#342', 'Color', 'Red' }
    { 'PC#4555', 'CPU', 'Pentium 4' }
    { 'PC#998', 'OS', 'Windows' }

    This is effectively what you are doing with your "fact" tables, except you have three "fact" tables for some reason I really don't understand.

    The design I mentioned is a very basic one. You have a table containing your base entities (advertisements) and tables containing subclass-specific attributes like Engine, Color, etc. You then make views for each type of entitiy:

    Automobile_Ads :=
    SELECT AD.Ad_Num, AD.title, AD.year_made, AD.post_date,
    AD.expire_date, AD.user_name, AD.description , ...,
    AUTO.Make, AUTO.Model, AUTO.Miles, ...,
    FROM Advertisements AD
    INNER JOIN Automobile_Advertisement_Attributes AUTO
    ON AD.Ad_Num = A.Ad_Num

    Find me all Ford Makes with a v6:
    SELECT Make FROM Automobile_Ads WHERE Make='Ford' AND Engine='V6'

    You would be very saddened to see how complex (and rediculously slow) such a query would be in an EAV table.

    It's a very easy design to do: figure out all the entities (what you will allow ads for), then figure out all of the attributes of those entities. For each entity, create an attributes table as desribed above and a view described above.

    |||

    jennifer7290 wrote:

    I did google on it. It sounds like a bad design approach to me. The reasons are too many entities, so too many joins can't be avoided like the STAR schema.
    How does your EAV performs better than STAR for classified system for a system with 20 metadata\entity\lookup tables??


    This is what happens when I POST before I READ. I misunderstood your design as EAV ... now that I read it again, I am a little lost ... could you elaborate by showing some tables ?
  • No comments:

    Post a Comment