Why is an ID field required when building a Decision Tree?

Discussion in 'microsoft.public.sqlserver.datamining' started by Tim Manns, Jul 1, 2004.

  1. Tim Manns

    Tim Manns Guest

    Hi,

    I'm trying to find some information that explains the math
    or programming behind the requirement for an ID field (key
    column) when building a Decision Tree model in Analysis
    Services.

    Many decision tree implementations (e.g., CART, C5.0) can
    build a model without a unique ID field (key column), so I'm
    curious why the Decision Tree in Analysis Services makes
    this a requirement.

    I suspect it is a bug. I can understand the requirement
    if you are using multiple tables or OLAP, but not if you
    are using a single flat table.

    Am I missing something?
     
    Tim Manns, Jul 1, 2004

  2. Yes, you're right. The Microsoft decision trees and clustering algorithms
    don't use the key column in the learning process as long as the model is
    flat. However, if you have a nested table column, the key is used to pivot
    the rows of the nested table into a hierarchical case structure. Also, some
    algorithms (e.g., Time Series, which is available in SQL 2005 beta1) may use
    the key column for internal model identification (e.g., a time-tube).
    Sometimes we also want to expose the training cases (or a sample of them)
    that belong to a specific node in a tree (e.g., the drill-through feature in
    SQL 2005 beta1). You need a key in that sample if you want to join these
    drill-through cases with other data in your source database.
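
    The two uses of the key described above (pivoting nested rows into cases,
    and joining drill-through cases back to source data) can be sketched in
    Python. This is only a toy illustration, not Analysis Services code: the
    tables, the column names (`CustomerID`, `Product`, `Buyer`) and all three
    helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat case table: one row per customer, keyed by CustomerID.
customers = [
    {"CustomerID": 1, "Age": 34, "Buyer": "yes"},
    {"CustomerID": 2, "Age": 51, "Buyer": "no"},
]

# Hypothetical nested table: many rows per customer, linked by the same key.
purchases = [
    {"CustomerID": 1, "Product": "helmet"},
    {"CustomerID": 1, "Product": "gloves"},
    {"CustomerID": 2, "Product": "pump"},
]


def build_cases(case_rows, nested_rows, key):
    """Pivot nested rows under their parent case using the key column."""
    nested_by_key = defaultdict(list)
    for row in nested_rows:
        nested_by_key[row[key]].append(row["Product"])
    return [
        {**case, "Purchases": nested_by_key.get(case[key], [])}
        for case in case_rows
    ]


def training_features(case, key):
    """Inputs a flat learner would see: every column except the key."""
    return {name: value for name, value in case.items() if name != key}


def drill_through(node_case_keys, source_rows, key):
    """Join the cases assigned to a tree node back to source rows by key."""
    wanted = set(node_case_keys)
    return [row for row in source_rows if row[key] in wanted]


cases = build_cases(customers, purchases, "CustomerID")
```

    Here `training_features` drops the key before learning, which mirrors the
    flat-model case, while `build_cases` and `drill_through` cannot work
    without it.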

    The structural constraints on mining models were designed not just for
    decision trees, but for other algorithms and many other application
    scenarios as well. In fact, in SQL 2005 beta1, we allow multiple models
    with different algorithms to share one mining structure. In general, one of
    the principles we have in DMX is to separate algorithm details from logical
    modeling so that application programs don't have to be tied too closely to
    algorithm specifics. It's similar to the philosophy the SQL world adopted
    decades ago when it introduced the relational model: so-called data
    independence.
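
    The idea of one logical structure shared by several algorithms can be
    sketched as follows. This is a hedged illustration in Python, not DMX and
    not how Analysis Services is implemented: `MiningStructure`, `add_model`,
    `train_all` and the two stand-in "algorithms" are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MiningStructure:
    """Hypothetical logical layer: column definitions shared by all models."""
    key: str
    columns: List[str]
    models: Dict[str, Callable] = field(default_factory=dict)

    def __post_init__(self):
        # The key constraint is checked once, at the structure level,
        # regardless of which algorithms are attached later.
        if self.key not in self.columns:
            raise ValueError("structure must declare a key column")

    def add_model(self, name, algorithm):
        self.models[name] = algorithm

    def train_all(self, rows):
        # Every attached algorithm receives the same cases and key name.
        return {name: algo(rows, self.key) for name, algo in self.models.items()}


def majority_class(rows, key_column):
    """Stand-in for a flat learner: it ignores the key entirely."""
    labels = [row["Buyer"] for row in rows]
    return max(set(labels), key=labels.count)


def case_index(rows, key_column):
    """Stand-in for an algorithm that uses the key to identify cases."""
    return sorted(row[key_column] for row in rows)


structure = MiningStructure(key="CustomerID", columns=["CustomerID", "Age", "Buyer"])
structure.add_model("flat", majority_class)
structure.add_model("keyed", case_index)

rows = [
    {"CustomerID": 2, "Age": 51, "Buyer": "no"},
    {"CustomerID": 1, "Age": 34, "Buyer": "yes"},
    {"CustomerID": 3, "Age": 29, "Buyer": "yes"},
]
results = structure.train_all(rows)
```

    Because the key is declared on the structure rather than on any one model,
    the calling code stays the same whether or not a given algorithm uses it.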
     
    Peter Kim [MS], Jul 1, 2004