Microsoft Decision Tree: how exactly does it work?

Discussion in 'microsoft.public.sqlserver.datamining' started by Peter, Sep 26, 2003.

  1. Peter

    Peter Guest

    Hi,

    I am using the built-in Microsoft Decision Tree algorithm to
    perform some data mining tasks on my Analysis Server (SP3). I
    have some difficulty understanding how it picks the node to
    split, how it splits, when it terminates, and so on. I'd really
    like to know how the algorithm works.

    Another question: is there any way to control the splitting
    and termination conditions from the Analysis Manager?

    The last question: I read that the Analysis Manager (SP3)
    supports third-party algorithm plug-ins. Does that mean I can
    develop my own mining algorithm and call it from the Analysis
    Manager? If so, where can I find information on how to do that?

    It might be a lot to ask. I really appreciate any input.

    Thanks,

    Peter
     
    Peter, Sep 26, 2003
    #1

  2. You can find many of your answers in the FAQ at
    http://groups.msn.com/AnalysisServicesDataMining

    In particular, you can control how a tree splits and how deep it
    grows with the SPLIT_METHOD and COMPLEXITY_PENALTY parameters. At
    the above website, there is a sample AM plug-in that provides a
    user interface for setting algorithm parameters.
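
    [Editor's note: to make that concrete, here is a minimal sketch of
    how such parameters might be passed when creating a decision-tree
    model through the OLE DB for Data Mining interface instead of the
    Analysis Manager UI. The connection string, model name, columns,
    and parameter values are assumptions for illustration, and whether
    parameters can be passed this way may depend on the provider
    version.]

        # Hedged sketch (Python + ADO via pywin32): create a decision-tree
        # mining model with explicit algorithm parameters. The connection
        # string, model/column names, and parameter values are illustrative
        # assumptions, not taken from this thread.
        import win32com.client

        conn = win32com.client.Dispatch("ADODB.Connection")
        conn.Open("Provider=MSOLAP;Data Source=localhost;Initial Catalog=Mushrooms")

        dmx = """
        CREATE MINING MODEL MushroomTree (
            Id        LONG KEY,
            Odor      TEXT DISCRETE,
            CapColor  TEXT DISCRETE,
            Edibility TEXT DISCRETE PREDICT
        )
        USING Microsoft_Decision_Trees (SPLIT_METHOD = 3, COMPLEXITY_PENALTY = 0.5)
        """
        # SPLIT_METHOD = 3 is assumed here to mean BOTH; 0.5 is an arbitrary
        # example value for COMPLEXITY_PENALTY.
        conn.Execute(dmx)  # the model would then be trained with an INSERT INTO statement
        conn.Close()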

    I believe you can also find there the link to the sample OLEDB
    for Data Mining provider that you can use as a basis for your own
    algorithms.
     
    Jamie MacLennan (MS), Sep 29, 2003
    #2

  3. Peter

    Peter Guest

    Peter, Sep 29, 2003
    #3
  4. Jamie MacLennan (MS), Sep 30, 2003
    #4
  5. Peter

    Peter Guest

    Jamie,

    Thanks a lot for the reply. The link works just fine.

    I have downloaded the following package

    DataMiningAddIns.exe

    and unzipped it. Following the readme.txt, I closed the
    running Analysis Manager and ran

    DataMiningAddIn.reg

    After that, I got a message saying that the registration
    was successful.

    Then I started the Analysis Manager again. But here is
    the problem: when I right-click the mining models of one
    database, say "Mushrooms", I cannot see "Advanced
    Model Properties" in the list.

    When I right-click the server name and select Properties,
    the "Add-ins" tab shows "Mining model properties" as an
    available add-in, but there is also a yellow sign
    saying "this tab applies to the local computer only".

    I am not sure what went wrong. Any suggestions?

    I am using SQL Server 2000 (SP3) and Analysis Manager SP3 on
    Windows 2000, FYI.

    Regards,

    Peter
     
    Peter, Sep 30, 2003
    #5
  7. You also need to register the DataMiningAddIns.dll.
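
    [Editor's note: for anyone hitting the same problem, a hedged
    sketch of that registration step follows. The DLL path is an
    assumption; use wherever DataMiningAddIns.exe unpacked its files
    on your machine.]

        # Hedged sketch: register the add-in's COM DLL with regsvr32.
        # The path below is an assumed install location, not from the thread.
        import subprocess

        dll_path = r"C:\Program Files\DataMiningAddIns\DataMiningAddIns.dll"  # assumed
        subprocess.run(["regsvr32", "/s", dll_path], check=True)  # /s = silent registration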

    --
    Raman Iyer
    SQL Server Data Mining
    [Please do not send email directly to this alias. This alias is for
    newsgroup purposes and is intended to prevent automated spam. This posting
    is provided "AS IS" with no warranties, and confers no rights.]
     
    Raman Iyer [MS], Oct 2, 2003
    #8
  8. Peter

    Peter Guest

    It is now working! Thank you all, Peter, Jamie and Raman.

    I checked out the two papers (listed below) from the FAQ
    answer to the question "Where do I get the details of the
    two algorithms?" However, they don't seem to address very
    clearly what criteria are used to stop the splitting, which
    node to split, and how continuous values are discretized.
    Is there any other document that addresses these issues
    better? I know there are many academic papers on these
    topics, but my main concern is how they are handled in the
    Microsoft Decision Tree algorithm.

    Papers I read:
    =====================================================
    - Correlation counting:
    Surajit Chaudhuri, Usama M. Fayyad, Jeff Bernhardt,
    Scalable Classification over SQL Databases. ICDE 1999: 470-479.
    Found in
    http://ftp.research.microsoft.com/Users/surajitc/icde99.pdf

    - The default scoring methods (Bayesian Dirichlet
    Equivalent with Uniform prior):
    David M. Chickering, Dan Geiger, David Heckerman,
    Learning Bayesian Networks: The Combination of Knowledge
    and Statistical Data, MSR-TR-94-09, 1994.
    Found in
    http://www.research.microsoft.com/scripts/pubdb/pubsasp.asp?recordID=81
    =======================================================

    Another thing: when I try to get information on how to plug
    in a third-party algorithm from the following link

    http://www.microsoft.com/sql/techinfo/BI/2000/dmproviderswp.asp

    I get a "Page not found" error. Did I miss something?

    Thanks for any input.

    Peter
     
    Peter, Oct 2, 2003
    #9
  9. The tree stops splitting when it sees that no split gives a
    better split score any longer. The research paper describes how
    we calculate the split score. In terms of split method, we have
    three different methods implemented: simple BINARY, COMPLETE,
    and BOTH. Simple BINARY produces split conditions like
    Hobby=golf / Hobby!=golf, while COMPLETE produces Hobby=golf,
    Hobby=tennis, and so on. BOTH will take the best of the two
    methods for each split. The split method can be specified using
    the SPLIT_METHOD parameter.
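
    [Editor's note: to illustrate the difference between those split
    methods, here is a sketch of how the candidate partitions of a
    discrete attribute could be enumerated under BINARY, COMPLETE,
    and BOTH. It only mirrors the description above, not the actual
    Microsoft Decision Tree implementation.]

        # Sketch: candidate splits of a discrete attribute under the three methods.

        def binary_splits(values):
            """One two-way candidate per value, e.g. Hobby=golf vs. Hobby!=golf."""
            return [({v}, set(values) - {v}) for v in values]

        def complete_split(values):
            """A single n-way candidate: one branch per value (Hobby=golf, Hobby=tennis, ...)."""
            return [tuple({v} for v in values)]

        def both_splits(values):
            """BOTH considers the candidates of both methods; the best-scoring one wins."""
            return binary_splits(values) + complete_split(values)

        for candidate in both_splits(["golf", "tennis", "chess"]):
            print(candidate)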

    There is also a parameter, COMPLEXITY_PENALTY, that
    controls the tree depth by penalizing the split score.
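
    [Editor's note: as a rough sketch of how such a penalty can stop
    growth. The scoring function and the penalty form here are
    assumptions for illustration; the real algorithm uses the
    Bayesian score from the papers cited above.]

        # Sketch: a penalized stopping rule. 'raw_split_score' stands in for the
        # real scoring function, and the depth-based linear penalty is an
        # illustrative assumption, not the actual COMPLEXITY_PENALTY formula.

        def choose_split(node, candidates, raw_split_score, complexity_penalty):
            leaf_score = raw_split_score(node, split=None)   # score of not splitting
            best, best_score = None, leaf_score
            for split in candidates:
                penalized = raw_split_score(node, split) - complexity_penalty * node.depth
                if penalized > best_score:   # only accept splits that beat the leaf
                    best, best_score = split, penalized
            return best   # None means: stop growing, keep this node as a leaf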

    Continuous inputs are handled differently from discrete ones.
    For each node to split, we collect a sample of cases for the
    candidate continuous inputs and find the best cut-points from
    that sample. So it is different from a DISCRETIZED attribute.
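
    [Editor's note: and a small sketch of the cut-point idea for a
    continuous input, using a simple impurity score as a stand-in for
    the real split score; again an illustration, not the actual
    implementation.]

        # Sketch: pick a cut-point for a continuous input from a sample of
        # (value, class_label) cases collected at the node. Gini impurity is an
        # illustrative stand-in for the algorithm's real split score.
        from collections import Counter

        def gini(labels):
            counts = Counter(labels)
            total = len(labels)
            return 1.0 - sum((c / total) ** 2 for c in counts.values())

        def best_cut_point(sample):
            sample = sorted(sample)                      # sort by the continuous value
            best_cut, best_score = None, float("inf")
            for i in range(1, len(sample)):
                if sample[i - 1][0] == sample[i][0]:     # skip duplicate values
                    continue
                cut = (sample[i - 1][0] + sample[i][0]) / 2.0   # midpoint between neighbours
                left = [label for value, label in sample if value <= cut]
                right = [label for value, label in sample if value > cut]
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(sample)
                if score < best_score:
                    best_cut, best_score = cut, score
            return best_cut

        print(best_cut_point([(1.0, "a"), (2.0, "a"), (7.0, "b"), (9.0, "b")]))  # -> 4.5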

    Let me know if you have more questions.
    --
    Peter Kim
    This posting is provided "AS IS" with no warranties, and confers no rights.

     
    Peter Kim [MS], Oct 2, 2003
    #10
