Thursday, June 5, 2014

ODM Integration with SPSS Predictive Analytics Suite - Part 1: PMML Import

There are a few ways to integrate the SPSS Predictive Analytics Suite with ODM. To get started, you must first install the SupportPac (LB02).

The SupportPac provides two features that support the usage of business rules and Predictive Analytics together:
  • Part 1: Import a decision tree model via PMML (Predictive Model Markup Language) and generate an Operational Decision Management decision tree at design time (discussed in this article)
  • Part 2: Reference predictive scores within business rules and obtain those scores at runtime from the SPSS Scoring Service (not discussed in this article)

This article walks step by step through the ODM PMML import capability for Decision Tree models. Part 2 will be a separate article describing the ODM-SPSS Scoring Service approach. Both approaches require the IBM WebSphere Operational Decision Management Integration with the SPSS Predictive Analytics Suite SupportPac.

To install the SupportPac, you must:
  1. Unzip the SupportPac deliverable in the WebSphere Operational Decision Management installation directory.
  2. Install the predictive analytics features from Rule Designer.

PMML is the leading standard for statistical and data mining models. It uses XML to represent mining models so that they can be shared. In other words, with PMML a model can be developed on one system using one application and deployed on another system using a different application. SPSS Modeler can create models and export them as PMML.
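
To make the XML representation concrete, here is a minimal sketch in Python that reads the data dictionary of a PMML document. The PMML fragment is hand-written for illustration (the field names age, income and risk are assumptions, and a real SPSS Modeler export carries much more detail), and the helper list_data_fields is hypothetical, not part of ODM or the SupportPac.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment used only for illustration.
# A real export also contains a header, a mining schema and the model itself.
PMML_TEXT = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""

# Namespace of the PMML version declared in the fragment above.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) for every DataField in the document."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", NS)
    return [(f.get("name"), f.get("optype"), f.get("dataType")) for f in fields]


for name, optype, data_type in list_data_fields(PMML_TEXT):
    print(name, optype, data_type)
```

The fields listed this way are the ones that the import has to match against BOM elements, either by generating new ones or by mapping to existing ones.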

The PMML import approach works for Decision Tree models. Decision Tree models are produced by data mining algorithms (such as CHAID, C&RT, ID3, C4.5/C5.0) that identify various ways of splitting a dataset into branch-like segments, forming an inverted tree that starts with the root node at the top of the tree. Decision Tree models are used frequently in the data mining community for classification and prediction as they are easy to understand, easy to use, support both quantitative and qualitative measurements, and are very robust. Data mining workbenches, like the SPSS Modeler, provide rich toolsets for creating and validating Decision Tree models.

The PMML import feature focuses on the Decision Tree model. After your Decision Tree model is exported from a modeling tool to a PMML file, you can import that file into Rule Designer to generate an ODM decision tree.

The primary difference between a Decision Tree model, as used in the data mining community, and an ODM decision tree is that the ODM decision tree has actions attached to its leaf nodes, while the Decision Tree model usually has a predicted variable or classification attribute specified for each node. In other words, a Decision Tree model can identify the business rules for classifying and predicting a specific variable, whereas the ODM decision tree can actually execute those business rules along with the appropriate actions at run time.
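
The distinction can be sketched in a few lines of Python. The code below walks a simplified, hand-written PMML TreeModel fragment, collects the conditions leading to each leaf, and then attaches an action to each predicted score. The fragment, the ACTIONS table and the function names are all assumptions made for illustration; this is not how the SupportPac performs the import, only a picture of the idea that a prediction per leaf becomes an action per leaf.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical TreeModel fragment (a real one also has a MiningSchema).
TREE_TEXT = """<TreeModel xmlns="http://www.dmg.org/PMML-4_1" functionName="classification">
  <Node id="0" score="low">
    <True/>
    <Node id="1" score="high">
      <SimplePredicate field="age" operator="lessThan" value="25"/>
    </Node>
    <Node id="2" score="low">
      <SimplePredicate field="age" operator="greaterOrEqual" value="25"/>
    </Node>
  </Node>
</TreeModel>"""

NS = "{http://www.dmg.org/PMML-4_1}"

# PMML comparison operators mapped to readable symbols.
OPS = {"equal": "==", "notEqual": "!=", "lessThan": "<",
       "lessOrEqual": "<=", "greaterThan": ">", "greaterOrEqual": ">="}

# Hypothetical actions: in the model a leaf only predicts a score,
# in a rule-based decision tree each leaf triggers an action.
ACTIONS = {"high": "refer application to manual review",
           "low": "approve application"}


def leaf_rules(node, path=()):
    """Yield (conditions, score) for every leaf below the given node."""
    pred = node.find(NS + "SimplePredicate")
    if pred is not None:
        cond = f'{pred.get("field")} {OPS[pred.get("operator")]} {pred.get("value")}'
        path = path + (cond,)
    children = node.findall(NS + "Node")
    if not children:
        yield path, node.get("score")
    for child in children:
        yield from leaf_rules(child, path)


def as_rules(tree_text):
    """Turn each leaf into an 'if ... then <action>' line."""
    top = ET.fromstring(tree_text).find(NS + "Node")
    return [f'if {" and ".join(conds)} then {ACTIONS[score]}'
            for conds, score in leaf_rules(top)]


for rule in as_rules(TREE_TEXT):
    print(rule)
```

Only SimplePredicate conditions are handled here; compound predicates and missing-value handling are left out to keep the sketch short.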

1. When importing PMML, you can either generate the BOM (business object model) elements that the model uses or map the fields in the model to existing BOM elements.

2. The PMML import creates a decision tree and the BOM elements used in the model. You may get B2X (BOM-to-XOM mapping) errors and warnings until you create the corresponding XOM (execution object model) class.

3. At this point you can treat the imported decision tree like any other decision tree or rule artifact created in Rule Designer, and you can modify it as needed. Best practices for ruleflow orchestration suggest that each decision tree be contained within its own rule task.

Note that an imported decision tree currently has no life-cycle link to the PMML file. Consequently, if you change the PMML model itself, you will have to repeat the import/modification process.

Artur Sahakyan is an Associate Consultant at Prolifics specializing in IBM WebSphere Operational Decision Management (v5.xx - v8.xx). Artur has a strong background in mathematics and probability/statistics. He also has profound knowledge of IBM Business Process Manager, IBM Integration Bus (IIB v9), IBM WebSphere MQ (v7), IBM SPSS Modeler, IBM SPSS Statistics, Java, C++, C.