Random Decision Forest

Command: Image Analysis > Classification > Random Decision Forest > Designer

Random Decision Forest (RDF) based classifiers are among the most widely used classification models and perform comparably to PLS discriminant analysis. To create an RDF classifier you first have to define both a set of spectral descriptors and a set of training data.

How To:
  1. Load the spectral descriptors by clicking the button.
  2. Load the training set by clicking the button. Hint: If you have not yet defined a training set, you can do so by clicking the button in the main window and preparing your set before you proceed.
  3. Set the R parameter (see footnote 1) and the number of trees.
  4. Optionally select a pixel mask (normally not required).
  5. Click the "Calculate" button. The modelling starts immediately and the calculated decision forest is applied to the currently loaded image.
  6. After the calculation has finished, check the classification results for all classes. Be aware that RDFs offer little room for tuning through their parameters (the R parameter and the number of trees won't have much influence on the results). If the classification results are poor, adjust the spectral descriptors and/or the training data set instead.
  7. Save the trained classifier if it fits your requirements.
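
To make the roles of the R parameter and the number of trees in step 3 more concrete, the following Python sketch builds a toy forest of one-split trees (decision stumps). It is not the implementation used by the program: the function names, the sampling without replacement, the random choice of one descriptor per tree, and the tiny data set are all assumptions made for illustration.

```python
import random
from collections import Counter

def train_stump(rows, labels, rng):
    """Fit a one-split tree on one randomly chosen descriptor (illustrative only)."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)

def train_forest(rows, labels, r=0.5, n_trees=50, seed=0):
    """Build n_trees stumps, each on a random fraction r of the training set."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be greater than 0 and at most 1")
    rng = random.Random(seed)
    n_sub = max(1, round(r * len(rows)))  # training points seen by each tree
    forest = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(rows)), n_sub)  # assumption: no replacement
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Classify one data point by majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: two descriptor values per point, two classes.
rows = [(0.1, 1.0), (0.2, 1.2), (0.3, 0.9), (0.8, 3.0), (0.9, 3.1), (1.0, 2.9)]
labels = ["a", "a", "a", "b", "b", "b"]
forest = train_forest(rows, labels, r=0.5, n_trees=50)
print(predict(forest, (0.0, 0.0)), predict(forest, (2.0, 5.0)))
```

Because every tree sees only a subset of the training data, a single tree can be wrong while the majority vote remains stable, which is one plausible reason why moderate changes of R or of the number of trees rarely alter the result.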

Error types:

  • RelClsError: Relative classification error (percent of incorrectly classified cases)
  • AvgCE: Average cross-entropy (in bits per element)
  • RMSError: Root mean square error when estimating posterior probabilities
  • AvgError: Average error when estimating posterior probabilities
  • AvgRelError: Average relative error when estimating posterior probability of belonging to the correct class
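
The error types listed above can be illustrated with a short Python sketch that computes them from estimated posterior probabilities and the known classes. The exact normalization used by the program is not documented here, so the formulas below (averaging over all points and classes, a target probability of 1 for the correct class and 0 otherwise) are assumptions, as are the function name and the sample values.

```python
import math

def classification_errors(posteriors, true_classes):
    """posteriors: one list of class probabilities per data point;
    true_classes: index of the correct class for each data point."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    wrong = 0
    cross_entropy = sq_sum = abs_sum = rel_sum = 0.0
    for probs, true in zip(posteriors, true_classes):
        predicted = max(range(n_classes), key=lambda k: probs[k])
        wrong += predicted != true
        # Clamp to avoid log2(0) when the correct class gets probability 0.
        cross_entropy -= math.log2(max(probs[true], 1e-12))
        for k, p in enumerate(probs):
            target = 1.0 if k == true else 0.0
            sq_sum += (p - target) ** 2
            abs_sum += abs(p - target)
        # Relative error of the correct-class probability (its target is 1).
        rel_sum += abs(probs[true] - 1.0)
    return {
        "RelClsError": 100.0 * wrong / n,                 # percent misclassified
        "AvgCE": cross_entropy / n,                       # bits per element
        "RMSError": math.sqrt(sq_sum / (n * n_classes)),
        "AvgError": abs_sum / (n * n_classes),
        "AvgRelError": rel_sum / n,
    }

# Hypothetical posteriors for three data points and two classes.
report = classification_errors([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]], [0, 1, 1])
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```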

The upper part of the window shows the list of descriptors and a table containing the training data. The data points of the training data set are depicted in the top right image, which shows the currently selected descriptor. The lower half of the window shows the results of the classification on the following tabs:

  • Classification Results: The plot on the left-hand side depicts the confusion matrix for each class. If the actual and the predicted classification agree, the data point appears in the upper left cell (positive/positive) or in the lower right cell (negative/negative). If they disagree, the data point appears in the lower left or the upper right cell and is marked with a red x. The number inside each cell gives the number of data points assigned to it. The plot on the right-hand side shows the classification for the currently selected class in color-coded form.
  • Residuals: This graph depicts the residual values for all test data points.
  • R Scan: If you are not sure about the optimum R value (see footnote 1), you can scan the allowed range of R values and display the results as a diagram. In many cases an R value of 0.5 delivers good results.
  • Tree Scan: The tree scan helps you find out how many trees the decision forest should use. For most problems you need between 50 and 100 trees.
  • Variable Importance: This tab provides information about the importance of the descriptors used for detecting a selected class. See the section on variable importance for details.
  • Details: The details contain a numeric summary of the classification results. The classification errors, the variable importance, the actual and estimated training data, and the scans for the R value and the number of trees are reported as text tables.
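
The per-class confusion matrix described for the Classification Results tab can be sketched in a few lines of Python. The sketch counts, for one selected class, how often actual and predicted membership agree or disagree; the function name, the cell labels, and the class names are hypothetical and only illustrate the idea.

```python
def one_vs_rest_confusion(actual, predicted, target_class):
    """Count actual/predicted combinations for a single selected class."""
    counts = {"pos/pos": 0, "pos/neg": 0, "neg/pos": 0, "neg/neg": 0}
    for a, p in zip(actual, predicted):
        actual_part = "pos" if a == target_class else "neg"
        predicted_part = "pos" if p == target_class else "neg"
        counts[actual_part + "/" + predicted_part] += 1
    return counts

# Hypothetical class labels for four data points.
actual = ["soil", "leaf", "leaf", "soil"]
predicted = ["soil", "leaf", "soil", "soil"]
print(one_vs_rest_confusion(actual, predicted, "leaf"))
```

The "pos/pos" and "neg/neg" counts correspond to the cells where actual and predicted classification agree; the two mixed keys correspond to the cells marked with a red x.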

1) The R value specifies the fraction of the training set used to build each individual tree. The R parameter may assume values between 0 and 1.

Last Update: 2018-Sep-17