Algorithm Comparison - CSCI 447

CSCI 447
Machine Learning
Spring 2019

Computer Science & Software Engineering

LEARNING ALGORITHM MATRIX

Algorithm	Input Data	Output Type	Problem Type	Benefits	Drawbacks
Linear Regression	Numeric	Numeric continuous	Regression / curve fitting / prediction	Small, fast, easy, simple	Assumes residuals are normally distributed, data are iid, variance is constant
Logistic Regression	Numeric	Categorical	Curve fitting / prediction	Small, fast, easy, simple	Assumes data are iid
Clustering	Unlabeled anything	Categorical / buckets	Unsupervised	Flexible; can be used to label data	Will always output clusters, but they might not be meaningful; sensitive to initial random placement of centroids; assumes some categorization exists
Nearest Neighbor	Labeled anything	Categorical / labeled	Heuristic prediction	No learning needed; straightforward	Doesn't scale well, curse of dimensionality
Deep Networks	Labeled Numeric	Prediction - categorical, probability, numeric prediction	Supervised, high dimensional	Few assumptions; can fit complex functions	Prone to overfitting; doesn't react to change in operation; not as simple to implement (hyperparameters); hard to explain; solution not guaranteed to be optimal
Convolutional Networks	High dimensional data (images, etc.)	Prediction - categorical, probability, numeric prediction	Classification; supervised high dimensional	Learns important features / preprocessing; don't need to handcraft features	Prone to overfitting; doesn't react to change in operation; not as simple to implement (hyperparameters); hard to explain; solution not guaranteed to be optimal
Recurrent Networks	Sequence data	Prediction - categorical, probability, numeric prediction	Time/space sensitive	No independence assumptions; memory of previous input	Prone to overfitting; doesn't react to change in operation; not as simple to implement (hyperparameters); hard to explain; solution not guaranteed to be optimal; recurrence difficult to program
Bayesian Networks	Mostly categorical, but can be numeric	1) Model, 2) Probabilities for many variable combinations, 3) Conditional independencies	Many	Generative; easy to understand	Could be expensive based on number of variables
Genetic Algorithms	Data	Optimization / design solutions	Optimization / design	Generative; doesn't use gradient; novel solutions	Very expensive
Decision Trees	Labeled anything	Prediction / classification, numeric	Simple	Easy to build, use, understand	Prone to under- and over-fitting
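
As a rough illustration of two rows in the matrix, the sketch below contrasts Linear Regression (numeric input, numeric continuous output, a small fitting step) with Nearest Neighbor (labeled input, categorical output, no learning step). It is a minimal sketch under stated assumptions, not course material: the function names `fit_line` and `nearest_neighbor`, the single-feature setup, and the toy data are all made up for this example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor(train, query):
    """1-nearest-neighbor: label of the training point closest to `query`.

    `train` is a list of (features, label) pairs. Nothing is learned up
    front; every prediction scans the whole training set, which is why
    the method scales poorly as the data grows.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Hypothetical toy data: points on the line y = 2x + 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)

# Hypothetical toy data: two labeled points in the plane.
labeled = [((0, 0), "a"), ((5, 5), "b")]
print(nearest_neighbor(labeled, (4, 4)))  # b
```

The regression returns a model (two numbers) that can be reused cheaply, while the nearest-neighbor function keeps the raw data and does its work at query time, which matches the Benefits and Drawbacks columns for those two rows.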

Page last updated: April 08, 2019