RESOURCES

Case Study

Colleges need to get more students to graduate, and many schools are embracing a tool that corporations and social media companies use to track the clicks and movements of their customers: big data.

Under a Watchful Eye: How colleges are tracking students to boost graduation, APM Reports, Educate podcast series
How One University Used Big Data to Boost Graduation Rates

Terms

Data mining is a cross-disciplinary field. The term "data mining" came into widespread use in the mid-to-late 1990s. Machine learning was pioneered in the 1950s, and many consider the 1956 Dartmouth Summer Research Project on Artificial Intelligence to be the seminal event for artificial intelligence. In the 1960s, machine learning used Bayesian statistical methods for probabilistic inference. It is useful to compare the fields of data mining, data science, statistics, machine learning and AI.

General

Skills Are Critical in Data Science Job Hunt, Aug. 22, 2019

Best job in America pays over $108,000 a year, and has a high number of openings, Sept. 5, 2019

50 Best Jobs in America for 2019

Python

Python code to convert CSV to ARFF: https://github.com/christinequintana/CSV-to-ARFF/blob/master/CSVtoARFFconversion.py. Thank you, Keith.

Keras: The Python Deep Learning library https://keras.io/

Data Sets

WEKA

Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules and visualization. It is free software developed at the University of Waikato and licensed under the GNU General Public License.

The Weka 3.8.1 manual is available on the Weka website at http://www.cs.waikato.ac.nz/~ml/weka/documentation.html and can also be accessed via the "Help" option within Weka.

Weka packages can be installed via the package manager, found under Tools in the Weka GUI Chooser. A useful package is "simpleEducationLearningSchemes".

An Attribute-Relation File Format (ARFF) file is an ASCII text file that describes a list of instances sharing a set of attributes.
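The structure of an ARFF file is easiest to see by reading one. This is a minimal sketch, assuming a dense (non-sparse) file; `read_arff` and `_split_values` are hypothetical helpers, date attributes are kept as plain strings, and backslash escapes inside quoted values are not handled. It covers the three parts of the format: a `@relation` line, one `@attribute` line per column (a type keyword such as `numeric`, or a `{...}` list of nominal values), and a `@data` section of comma-separated instances, with `%` starting a comment and `?` marking a missing value.

```python
import csv
import re

# Attribute name may be single-quoted, double-quoted, or a bare word.
ATTR_RE = re.compile(r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.+)$", re.IGNORECASE)
NUMERIC_TYPES = ("numeric", "real", "integer")


def _split_values(text):
    """Split a comma-separated ARFF value list, honouring single quotes."""
    return next(csv.reader([text], quotechar="'", skipinitialspace=True))


def read_arff(arff_text):
    """Parse a dense ARFF file into (relation, attributes, rows).

    attributes is a list of (name, type) pairs, where type is a lowercase
    keyword such as 'numeric' or a list of nominal values. Missing values
    ('?') become None; numeric values become floats.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank lines and % comments are ignored
        lower = line.lower()
        if in_data:
            row = []
            for (_, kind), value in zip(attributes, _split_values(line)):
                if value == "?":
                    row.append(None)
                elif kind in NUMERIC_TYPES:
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1].strip("'\"")
        elif lower.startswith("@attribute"):
            match = ATTR_RE.match(line)
            name, kind = match.group(1).strip("'\""), match.group(2).strip()
            if kind.startswith("{"):
                kind = [v.strip() for v in _split_values(kind.strip("{}"))]
            else:
                kind = kind.split()[0].lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


if __name__ == "__main__":
    sample = """% a tiny weather example
@relation weather
@attribute outlook {overcast,rainy,sunny}
@attribute temperature numeric
@attribute play {no,yes}
@data
sunny,85,no
rainy,?,yes
"""
    print(read_arff(sample))
```

Keywords are matched case-insensitively because ARFF declarations such as `@RELATION` and `@DATA` are commonly written in upper case.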

R

R Project Manual: An Introduction to R

Data Mining Competitions

General

When performing classification, a number of error measurements are reported.
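Which measurements appear depends on the tool. As an illustration only, the sketch below computes several common ones from lists of actual and predicted class labels: accuracy, error rate, per-class precision, recall and F-measure, and Cohen's kappa. The function name `classification_measures` is hypothetical, and this is not any particular tool's implementation.

```python
from collections import Counter


def classification_measures(actual, predicted):
    """Compute common classification measures from two equal-length label lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    n = len(actual)
    labels = sorted(set(actual) | set(predicted))
    cells = Counter(zip(actual, predicted))  # confusion matrix: (actual, predicted) -> count
    correct = sum(cells[(c, c)] for c in labels)

    per_class = {}
    for c in labels:
        tp = cells[(c, c)]
        predicted_c = sum(cells[(a, c)] for a in labels)  # column total
        actual_c = sum(cells[(c, p)] for p in labels)     # row total
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    # Cohen's kappa: agreement beyond what the marginal totals predict by chance.
    accuracy = correct / n
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    expected = sum(actual_counts[c] * predicted_counts[c] for c in labels) / (n * n)
    # Kappa is undefined when chance agreement is 1 (one label everywhere).
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else float("nan")

    return {"accuracy": accuracy, "error_rate": (n - correct) / n,
            "kappa": kappa, "per_class": per_class}


if __name__ == "__main__":
    actual = ["yes", "yes", "no", "no", "yes"]
    predicted = ["yes", "no", "no", "no", "yes"]
    print(classification_measures(actual, predicted))
```

In this example 4 of 5 instances are correct, so accuracy is 0.8 and the error rate is 0.2. Chance agreement from the marginal totals is (3·2 + 2·3)/25 = 0.48, so kappa is (0.8 − 0.48)/(1 − 0.48) ≈ 0.615. Kappa is useful alongside accuracy because it discounts the agreement a classifier would get just by matching the class frequencies.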

Levels of Measurement http://www.usablestats.com/lessons/noir from Usable Stats.