This hands-on practice consists of Introduction, Installation, Preprocessing, Classification, Clustering, Visualization, Attribute Selection, and Association, all using the Weka tool.

- Alternative names

Knowledge Discovery in Databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, etc.

Real-world data contains noise and missing values, which can be reduced or repaired with preprocessing techniques. Preprocessing is an essential step before applying data mining algorithms to the target data. Frequently used preprocessing techniques include normalization, standardization, and data cleaning.
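
As a rough illustration of these techniques, the sketch below fills in missing values with the mean, then applies min-max normalization and z-score standardization. It uses only the Python standard library; the function names and the sample data are made up for this example and are not part of Weka.

```python
from statistics import mean, pstdev

def impute_missing(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical, nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

raw = [2.0, None, 4.0, 6.0]          # hypothetical attribute with one missing value
cleaned = impute_missing(raw)        # [2.0, 4.0, 4.0, 6.0]
normalized = min_max_normalize(cleaned)
standardized = standardize(cleaned)
```

Mean imputation is only one possible cleaning strategy; depending on the data, dropping incomplete rows or using the median may be more appropriate.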

- Feature Selection

Feature selection is also known as attribute selection and variable selection. It selects a subset of the most relevant features to construct models.
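
One simple way to picture this is a filter approach that ranks each feature by the absolute value of its Pearson correlation with the target and keeps the top k. The sketch below assumes numeric features and a numeric target; the feature names and values are invented for illustration, and Weka's own attribute evaluators may work differently.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:  # a constant column carries no linear information
        return 0.0
    return cov / sqrt(vx * vy)

def select_top_k(features, target, k):
    """Return the names of the k features most correlated (in absolute value) with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical dataset: three candidate features and an exam score as the target.
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [42, 38, 42, 38, 40],
    "practice_tests": [0, 1, 1, 3, 4],
}
target = [50, 55, 65, 70, 80]

selected = select_top_k(features, target, 2)
```

Here `shoe_size` is only weakly correlated with the score, so it is the feature that gets dropped.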

- Classification

Classification is a supervised pattern learning technique that uses labeled training patterns. It constructs rules for assigning new data to known groups. Well-known classifiers include k-NN, Naive Bayes, and decision trees.
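
To make the idea concrete, here is a minimal k-NN sketch: a new point receives the majority label among its k nearest labeled training points. The training data and the choice of Euclidean distance are assumptions made for this example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs, where each point is a tuple of numbers.
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training patterns forming two well-separated groups.
train = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"),
    ((6.5, 7.0), "B"),
    ((7.0, 6.5), "B"),
]

label_near_a = knn_predict(train, (1.2, 1.4))
label_near_b = knn_predict(train, (6.4, 6.2))
```

An odd k is a common choice for two-class problems because it avoids tied votes.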

- Clustering

Clustering is an unsupervised pattern learning technique. It finds groups of objects such that the objects in a group are similar to one another and different from the objects in other groups. Common clustering techniques include K-means clustering and hierarchical clustering.
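
The sketch below illustrates K-means in its basic form: assign each point to the nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the centroids stop moving. The initial centroids are passed in explicitly to keep the example deterministic; real implementations usually pick them randomly or with a seeding heuristic.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def k_means(points, centroids, max_iter=100):
    """Basic K-means. Returns the final centroids and the clusters (lists of points)."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two visually obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (9.0, 8.0)])
```

Unlike the classification example, no labels are given here; the groups emerge from the distances alone.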

- Regression

Regression is a statistical process for estimating the relationships among variables. It is used to find the function that best fits the data points, that is, the one with the least error. There are two broad types of regression: linear and non-linear.
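
For the simplest case, a straight line fitted to one input variable, "least error" is commonly taken to mean the least sum of squared residuals. The sketch below computes the closed-form ordinary least squares slope and intercept; the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)                     # spread of x around its mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-variation of x and y
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept
```

With noisy data the fitted line would no longer pass through every point, but it would still minimize the sum of squared vertical distances to them.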

- Association(Rule mining)

Given a set of transactions, association rule mining finds rules that predict the occurrence of an item based on the occurrences of other items in the transaction. The Apriori algorithm is one of the best-known algorithms for rule mining.
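
Rules of this kind are usually scored with two measures, support and confidence, which Apriori uses to prune candidates. The sketch below computes only these two measures on a small made-up set of transactions; it is not an implementation of Apriori itself.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:  # the antecedent never occurs, so the rule cannot be evaluated
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]

# Rule {diaper} -> {beer}
rule_support = support(transactions, {"diaper", "beer"})          # 3 of the 5 transactions
rule_confidence = confidence(transactions, {"diaper"}, {"beer"})  # 3 of the 4 diaper transactions
```

A rule is typically reported only when both measures exceed user-chosen minimum thresholds.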

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Weka is open source software issued under the GNU General Public License.

Download link : http://www.cs.waikato.ac.nz/ml/weka/

- R

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.

Download link : http://www.r-project.org/

- Etc.

There are many other tools for data mining, such as RapidMiner, KNIME, Rattle, and so on.