
Thursday, December 19, 2019

Classification and Prediction in Data Mining


Classification and Prediction:

Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. Classification predicts categorical (discrete, unordered) labels, while prediction models continuous-valued functions. Examples of both are given below.

What is classification in data mining?

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

For example, we can build a classification model to categorize bank loan applications as either safe or risky, or a prediction model to predict the expenditures of potential customers on computer equipment given their income and occupation.

• A predictor is constructed that predicts a continuous-valued function, or ordered value, as opposed to a categorical label. Regression analysis is a statistical methodology that is most often used for numeric prediction.

• Many classification and prediction methods have been proposed by researchers in machine learning, pattern recognition, and statistics.

• Most algorithms are memory resident, typically assuming a small data size. Recent data mining research has built on such work, developing scalable classification and prediction techniques capable of handling large, disk-resident data. A short code sketch contrasting the two tasks is given below.
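To make the distinction concrete, here is a minimal sketch in Python contrasting the two tasks on the loan example above. The toy data, feature names, and the choice of scikit-learn models (a decision tree for classification, linear regression for numeric prediction) are illustrative assumptions, not something the text prescribes.

```python
# Minimal sketch: classification outputs a categorical label, while
# prediction (regression) outputs a continuous value. Data is invented.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Hypothetical applicant records: [income in $1000s, years employed]
X = [[25, 1], [48, 5], [80, 12], [33, 2], [95, 20], [60, 7]]

# Classification: learn a categorical (discrete, unordered) label.
y_label = ["risky", "safe", "safe", "risky", "safe", "safe"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_label)
print(clf.predict([[40, 3]]))   # a class label, e.g. ['risky']

# Prediction: learn a continuous-valued function (expected spend in $).
y_spend = [300, 900, 2100, 450, 2600, 1200]
reg = LinearRegression().fit(X, y_spend)
print(reg.predict([[40, 3]]))   # a numeric estimate
```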

Issues Regarding Classification and Prediction:

1. Preparing the Data for Classification and Prediction:


The following preprocessing steps may be applied to the data to help improve the accuracy, efficiency, and scalability of the classification or prediction process.

(i) Data cleaning:

This refers to the preprocessing of data in order to remove or reduce noise (by applying smoothing techniques) and the treatment of missing values (e.g., by replacing a missing value with the most commonly occurring value for that attribute, or with the most probable value based on statistics). Although most classification algorithms have some mechanisms for handling noisy or missing data, this step can help reduce confusion during learning.
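As a concrete illustration of this step, the sketch below fills missing values and damps an extreme outlier using pandas. The column names and the choice of mode/median replacement are assumptions made for the example, not requirements of the method.

```python
# A small data-cleaning sketch: replace missing values and smooth noise.
import pandas as pd

df = pd.DataFrame({
    "income": [30_000, None, 52_000, 48_000, 1_000_000, 45_000],
    "occupation": ["clerk", "engineer", None, "clerk", "clerk", "engineer"],
})

# Missing categorical value -> most commonly occurring value (the mode).
df["occupation"] = df["occupation"].fillna(df["occupation"].mode()[0])

# Missing numeric value -> a statistically plausible value (the median).
df["income"] = df["income"].fillna(df["income"].median())

# Simple smoothing: clip extreme values to the 5th/95th percentiles.
low, high = df["income"].quantile([0.05, 0.95])
df["income"] = df["income"].clip(low, high)
print(df)
```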

(ii) Relevance analysis:

Many of the attributes in the data may be redundant.

• Correlation analysis can be used to identify whether any two given attributes are statistically related. For example, a strong correlation between attributes A1 and A2 would suggest that one of the two could be removed from further analysis. A database may also contain irrelevant attributes. Attribute subset selection can be used in these cases to find a reduced set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes. Hence, relevance analysis, in the form of correlation analysis and attribute subset selection, can be used to detect attributes that do not contribute to the classification or prediction task. Such analysis can help improve classification efficiency and scalability.
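The sketch below shows one way correlation analysis might look in practice: compute pairwise correlations and flag one attribute of any highly correlated pair for removal. The synthetic data and the 0.95 threshold are assumptions chosen for illustration.

```python
# Correlation analysis sketch: flag redundant attributes for removal.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a1 = rng.normal(size=200)
df = pd.DataFrame({
    "A1": a1,
    "A2": 2 * a1 + rng.normal(scale=0.05, size=200),  # nearly redundant with A1
    "A3": rng.normal(size=200),                        # unrelated attribute
})

corr = df.corr().abs()
# Keep only the upper triangle so each attribute pair is examined once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [c for c in upper.columns if (upper[c] > 0.95).any()]
print("Candidates to drop:", redundant)   # expected: ['A2']
```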


(iii) Data Transformation and Reduction:

• The data may be transformed by normalization, particularly when neural networks or methods involving distance measurements are used in the learning step.


• Normalization involves scaling all values for a given attribute so that they fall within a small specified range, such as -1 to +1 or 0 to 1.

• The data can also be transformed by generalizing it to higher-level concepts. Concept hierarchies may be used for this purpose. This is particularly useful for continuous-valued attributes.

For example, numeric values for the attribute income can be generalized to discrete ranges, such as low, medium, and high. Similarly, categorical attributes, like street, can be generalized to higher-level concepts, like city.

• Data can also be reduced by applying many other methods, ranging from wavelet transformation and principal components analysis to discretization techniques, such as binning, histogram analysis, and clustering. A combined sketch of these transformations follows.
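Putting the ideas above together, the following sketch normalizes attributes to [0, 1], generalizes numeric income onto a small concept hierarchy (low/medium/high), and reduces dimensionality with principal components analysis. The bin edges, scaling range, and component count are illustrative choices.

```python
# Transformation and reduction sketch: normalization, generalization, PCA.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

df = pd.DataFrame({
    "income": [18_000, 42_000, 75_000, 120_000, 56_000],
    "age":    [22, 35, 47, 58, 30],
})

# Normalization: scale each attribute into the range [0, 1].
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(df)

# Generalization: map raw income values onto a concept hierarchy.
df["income_level"] = pd.cut(df["income"],
                            bins=[0, 40_000, 90_000, np.inf],
                            labels=["low", "medium", "high"])

# Reduction: project the normalized attributes onto one principal component.
reduced = PCA(n_components=1).fit_transform(scaled)
print(df["income_level"].tolist())
print(reduced.ravel())
```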


2. Comparing Classification and Prediction Methods:

Accuracy:
• The accuracy of a classifier refers to the ability of a given classifier to correctly predict the class label of new or previously unseen data (i.e., tuples without class label information).


• The accuracy of a predictor refers to how well a given predictor can guess the value of the predicted attribute for new or previously unseen data.

Speed:

This refers to the computational costs involved in generating and using the given classifier or predictor.

Robustness:

This is the ability of the classifier or predictor to make correct predictions given noisy data or data with missing values.

Scalability:

This refers to the ability to construct the classifier or predictor efficiently given large amounts of data.

Interpretability:

This refers to the level of understanding and insight that is provided by the classifier or predictor. Interpretability is subjective and therefore more difficult to assess.  
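Two of these criteria, accuracy and speed, are straightforward to measure empirically. The sketch below times training and scores accuracy on a held-out test set; the synthetic dataset and the decision-tree classifier are stand-ins chosen for the example.

```python
# Measuring accuracy and speed of a classifier on held-out data.
import time
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

start = time.perf_counter()
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_time = time.perf_counter() - start

acc = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy = {acc:.3f}, training time = {train_time:.4f} s")
```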
