22 · APRIL · 2017

Análise de Múltipla Correspondência no R para o problema de Churn

FLAVIO CLESIO · 4 min

Via Data Science Plus

Analytical challenges in multivariate data analysis and predictive modeling include identifying redundant and irrelevant variables. A recommended analytics approach is to first address the redundancy; which can be achieved by identifying groups of variables that are as correlated as possible among themselves and as uncorrelated as possible with other variable groups in the same data set. On the other hand, relevancy is about potential predictor variables and involves understanding the relationship between the target variable and input variables. Multiple correspondence analysis (MCA) is a multivariate data analysis and data mining tool for finding and constructing a low-dimensional visual representation of variable associations among groups of categorical variables. Variable clustering as a tool for identifying redundancy is often applied to get a first impression of variable associations and multivariate data structure. The motivations of this post are to illustrate the applications of: 1) preparing input variables for analysis and predictive modeling, 2) MCA as a multivariate exploratory data analysis and categorical data mining tool for business insights of customer churn data, and 3) variable clustering of categorical variables for the identification of redundant variables.