Discovering Knowledge in Data: An Introduction to Data Mining, by Daniel T. Larose and Chantal D. Larose

The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.

Includes new chapters on:
• Multivariate Statistics
• Preparing to Model the Data
• Imputation of Missing Data
• plus an appendix on Data Summarization and Visualization

• Offers extensive coverage of the R statistical programming language
• Contains 280 end-of-chapter exercises
• Includes a companion website with further resources for all readers
• Provides PowerPoint slides, a solutions manual, and suggested projects for instructors who adopt the book

Clearly, these measures of center do not give us a complete picture. What is missing are measures of spread, or measures of variability, which describe how spread out the data values are. Portfolio A’s P/E ratios are more spread out than those of Portfolio B, so the measures of variability for Portfolio A should be larger than those for Portfolio B. Typical measures of variability include the range (maximum − minimum), the standard deviation, the mean absolute deviation, and the interquartile range (IQR).
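The following minimal Python sketch uses the standard `statistics` module to compute the four measures. The P/E ratios are hypothetical and are not the portfolios the text refers to. They were chosen so that both portfolios share the same mean (10), median (11), and mode (11), while A is visibly more spread out. The helper name `variability` is illustrative. The sketch assumes the sample standard deviation (n − 1 denominator) and the mean absolute deviation taken about the mean.

```python
import statistics


def variability(values):
    """Return four common measures of spread for a list of numbers."""
    data = sorted(values)
    mean = statistics.mean(data)
    # Quartiles via statistics.quantiles (default method="exclusive").
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "range": data[-1] - data[0],                 # maximum - minimum
        "std_dev": statistics.stdev(data),           # sample standard deviation
        "mean_abs_dev": sum(abs(x - mean) for x in data) / len(data),
        "iqr": q3 - q1,                              # interquartile range
    }


# Hypothetical P/E ratios (not the book's data): same mean, median, and mode.
portfolio_a = [1, 11, 11, 11, 16]
portfolio_b = [7, 10, 11, 11, 11]

for name, pe in [("A", portfolio_a), ("B", portfolio_b)]:
    print(name, variability(pe))
```

Every measure comes out larger for A than for B, as the text expects. Quartile conventions differ between tools, though. Other software may use a different interpolation rule from the `"exclusive"` method, so small-sample IQR values can differ.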

Prediction

Prediction is similar to classification and estimation, except that for prediction, the results lie in the future. Examples of prediction tasks in business and research include:
• Predicting the price of a stock 3 months into the future.
• Predicting the percentage increase in traffic deaths next year if the speed limit is increased.
• Predicting the winner of this fall’s World Series, based on a comparison of the team statistics.
• Predicting whether a particular molecule in drug discovery will lead to a profitable new drug for a pharmaceutical company.

A data value is robustly identified as an outlier if (a) it lies 1.5(IQR) or more below Q1, or (b) it lies 1.5(IQR) or more above Q3. For example, suppose that for a set of test scores, the 25th percentile was Q1 = 70 and the 75th percentile was Q3 = 80, so that half of all the test scores fell between 70 and 80. Then the interquartile range, the difference between these quartiles, was IQR = 80 − 70 = 10. A test score would be robustly identified as an outlier if
a. it is 70 − 1.5(10) = 55 or lower, or
b. it is 80 + 1.5(10) = 95 or higher.

Flag Variables

Some analytical methods, such as regression, require predictors to be numeric. Thus, analysts wishing to use categorical predictors in regression need to recode the categorical variable into one or more flag variables.
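The IQR outlier rule and flag-variable recoding can both be sketched in a few lines of Python. This is a minimal illustration, not the book's code, and the function names `iqr_fences`, `is_outlier`, and `flag_variables` are hypothetical. The fences are inclusive, matching "1.5(IQR) or more." A flag variable (also called a dummy or indicator variable) takes only the values 0 and 1. The sketch assumes the common convention of creating k − 1 flags for a variable with k categories, which avoids a redundant column in regression. The last listed category serves as the reference category, which is an assumed choice.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the lower and upper outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies k*IQR or more below Q1 or above Q3 (fences inclusive)."""
    lower, upper = iqr_fences(q1, q3, k)
    return x <= lower or x >= upper


def flag_variables(values, categories):
    """Recode a categorical variable into len(categories) - 1 flag (0/1) variables.

    The last entry of `categories` is the reference category (assumed
    convention): a record in that category gets 0 in every flag column.
    """
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"values not in categories: {sorted(unknown)}")
    return {
        f"flag_{c}": [1 if v == c else 0 for v in values]
        for c in categories[:-1]
    }


# Worked example from the text: Q1 = 70, Q3 = 80, so IQR = 10.
print(iqr_fences(70, 80))                              # (55.0, 95.0)
print([is_outlier(s, 70, 80) for s in (50, 75, 95)])   # [True, False, True]

# Hypothetical region variable with three categories -> two flag variables.
regions = ["north", "south", "east", "north"]
print(flag_variables(regions, ["north", "east", "south"]))
# {'flag_north': [1, 0, 0, 1], 'flag_east': [0, 0, 1, 0]}
```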

