148,95 €*
Free shipping via Post / DHL
Delivery time: 1-2 weeks
Machine learning, also known as data mining or data analytics, is a fundamental part of data science. It is used by organizations in a wide variety of arenas to turn raw data into actionable information.
Machine Learning for Business Analytics: Concepts, Techniques, and Applications in R provides a comprehensive introduction and an overview of this methodology. This best-selling textbook covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation, and network analytics. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues for responsible use of machine learning techniques.
This is the second R edition of Machine Learning for Business Analytics. This edition also includes:
* A new co-author, Peter Gedeck, who brings over 20 years of experience in machine learning using R
* An expanded chapter focused on discussion of deep learning techniques
* A new chapter on experimental feedback techniques including A/B testing, uplift modeling, and reinforcement learning
* A new chapter on responsible data science
* Updates and new material based on feedback from instructors teaching MBA, Masters in Business Analytics and related programs, undergraduate, diploma and executive courses, and from their students
* A full chapter devoted to relevant case studies with more than a dozen cases demonstrating applications for the machine learning techniques
* End-of-chapter exercises that help readers gauge and expand their comprehension and competency of the material presented
* A companion website with more than two dozen data sets, and instructor materials including exercise solutions, slides, and case solutions
This textbook is an ideal resource for upper-level undergraduate and graduate level courses in data science, predictive analytics, and business analytics. It is also an excellent reference for analysts, researchers, and data science practitioners working with quantitative data in management, finance, marketing, operations management, information systems, computer science, and information technology.
Galit Shmueli, PhD, is Distinguished Professor and Institute Director at National Tsing Hua University's Institute of Service Science. She has designed and instructed business analytics courses since 2004 at University of Maryland, [...], The Indian School of Business, and National Tsing Hua University, Taiwan.
Peter C. Bruce is Founder of the Institute for Statistics Education at [...], and Chief Learning Officer at Elder Research, Inc.
Peter Gedeck, PhD, is Senior Data Scientist at Collaborative Drug Discovery and teaches at [...] and the UVA School of Data Science. His specialty is the development of machine learning algorithms to predict biological and physicochemical properties of drug candidates.
Inbal Yahav, PhD, is a Senior Lecturer in The Coller School of Management at Tel Aviv University, Israel. Her work focuses on the development and adaptation of statistical models for use by researchers in the field of information systems.
Nitin R. Patel, PhD, is Co-founder and Lead Researcher at Cytel Inc. He was also a Co-founder of Tata Consultancy Services. A Fellow of the American Statistical Association, Dr. Patel has served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University, USA.
Foreword by Ravi Bapna xix
Foreword by Gareth James xxi
Preface to the Second R Edition xxiii
Acknowledgments xxvi
Part I Preliminaries
Chapter 1 Introduction 3
1.1 What Is Business Analytics? 3
1.2 What Is Machine Learning? 5
1.3 Machine Learning, AI, and Related Terms 5
1.4 Big Data 7
1.5 Data Science 8
1.6 Why Are There So Many Different Methods? 8
1.7 Terminology and Notation 9
1.8 Road Maps to This Book 11
Order of Topics 13
Chapter 2 Overview of the Machine Learning Process 17
2.1 Introduction 17
2.2 Core Ideas in Machine Learning 18
Classification 18
Prediction 18
Association Rules and Recommendation Systems 18
Predictive Analytics 19
Data Reduction and Dimension Reduction 19
Data Exploration and Visualization 19
Supervised and Unsupervised Learning 20
2.3 The Steps in a Machine Learning Project 21
2.4 Preliminary Steps 23
Organization of Data 23
Predicting Home Values in the West Roxbury Neighborhood 23
Loading and Looking at the Data in R 24
Sampling from a Database 26
Oversampling Rare Events in Classification Tasks 27
Preprocessing and Cleaning the Data 28
2.5 Predictive Power and Overfitting 35
Overfitting 36
Creating and Using Data Partitions 38
2.6 Building a Predictive Model 41
Modeling Process 41
2.7 Using R for Machine Learning on a Local Machine 46
2.8 Automating Machine Learning Solutions 47
Predicting Power Generator Failure 48
Uber's Michelangelo 50
2.9 Ethical Practice in Machine Learning 52
Machine Learning Software: The State of the Market (by Herb Edelstein) 53
Problems 57
Part II Data Exploration and Dimension Reduction
Chapter 3 Data Visualization 63
3.1 Uses of Data Visualization 63
Base R or ggplot? 65
3.2 Data Examples 65
Example 1: Boston Housing Data 65
Example 2: Ridership on Amtrak Trains 67
3.3 Basic Charts: Bar Charts, Line Charts, and Scatter Plots 67
Distribution Plots: Boxplots and Histograms 70
Heatmaps: Visualizing Correlations and Missing Values 73
3.4 Multidimensional Visualization 75
Adding Variables: Color, Size, Shape, Multiple Panels, and Animation 76
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering 79
Reference: Trend Lines and Labels 83
Scaling Up to Large Datasets 85
Multivariate Plot: Parallel Coordinates Plot 85
Interactive Visualization 88
3.5 Specialized Visualizations 91
Visualizing Networked Data 91
Visualizing Hierarchical Data: Treemaps 93
Visualizing Geographical Data: Map Charts 95
3.6 Major Visualizations and Operations, by Machine Learning Goal 97
Prediction 97
Classification 97
Time Series Forecasting 97
Unsupervised Learning 98
Problems 99
Chapter 4 Dimension Reduction 101
4.1 Introduction 101
4.2 Curse of Dimensionality 102
4.3 Practical Considerations 102
Example 1: House Prices in Boston 103
4.4 Data Summaries 103
Summary Statistics 104
Aggregation and Pivot Tables 104
4.5 Correlation Analysis 107
4.6 Reducing the Number of Categories in Categorical Variables 109
4.7 Converting a Categorical Variable to a Numerical Variable 111
4.8 Principal Component Analysis 111
Example 2: Breakfast Cereals 111
Principal Components 116
Normalizing the Data 117
Using Principal Components for Classification and Prediction 120
4.9 Dimension Reduction Using Regression Models 121
4.10 Dimension Reduction Using Classification and Regression Trees 121
Problems 123
Part III Performance Evaluation
Chapter 5 Evaluating Predictive Performance 129
5.1 Introduction 130
5.2 Evaluating Predictive Performance 130
Naive Benchmark: The Average 131
Prediction Accuracy Measures 131
Comparing Training and Holdout Performance 133
Cumulative Gains and Lift Charts 133
5.3 Judging Classifier Performance 136
Benchmark: The Naive Rule 136
Class Separation 136
The Confusion (Classification) Matrix 137
Using the Holdout Data 138
Accuracy Measures 139
Propensities and Threshold for Classification 139
Performance in Case of Unequal Importance of Classes 143
Asymmetric Misclassification Costs 146
Generalization to More Than Two Classes 149
5.4 Judging Ranking Performance 150
Cumulative Gains and Lift Charts for Binary Data 150
Decile-wise Lift Charts 153
Beyond Two Classes 154
Gains and Lift Charts Incorporating Costs and Benefits 154
Cumulative Gains as a Function of Threshold 155
5.5 Oversampling 156
Creating an Over-sampled Training Set 158
Evaluating Model Performance Using a Non-oversampled Holdout Set 159
Evaluating Model Performance If Only Oversampled Holdout Set Exists 159
Problems 162
Part IV Prediction and Classification Methods
Chapter 6 Multiple Linear Regression 167
6.1 Introduction 167
6.2 Explanatory vs. Predictive Modeling 168
6.3 Estimating the Regression Equation and Prediction 170
Example: Predicting the Price of Used Toyota Corolla Cars 171
Cross-validation and caret 175
6.4 Variable Selection in Linear Regression 176
Reducing the Number of Predictors 176
How to Reduce the Number of Predictors 178
Regularization (Shrinkage Models) 183
Problems 188
Chapter 7 k-Nearest Neighbors (kNN) 193
7.1 The k-NN Classifier (Categorical Outcome) 193
Determining Neighbors 194
Classification Rule 194
Example: Riding Mowers 195
Choosing k 196
Weighted k-NN 199
Setting the Cutoff Value 200
k-NN with More Than Two Classes 201
Converting Categorical Variables to Binary Dummies 201
7.2 k-NN for a Numerical Outcome 201
7.3 Advantages and Shortcomings of k-NN Algorithms 204
Problems 205
Chapter 8 The Naive Bayes Classifier 207
8.1 Introduction 207
Threshold Probability Method 208
Conditional Probability 208
Example 1: Predicting Fraudulent Financial Reporting 208
8.2 Applying the Full (Exact) Bayesian Classifier 209
Using the "Assign to the Most Probable Class" Method 210
Using the Threshold Probability Method 210
Practical Difficulty with the Complete (Exact) Bayes Procedure 210
8.3 Solution: Naive Bayes 211
The Naive Bayes Assumption of Conditional Independence 212
Using the Threshold Probability Method 212
Example 2: Predicting Fraudulent Financial Reports, Two Predictors 213
Example 3: Predicting Delayed Flights 214
Working with Continuous Predictors 218
8.4 Advantages and Shortcomings of the Naive Bayes Classifier 220
Problems 223
Chapter 9 Classification and Regression Trees 225
9.1 Introduction 226
Tree Structure 227
Decision Rules 227
Classifying a New Record 227
9.2 Classification Trees 228
Recursive Partitioning 228
Example 1: Riding Mowers 228
Measures of Impurity 231
9.3 Evaluating the Performance of a Classification Tree 235
Example 2: Acceptance of Personal Loan 236
9.4 Avoiding Overfitting 239
Stopping Tree Growth 242
Pruning the Tree 243
Best-Pruned Tree 245
9.5 Classification Rules from Trees 247
9.6 Classification Trees for More Than Two Classes 248
9.7 Regression Trees 249
Prediction 250
Measuring Impurity 250
Evaluating Performance 250
9.8 Advantages and Weaknesses of a Tree 250
9.9 Improving Prediction: Random Forests and Boosted Trees 252
Random Forests 252
Boosted Trees 254
Problems 257
Chapter 10 Logistic Regression 261
10.1 Introduction 261
10.2 The Logistic Regression Model 263
10.3 Example: Acceptance of Personal Loan 264
Model with a Single Predictor 265
Estimating the Logistic Model from Data: Computing Parameter Estimates 267
Interpreting Results in Terms of Odds (for a Profiling Goal) 270
10.4 Evaluating Classification Performance 271
10.5 Variable Selection 273
10.6 Logistic Regression for Multi-Class Classification 274
Ordinal Classes 275
Nominal Classes 276
10.7 Example of Complete Analysis: Predicting Delayed Flights 277
Data Preprocessing 282
Model-Fitting and Estimation 282
Model Interpretation 282
Model Performance 284
Variable Selection 285
Problems 289
Chapter 11 Neural Nets 293
11.1 Introduction 293
11.2 Concept and Structure of a Neural Network 294
11.3 Fitting a Network to Data 295
Example 1: Tiny Dataset 295
Computing Output of Nodes 296
Preprocessing the Data 299
Training the Model 300
Example 2: Classifying Accident Severity 304
Avoiding Overfitting 305
Using the Output for Prediction and Classification 305
11.4 Required User Input 307
11.5 Exploring the Relationship Between Predictors and Outcome 308
11.6 Deep Learning 309
Convolutional Neural Networks (CNNs) 310
Local Feature Map 311
A Hierarchy of Features 311
The Learning Process 312
Unsupervised Learning 312
Example: Classification of Fashion Images 313
Conclusion 320
11.7...
Year of publication: 2023
Genre: Mathematics
Category: Science & Technology
Medium: Book
Contents: 688 pages
ISBN-13: 9781119835172
ISBN-10: 1119835178
Language: English
Binding: Hardcover
Authors: Galit Shmueli, Inbal Yahav, Nitin R. Patel, Peter Gedeck, Peter C. Bruce
Publisher: John Wiley & Sons Inc
Dimensions: 261 x 185 x 43 mm
By: Galit Shmueli (et al.)
Publication date: February 8, 2023
Weight: 1.616 kg