Cleaning Data for Effective Data Science
Doing the other 80% of the work with Python, R, and command-line tools
Paperback by David Mertz
Language: English

56,25 €*

incl. VAT

Free shipping via Post / DHL

Delivery time: 1-2 weeks

Description
A comprehensive guide for data scientists to master effective data cleaning tools and techniques

Key Features:
Think about your data intelligently and ask the right questions
Master data cleaning techniques using hands-on examples from diverse domains
Work with detailed, commented, well-tested code samples in Python and R

Book Description:
In data science, data analysis, or machine learning, most of the effort needed to achieve your actual purpose lies in cleaning your data. Using Python, R, and command-line tools, you will learn the essential cleaning steps performed in every production data science or data analysis pipeline. This book not only teaches you data preparation but also what questions you should ask of your data.

The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired.

You will begin by looking at how to ingest a range of data formats. Moving on, you will impute missing values, detect unreliable data and statistical anomalies, and generate the synthetic features needed for successful data analysis and visualization.

By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.

What You Will Learn:
Ingest and work with common tabular, hierarchical, and other data formats
Apply useful rules and heuristics for assessing data quality and detecting bias
Identify and handle unreliable data and outliers in their many forms
Impute sensible values into missing data and use sampling to fix imbalances
Generate synthetic features that help to draw out patterns in your data
Prepare data competently and correctly for analytic and machine learning tasks

Who this book is for:
This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing. Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful. The text will also be helpful to intermediate and advanced data scientists who want to improve their rigor in data hygiene and wish for a refresher on data preparation issues.
About the Author
David Mertz is the founder of KDM Training, a partnership dedicated to educating developers and data scientists in machine learning and scientific computing. Previously, he created the data science training program for Anaconda Inc.

With the advent of deep neural networks he has turned to training our robot overlords as well.

He was honored to work for 8 years with D. E. Shaw Research, who have built the world's fastest, highly specialized supercomputer for performing molecular dynamics.

David was a Director of the PSF for six years, and remains co-chair of its Trademarks Committee and of its Scientific Python Working Group. His columns, Charming Python and XML Matters, written in the 2000s, were the most widely read articles in the Python world.

He has written previous books for Packt, O'Reilly and Addison-Wesley, and has given keynote addresses at numerous international programming conferences.

Long ago, he earned a doctorate in post-structuralist political philosophy. Fate is a cruel mistress.
Details
Year of publication: 2021
Genre: Computer Science
Category: Science & Technology
Medium: Paperback
ISBN-13: 9781801071291
ISBN-10: 1801071292
Language: English
Format / Extras: Paperback
Binding: Softcover
Author: Mertz, David
Publisher: Packt Publishing
Dimensions: 235 x 191 x 27 mm
By/With: David Mertz
Publication date: 31 March 2021
Weight: 0.92 kg
Item ID: 120137916