Description
Jump-start your journey toward mastering open data architectural patterns by learning the fundamentals and applications of open table formats
Key Features:
- Build lakehouses with open table formats using compute engines such as Apache Spark, Flink, Trino, and Python
- Optimize lakehouses with techniques such as pruning, partitioning, compaction, indexing, and clustering
- Find out how to enable seamless integration, data management, and interoperability using Apache XTable
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts, and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake.
You'll explore the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You'll also get hands-on with each table format through exercises using popular compute engines, such as Apache Spark, Flink, Trino, and Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you'll get to grips with the key components of lakehouse architecture and learn how to build, maintain, and optimize them.
By the end of this book, you'll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, ensuring you make informed decisions in selecting the right architecture for your organization's data needs.
What You Will Learn:
- Explore lakehouse fundamentals, such as table formats, file formats, compute engines, and catalogs
- Gain a complete understanding of data lifecycle management in lakehouses
- Learn how to systematically evaluate and choose the right lakehouse table format
- Optimize performance with sorting, clustering, and indexing techniques
- Use open table format data with ML frameworks such as TensorFlow and MLflow
- Interoperate across different table formats with Apache XTable and UniForm
- Secure your lakehouse with access controls and ensure regulatory compliance
Who this book is for:
This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, and see how they are used to build lakehouses. It is also valuable for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.
Table of Contents
- Open Data Lakehouse: A New Architectural Paradigm
- Transactional Capabilities of the Lakehouse
- Apache Iceberg Deep Dive
- Apache Hudi Deep Dive
- Delta Lake Deep Dive
- Catalog and Metadata Management
- Interoperability in Lakehouses
- Performance Optimization and Tuning in a Lakehouse
- Data Governance and Security in Lakehouses
- Evaluating and Selecting Open Table Formats
- Real-World Applications and Learnings
About the Author
Dipankar Mazumdar is currently the Director of Developer Advocacy at Cloudera, where he leads global developer initiatives focused on lakehouse architectures and generative AI. Previously, he held developer advocacy roles at Dremio, Onehouse, and Qlik, contributing to open source projects such as Apache Iceberg, Apache Hudi, and XTable, among others. For most of his career, Dipankar has worked at the intersection of data engineering and AI. He has also contributed to O'Reilly's Apache Iceberg: The Definitive Guide and has spoken at numerous conferences, including Databricks Data + AI, Netflix Engineering, ApacheCon, Scale By the Bay, and Data Day Texas, among others.
Details
Publication year: 2025
Genre: Imports, Computer Science
Category: Science & Technology
Format: Paperback
ISBN-13: 9781836207238
ISBN-10: 1836207239
Language: English
Binding: Paperback / Softcover
Authors: Mazumdar, Dipankar
Govindarajan, Vinoth
Manufacturer: Packt Publishing
EU responsible person: Libri GmbH, Europaallee 1, D-36244 Bad Hersfeld, gpsr@libri.de
Dimensions: 235 x 191 x 22 mm
By/With: Dipankar Mazumdar (et al.)
Publication date: 26.12.2025
Weight: 0.773 kg
Item ID: 134413338
