Decorative items shown are not included.
Language:
English
33.80 €
Free shipping via Post / DHL
Delivery time: 1-2 weeks
Description
The era of cloud-dependent AI is over. Today's developers can run state-of-the-art language models on their own hardware, from laptops to GPU clusters, without ever sending data to a third party. But the gap between downloading a model and deploying it efficiently is filled with questions about quantization, memory bandwidth, batching strategies, and tool selection. This book is your guide through that gap, showing you how to build scalable, cost-effective inference systems using the three pillars of open-source AI: Ollama, [...], and vLLM.
AI Inference with Ollama, [...], and vLLM takes you from running your first local model in minutes to optimizing production deployments serving thousands of requests per second. You'll learn when to use each tool, how to navigate the memory wall that bottlenecks LLM performance, and how to choose the right hardware and quantization strategy for your use case. Whether you're building RAG systems, deploying chatbots, or scaling inference across GPU clusters, this book gives you the practical knowledge to move from experimentation to production with confidence.
About the Author
GK Marballi has spent 20+ years turning data into competitive advantage for global brands from Priceline to S&P Global and Barnes & Noble. He has led high-impact product and analytics teams, and navigated the front lines of the AI revolution. He is based in New York City and holds an MBA from Harvard Business School.
Details
| Publication year: | 2026 |
|---|---|
| Subject area: | Data Communication, Networks & Mailboxes |
| Genre: | Imports, Computer Science |
| Category: | Science & Technology |
| Medium: | Paperback |
| ISBN-13: | 9781105842733 |
| ISBN-10: | 1105842738 |
| Language: | English |
| Binding: | Softcover / Paperback |
| Author: | Marballi, GK |
| Manufacturer: | Lulu.com |
| Responsible person for the EU: | Libri GmbH, Europaallee 1, D-36244 Bad Hersfeld, gpsr@libri.de |
| Dimensions: | 229 x 152 x 14 mm |
| By/With: | GK Marballi |
| Publication date: | 4 January 2026 |
| Weight: | 0.361 kg |