E-book, English, 148 pages
Akhtar / Magham: Pro Apache Phoenix
1st edition
ISBN: 978-1-4842-2370-3
Publisher: Apress
Format: PDF
Copy protection: PDF watermark
An SQL Driver for HBase
Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable a high write and read throughput in a big data space. This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and it explains how key features such as joins, indexes, transactions, and functions make the API that Phoenix provides simple, flexible, and powerful. Examples drawn from real-time data and data-driven businesses show you how to collect, analyze, and act in seconds.
Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop. You will learn how to:
Handle a petabyte data store by applying familiar SQL techniques
Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase
Apply best practices while working with a scalable data store on Hadoop and HBase
Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis
Demonstrate real-time use cases and big data modeling techniques
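The SQL-over-HBase workflow these chapters build toward can be sketched with a few Phoenix statements; the table, column, and index names below are illustrative, not taken from the book:

```sql
-- Create a Phoenix table; it is backed by an HBase table of the same name.
CREATE TABLE IF NOT EXISTS sensor_readings (
    device_id   VARCHAR NOT NULL,
    read_time   TIMESTAMP NOT NULL,
    temperature DOUBLE,
    CONSTRAINT pk PRIMARY KEY (device_id, read_time)
);

-- Phoenix uses UPSERT (insert-or-update) in place of INSERT/UPDATE.
UPSERT INTO sensor_readings VALUES ('dev-42', CURRENT_TIME(), 21.5);

-- A secondary index speeds up queries that do not lead with the row key.
CREATE INDEX temp_idx ON sensor_readings (temperature);

-- Familiar SQL, executed as HBase scans under the hood.
SELECT device_id, AVG(temperature)
FROM sensor_readings
GROUP BY device_id;
```

Running these requires a Phoenix-enabled HBase cluster and a client such as sqlline.py, whose installation the book covers in Chapter 2.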
Who This Book Is For
Data engineers, Big Data administrators, and architects.
Shakil Akhtar is a TOGAF 9 Certified Enterprise Architect passionate about digital transformation, cloud computing, big data, and Internet of Things technologies. He holds many certifications, including Oracle Certified Master Java Enterprise Architect (OCMJEA). He has worked with Cisco, Oracle, CA Technologies, and various other organizations, where he developed and architected large-scale, complex enterprise software, creating frameworks and scaling systems to petabyte datasets. He is an enthusiastic open source user and longtime fan. When not working, he can be found playing guitar and jamming with his friends.
Ravi Magham is an engineer passionate about data and data-driven engineering, experienced in building and scaling solutions to petabyte datasets. He has worked with CA Technologies, Bazaarvoice, and various other startups. He is actively involved in open source projects and is a PMC member of Apache Phoenix. His current interests are in distributed data stream processing.
Authors/Editors
Further Information & Material
1;Contents at a Glance;4
2;Contents;5
3;About the Authors;13
4;About the Technical Reviewers;14
5;Chapter 1: Introduction;15
5.1;1.1 Big Data Lake and Its Representation;16
5.2;1.2 Modern Applications and Big Data;17
5.2.1;1.2.1 Fraud Detection in Banking;17
5.2.2;1.2.2 Log Data Analysis;17
5.2.3;1.2.3 Recommendation Engines;18
5.2.3.1;1.2.3.1 Social Media Analysis;18
5.3;1.3 Analyzing Big Data;18
5.4;1.4 An Overview of Hadoop and MapReduce;19
5.5;1.5 Hadoop Ecosystem;19
5.5.1;1.5.1 HDFS;20
5.5.2;1.5.2 MapReduce;21
5.5.3;1.5.3 HBase;23
5.5.4;1.5.4 Hive;24
5.5.5;1.5.5 YARN;25
5.5.6;1.5.6 Spark;25
5.5.7;1.5.7 PIG;25
5.5.8;1.5.8 ZooKeeper;25
5.6;1.6 Phoenix in the Hadoop Ecosystem;26
5.7;1.7 Phoenix’s Place in Big Data Systems;26
5.8;1.8 Importance of Traditional SQL-Based Tools and the Role of Phoenix;26
5.8.1;1.8.1 Traditional DBA Problems for Big Data Systems;27
5.8.2;1.8.2 Which Tool Should I Use for Big Data?;27
5.8.3;1.8.3 Massive Data Storage and Challenges;27
5.8.4;1.8.4 A Traditional Data Warehouse and Querying;27
5.9;1.9 Apache Phoenix in Big Data Analytics;28
5.10;1.10 Summary;28
6;Chapter 2: Using Phoenix;29
6.1;2.1 What is Apache Phoenix?;29
6.2;2.2 Architecture;30
6.2.1;2.2.1 Installing Apache Phoenix;31
6.2.2;2.2.2 Installing Java;31
6.2.2.1;2.2.2.1 Installing Java on Linux;31
6.2.2.2;2.2.2.2 Installing Java on Mac OS X;32
6.3;2.3 Installing HBase;32
6.4;2.4 Installing Apache Phoenix;33
6.5;2.5 Installing Phoenix on Hortonworks HDP;34
6.5.1;2.5.1 Downloading Hortonworks Sandbox;35
6.5.2;2.5.2 Start HBase;41
6.5.3;2.5.3 Testing Your Phoenix Installation;42
6.6;2.6 Installing Phoenix on Cloudera Hadoop;44
6.7;2.7 Capabilities;45
6.8;2.8 Hadoop Ecosystem and the Role of Phoenix;46
6.9;2.9 Brief Description of Phoenix’s Key Features;47
6.9.1;2.9.1 Transactions;47
6.9.2;2.9.2 User-Defined Functions;47
6.9.3;2.9.3 Secondary Indexes;48
6.9.4;2.9.4 Skip Scan;48
6.9.5;2.9.5 Views;48
6.9.6;2.9.6 Multi-Tenancy;48
6.9.7;2.9.7 Query Server;49
6.10;2.10 Summary;49
7;Chapter 3: CRUD with Phoenix;50
7.1;3.1 Data Types in Phoenix;50
7.1.1;3.1.1 Primitive Data Types;50
7.1.2;3.1.2 Complex Data Types;50
7.2;3.2 Data Model;51
7.2.1;3.2.1 Steps in data modeling;52
7.3;3.3 Phoenix Write Path;52
7.4;3.4 Phoenix Read Path;52
7.5;3.5 Basic Commands;52
7.5.1;3.5.1 HELP;53
7.5.2;3.5.2 CREATE;54
7.5.3;3.5.3 UPSERT;54
7.5.4;3.5.4 SELECT;54
7.5.5;3.5.5 ALTER;55
7.5.6;3.5.6 DELETE;55
7.5.7;3.5.7 DESCRIBE;55
7.5.8;3.5.8 LIST;56
7.6;3.6 Working with Phoenix API;56
7.6.1;3.6.1 Environment setup;56
7.7;3.7 Summary;62
8;Chapter 4: Querying Data;63
8.1;4.1 Constraints;63
8.1.1;4.1.1 NOT NULL;63
8.2;4.2 Creating Tables;64
8.3;4.3 Salted Tables;65
8.4;4.4 Dropping Tables;67
8.5;4.5 ALTER Tables;67
8.5.1;4.5.1 Adding Columns;68
8.5.2;4.5.2 Deleting or Replacing Columns;68
8.5.3;4.5.3 Renaming a Column;69
8.6;4.6 Clauses;69
8.6.1;4.6.1 LIMIT;69
8.6.2;4.6.2 WHERE;70
8.6.3;4.6.3 GROUP BY;70
8.6.4;4.6.4 HAVING;71
8.6.5;4.6.5 ORDER BY;71
8.7;4.7 Logical Operators;72
8.7.1;4.7.1 AND;72
8.7.2;4.7.2 OR;72
8.7.3;4.7.3 IN;72
8.7.4;4.7.4 LIKE;73
8.7.5;4.7.5 BETWEEN;73
8.8;4.8 Summary;73
9;Chapter 5: Advanced Querying;74
9.1;5.1 Joins;74
9.2;5.2 Inner Join;74
9.3;5.3 Outer Join;75
9.3.1;5.3.1 Left Outer Join;75
9.3.2;5.3.2 Right Outer Join;76
9.3.3;5.3.3 Full Outer Join;77
9.4;5.4 Grouped Joins;78
9.5;5.5 Hash Join;79
9.6;5.6 Sort Merge Join;80
9.7;5.7 Join Query Optimizations;80
9.7.1;5.7.1 Optimizing Through Configuration Properties;81
9.7.2;5.7.2 Optimizing Query;81
9.8;5.8 Subqueries;82
9.8.1;5.8.1 IN and NOT IN in Subqueries;83
9.8.2;5.8.2 EXISTS and NOT EXISTS Clauses;83
9.8.3;5.8.3 ANY, SOME, and ALL Operators with Subqueries;84
9.8.4;5.8.4 UPSERT Using Subqueries;84
9.9;5.9 Views;85
9.9.1;5.9.1 Creating Views;85
9.9.2;5.9.2 Dropping Views;86
9.10;5.10 Paged Queries;86
9.10.1;5.10.1 LIMIT and OFFSET;87
9.10.2;5.10.2 Row Value Constructor;87
9.11;5.11 Summary;88
10;Chapter 6: Transactions;89
10.1;6.1 SQL Transactions;89
10.2;6.2 Transaction Properties;89
10.2.1;6.2.1 Atomicity;90
10.2.2;6.2.2 Consistency;90
10.2.3;6.2.3 Isolation;90
10.2.4;6.2.4 Durability;90
10.3;6.3 Transaction Control;90
10.3.1;6.3.1 COMMIT;90
10.3.2;6.3.2 ROLLBACK;90
10.3.3;6.3.3 SAVEPOINT;91
10.3.4;6.3.4 SET TRANSACTION;91
10.4;6.4 Transactions in HBase;91
10.4.1;6.4.1 Integrating HBase with Transaction Manager;91
10.4.2;6.4.2 Components of Transaction Manager;92
10.4.2.1;6.4.2.1 TransactionAware Client;92
10.4.2.2;6.4.2.2 Transaction Manager;92
10.4.2.3;6.4.2.3 Transaction Processor Coprocessor;93
10.4.3;6.4.3 Transaction Lifecycle;94
10.4.4;6.4.4 Concurrency Control;94
10.4.5;6.4.5 Multiversion Concurrency Control;95
10.4.6;6.4.6 Optimistic Concurrency Control;95
10.5;6.5 Apache Tephra As a Transaction Manager;95
10.6;6.6 Phoenix Transactions;96
10.6.1;6.6.1 Enabling Transactions for Tables;99
10.6.2;6.6.2 Committing Transactions;99
10.7;6.7 Transaction Limitations in Phoenix;100
10.8;6.8 Summary;100
11;Chapter 7: Advanced Phoenix Concepts;101
11.1;7.1 Secondary Indexes;101
11.1.1;7.1.1 Global Index;102
11.1.1.1;7.1.1.1 Immutable Tables;104
11.1.1.1.1;7.1.1.1.1 Consistency;105
11.1.1.2;7.1.1.2 Mutable Tables;106
11.1.1.2.1;7.1.1.2.1 Configuration;106
11.1.1.2.2;7.1.1.2.2 Consistency;106
11.1.2;7.1.2 Local Index;106
11.1.3;7.1.3 Covered Index;109
11.1.4;7.1.4 Functional Indexes;110
11.1.5;7.1.5 Index Consistency;110
11.2;7.2 User Defined Functions;112
11.2.1;7.2.1 Writing Custom User Defined Functions;112
11.2.1.1;7.2.1.1 Configuration;115
11.2.1.2;7.2.1.2 Runtime Environment;115
11.3;7.3 Phoenix Query Server;116
11.3.1;7.3.1 Download;117
11.3.2;7.3.2 Installation;117
11.3.3;7.3.3 Setup;117
11.3.4;7.3.4 Starting PQS;117
11.3.5;7.3.5 Client;117
11.3.6;7.3.6 Usage;118
11.3.7;7.3.7 Additional PQS Features;119
11.3.7.1;7.3.7.1 Gotchas;119
11.4;7.4 Summary;119
12;Chapter 8: Integrating Phoenix with Other Frameworks;120
12.1;8.1 Hadoop Ecosystem;120
12.2;8.2 MapReduce Integration;120
12.2.1;8.2.1 Setup;121
12.3;8.3 Apache Spark Integration;124
12.3.1;8.3.1 Setup;125
12.3.2;8.3.2 Reading and Writing Using Dataframe;126
12.4;8.4 Apache Hive Integration;127
12.4.1;8.4.1 Setup;127
12.4.2;8.4.2 Table Creation;128
12.5;8.5 Apache Pig Integration;129
12.5.1;8.5.1 Setup;129
12.5.2;8.5.2 Accessing Data from Phoenix;129
12.5.3;8.5.3 Storing Data to Phoenix;129
12.6;8.6 Apache Flume Integration;130
12.6.1;8.6.1 Setup;130
12.6.2;8.6.2 Configuration;130
12.6.3;8.6.3 Running the Above Setup;131
12.7;8.7 Summary;131
13;Chapter 9: Tools & Tuning;132
13.1;9.1 Phoenix Tracing Server;132
13.1.1;9.1.1 Trace;132
13.1.2;9.1.2 Span;133
13.1.3;9.1.3 Span Receivers;133
13.1.4;9.1.4 Setup;133
13.1.4.1;9.1.4.1 Client Configuration;133
13.1.4.2;9.1.4.2 Server Configuration;134
13.2;9.2 Phoenix Bulk Loading;136
13.2.1;9.2.1 Setup;136
13.2.2;9.2.2 Gotchas;137
13.3;9.3 Index Load Async;138
13.4;9.4 Pherf;138
13.4.1;9.4.1 Setup to Run the Test;142
13.4.2;9.4.2 Gotchas;143
13.5;9.5 Summary;144
14;Index;145