Sherman | Business Intelligence Guidebook | E-Book | www.sack.de
E-Book

E-book, English, 550 pages

Sherman Business Intelligence Guidebook

From Data Integration to Analytics
1st edition, 2014
ISBN: 978-0-12-411528-6
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: 6 - ePub Watermark




Between the high-level concepts of business intelligence and the nitty-gritty instructions for using vendors' tools lies the essential, yet poorly understood layer of architecture, design and process. Without this knowledge, Big Data is belittled - projects flounder, are late and go over budget. Business Intelligence Guidebook: From Data Integration to Analytics shines a bright light on an often neglected topic, arming you with the knowledge you need to design rock-solid business intelligence and data integration processes. Practicing consultant and adjunct BI professor Rick Sherman takes the guesswork out of creating systems that are cost-effective, reusable and essential for transforming raw data into valuable information for business decision-makers.

After reading this book, you will be able to design the overall architecture for functioning business intelligence systems with the supporting data warehousing and data-integration applications. You will have the information you need to get a project launched, developed, managed and delivered on time and on budget - turning the deluge of data into actionable information that fuels business knowledge. Finally, you'll give your career a boost by demonstrating an essential knowledge that puts corporate BI projects on a fast track to success.

• Provides practical guidelines for building successful BI, DW and data integration solutions.
• Explains underlying BI, DW and data integration design, architecture and processes in clear, accessible language.
• Includes the complete project development lifecycle that can be applied at large enterprises as well as at small to medium-sized businesses.
• Describes best practices and pragmatic approaches so readers can put them into action.
• Companion website includes templates and examples, further discussion of key topics, instructor materials, and references to trusted industry sources.

Rick Sherman is the founder of Athena IT Solutions, which provides consulting, training and vendor services for business intelligence, analytics, data integration and data warehousing. He is an adjunct faculty member at Northeastern University's Graduate School of Engineering and is a frequent contributor to industry publications, events, and webinars.



Further Information & Material


1 Front Cover
2 Business Intelligence Guidebook
3 Copyright
4 Contents
5 Foreword
6 How to Use This Book
6.1 CHAPTER SUMMARIES
7 Acknowledgments
8 PART I - CONCEPTS AND CONTEXT
8.1 CHAPTER 1 - THE BUSINESS DEMAND FOR DATA, INFORMATION, AND ANALYTICS
8.1.1 JUST ONE WORD: DATA
8.1.2 WELCOME TO THE DATA DELUGE
8.1.3 TAMING THE ANALYTICS DELUGE
8.1.4 TOO MUCH DATA, TOO LITTLE INFORMATION
8.1.5 DATA CAPTURE VERSUS INFORMATION ANALYSIS
8.1.6 THE FIVE CS OF DATA
8.1.7 COMMON TERMINOLOGY FROM OUR PERSPECTIVE
8.1.8 REFERENCES
9 PART II - BUSINESS AND TECHNICAL NEEDS
9.1 CHAPTER 2 - JUSTIFYING BI: BUILDING THE BUSINESS AND TECHNICAL CASE
9.1.1 WHY JUSTIFICATION IS NEEDED
9.1.2 BUILDING THE BUSINESS CASE
9.1.3 BUILDING THE TECHNICAL CASE
9.1.4 ASSESSING READINESS
9.1.5 CREATING A BI ROAD MAP
9.1.6 DEVELOPING SCOPE, PRELIMINARY PLAN, AND BUDGET
9.1.7 OBTAINING APPROVAL
9.1.8 COMMON JUSTIFICATION PITFALLS
9.2 CHAPTER 3 - DEFINING REQUIREMENTS: BUSINESS, DATA AND QUALITY
9.2.1 THE PURPOSE OF DEFINING REQUIREMENTS
9.2.2 GOALS
9.2.3 DELIVERABLES
9.2.4 ROLES
9.2.5 DEFINING REQUIREMENTS WORKFLOW
9.2.6 INTERVIEWING
9.2.7 DOCUMENTING REQUIREMENTS
10 PART III - ARCHITECTURAL FRAMEWORK
10.1 CHAPTER 4 - ARCHITECTURE FRAMEWORK
10.1.1 THE NEED FOR ARCHITECTURAL BLUEPRINTS
10.1.2 ARCHITECTURAL FRAMEWORK
10.1.3 INFORMATION ARCHITECTURE
10.1.4 DATA ARCHITECTURE
10.1.5 TECHNICAL ARCHITECTURE
10.1.6 PRODUCT ARCHITECTURE
10.1.7 METADATA
10.1.8 SECURITY AND PRIVACY
10.1.9 AVOIDING ACCIDENTS WITH ARCHITECTURAL PLANNING
10.1.10 DO NOT OBSESS OVER THE ARCHITECTURE
10.2 CHAPTER 5 - INFORMATION ARCHITECTURE
10.2.1 THE PURPOSE OF AN INFORMATION ARCHITECTURE
10.2.2 DATA INTEGRATION FRAMEWORK
10.2.3 DIF INFORMATION ARCHITECTURE
10.2.4 OPERATIONAL BI VERSUS ANALYTICAL BI
10.2.5 MASTER DATA MANAGEMENT
10.3 CHAPTER 6 - DATA ARCHITECTURE
10.3.1 THE PURPOSE OF A DATA ARCHITECTURE
10.3.2 HISTORY
10.3.3 DATA ARCHITECTURAL CHOICES
10.3.4 DATA INTEGRATION WORKFLOW
10.3.5 DATA WORKFLOW: RISE OF EDW AGAIN
10.3.6 OPERATIONAL DATA STORE
10.3.7 REFERENCES
10.4 CHAPTER 7 - TECHNOLOGY & PRODUCT ARCHITECTURES
10.4.1 WHERE ARE THE PRODUCT AND VENDOR NAMES?
10.4.2 EVOLUTION NOT REVOLUTION
10.4.3 TECHNOLOGY ARCHITECTURE
10.4.4 PRODUCT AND TECHNOLOGY EVALUATIONS
11 PART IV - DATA DESIGN
11.1 CHAPTER 8 - FOUNDATIONAL DATA MODELING
11.1.1 THE PURPOSE OF DATA MODELING
11.1.2 DEFINITIONS: THE DIFFERENCE BETWEEN A DATA MODEL AND DATA MODELING
11.1.3 THREE LEVELS OF DATA MODELS
11.1.4 DATA MODELING WORKFLOW
11.1.5 WHERE DATA MODELS ARE USED
11.1.6 ENTITY-RELATIONSHIP (ER) MODELING OVERVIEW
11.1.7 NORMALIZATION
11.1.8 LIMITS AND PURPOSE OF NORMALIZATION
11.2 CHAPTER 9 - DIMENSIONAL MODELING
11.2.1 INTRODUCTION TO DIMENSIONAL MODELING
11.2.2 HIGH-LEVEL VIEW OF A DIMENSIONAL MODEL
11.2.3 FACTS
11.2.4 DIMENSIONS
11.2.5 SCHEMAS
11.2.6 ENTITY RELATIONSHIP VERSUS DIMENSIONAL MODELING
11.2.7 PURPOSE OF DIMENSIONAL MODELING
11.2.8 FACT TABLES
11.2.9 ACHIEVING CONSISTENCY
11.2.10 ADVANCED DIMENSIONS AND FACTS
11.2.11 DIMENSIONAL MODELING RECAP
11.3 CHAPTER 10 - BUSINESS INTELLIGENCE DIMENSIONAL MODELING
11.3.1 INTRODUCTION
11.3.2 HIERARCHIES
11.3.3 OUTRIGGER TABLES
11.3.4 SLOWLY CHANGING DIMENSIONS
11.3.5 CAUSAL DIMENSION
11.3.6 MULTIVALUED DIMENSIONS
11.3.7 JUNK DIMENSIONS
11.3.8 VALUE BAND REPORTING
11.3.9 HETEROGENEOUS PRODUCTS
11.3.10 ALTERNATE DIMENSIONS
11.3.11 TOO FEW OR TOO MANY DIMENSIONS
12 PART V - DATA INTEGRATION DESIGN
12.1 CHAPTER 11 - DATA INTEGRATION DESIGN AND DEVELOPMENT
12.1.1 GETTING STARTED WITH DATA INTEGRATION
12.1.2 DATA INTEGRATION ARCHITECTURE
12.1.3 DATA INTEGRATION REQUIREMENTS
12.1.4 DATA INTEGRATION DESIGN
12.1.5 DATA INTEGRATION STANDARDS
12.1.6 LOADING HISTORICAL DATA
12.1.7 DATA INTEGRATION PROTOTYPING
12.1.8 DATA INTEGRATION TESTING
12.2 CHAPTER 12 - DATA INTEGRATION PROCESSES
12.2.1 INTRODUCTION: MANUAL CODING VERSUS TOOL-BASED DATA INTEGRATION
12.2.2 DATA INTEGRATION SERVICES
13 PART VI - BUSINESS INTELLIGENCE DESIGN
13.1 CHAPTER 13 - BUSINESS INTELLIGENCE APPLICATIONS
13.1.1 BI CONTENT SPECIFICATIONS
13.1.2 REVISE BI APPLICATIONS LIST
13.1.3 BI PERSONAS
13.1.4 BI DESIGN LAYOUT: BEST PRACTICES
13.1.5 DATA DESIGN FOR SELF-SERVICE BI
13.1.6 MATCHING TYPES OF ANALYSIS TO VISUALIZATIONS
13.2 CHAPTER 14 - BI DESIGN AND DEVELOPMENT
13.2.1 BI DESIGN
13.2.2 BI DEVELOPMENT
13.2.3 BI APPLICATION TESTING
13.3 CHAPTER 15 - ADVANCED ANALYTICS
13.3.1 ADVANCED ANALYTICS OVERVIEW AND BACKGROUND
13.3.2 PREDICTIVE ANALYTICS AND DATA MINING
13.3.3 ANALYTICAL SANDBOXES AND HUBS
13.3.4 BIG DATA ANALYTICS
13.3.5 DATA VISUALIZATION
13.3.6 REFERENCE
13.4 CHAPTER 16 - DATA SHADOW SYSTEMS
13.4.1 THE DATA SHADOW PROBLEM
13.4.2 ARE THERE DATA SHADOW SYSTEMS IN YOUR ORGANIZATION?
13.4.3 WHAT KIND OF DATA SHADOW SYSTEMS DO YOU HAVE?
13.4.4 DATA SHADOW SYSTEM TRIAGE
13.4.5 THE EVOLUTION OF DATA SHADOW SYSTEMS IN AN ORGANIZATION
13.4.6 DAMAGES CAUSED BY DATA SHADOW SYSTEMS
13.4.7 THE BENEFITS OF DATA SHADOW SYSTEMS
13.4.8 MOVING BEYOND DATA SHADOW SYSTEMS
13.4.9 MISGUIDED ATTEMPTS TO REPLACE DATA SHADOW SYSTEMS
13.4.10 RENOVATING DATA SHADOW SYSTEMS
14 PART VII - ORGANIZATION
14.1 CHAPTER 17 - PEOPLE, PROCESS AND POLITICS
14.1.1 THE TECHNOLOGY TRAP
14.1.2 THE BUSINESS AND IT RELATIONSHIP
14.1.3 ROLES AND RESPONSIBILITIES
14.1.4 BUILDING THE BI TEAM
14.1.5 TRAINING
14.1.6 DATA GOVERNANCE
14.2 CHAPTER 18 - PROJECT MANAGEMENT
14.2.1 THE ROLE OF PROJECT MANAGEMENT
14.2.2 ESTABLISHING A BI PROGRAM
14.2.3 BI ASSESSMENT
14.2.4 WORK BREAKDOWN STRUCTURE
14.2.5 BI ARCHITECTURAL PLAN
14.2.6 BI PROJECTS ARE DIFFERENT
14.2.7 PROJECT METHODOLOGIES
14.2.8 BI PROJECT PHASES
14.2.9 BI PROJECT SCHEDULE
14.3 CHAPTER 19 - CENTERS OF EXCELLENCE
14.3.1 THE PURPOSE OF CENTERS OF EXCELLENCE
14.3.2 BI COE
14.3.3 DATA INTEGRATION CENTER OF EXCELLENCE
14.3.4 ENABLING A DATA-DRIVEN ENTERPRISE
14.3.5 REFERENCE
15 Index


Chapter 1

The Business Demand for Data, Information, and Analytics


Abstract


In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data. Many enterprises are overwhelmed by the deluge of data, which they are receiving from all directions. They are wondering if they can handle Big Data, with its expanding volume, variety, and velocity. There is a big difference between raw data, which by itself is not useful, and actionable information, which business people can use with confidence to make decisions. Data must be transformed to make it clean, consistent, conformed, current, and comprehensive: the five Cs of data. It is up to a Business Intelligence (BI) team to gather and manage the data to empower the company’s business groups with the information they need to gain knowledge that helps them make informed decisions about every step the company takes. While there are attempts to circumvent or replace BI with operational systems, there really is no good substitute for true BI. Operational systems may excel at data capture, but BI excels at information analysis.

Keywords


Big Data; Data; Data 5 Cs; Data capture; Data variety; Data velocity; Data volume; Information; Information analysis; Operational BI
Information in This Chapter
• The data and information deluge
• The analytics deluge
• Data versus actionable information
• Data capture versus information analysis
• The five Cs of data
• Common terminology

Just One Word: Data


“I just want to say one word to you. Just one word… Are you listening? … Plastics. There’s a great future in plastics.”

Mr. McGuire in the 1967 movie The Graduate.

The Mr. McGuires of the world are no longer advising newly minted graduates to get into plastics. But perhaps they should be recommending data. In today’s digital world, data is the key, the ticket, and the Holy Grail all rolled into one.
I do not just mean it’s growing in importance as a profession, although it is a great field to get into, and I’m thrilled that my sons Jake and Josh are pursuing careers in data and technology. Data is where the dollars are when it comes to company budgets. Every few years there is another report showing that business intelligence (BI) is at or near the top of the chief information officer’s (CIO) list of priorities.
Enterprises today are driven by data, or, to be more precise, information that is gleaned from data. It sheds light on what is unknown, it reduces uncertainty, and it turns decision-making from an art to a science.
But whether it’s Big Data or just plain old data, it requires a lot of work before it is actually something useful. You would not want to eat a cup of flour, but baked into a cake with butter, eggs, and sugar for the right amount of time at the right temperature, it is transformed into something delicious. Likewise, raw data is unpalatable to the business person who needs it to make decisions. It is inconsistent, incomplete, outdated, unformatted, and riddled with errors. Raw data needs integration, design, modeling, architecting, and other work before it can be transformed into consumable information.
This is where you need data integration to unify and massage the data, data warehousing to store and stage it, and BI to present it to decision-makers in an understandable way. It can be a long and complicated process, but there is a path; there are guidelines and best practices. As with many things that are hard to do, there are promised shortcuts and “silver bullets” that you need to learn to recognize before they trip you up.
It will take a lot more than just reading this book to make your project a success, but my hope is that it will help set you on the right path.
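
As a rough illustration of the "unify and massage" step described above, the following minimal Python sketch cleans a handful of raw records so they can be totaled consistently. It is not taken from the book: the sample rows, field names, accepted date formats, and the functions `parse_date`, `clean_row`, `integrate`, and `total_by_customer` are all hypothetical, and a real data integration process would involve far more than this.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from two source systems.
RAW_ROWS = [
    {"customer": " Acme Corp ", "amount": "1,200.50", "order_date": "2014-03-01"},
    {"customer": "acme corp", "amount": "300", "order_date": "03/05/2014"},
    {"customer": "Globex", "amount": "", "order_date": "2014-03-07"},
]

# Assumed date formats used by the hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Return a cleaned, consistently typed row, or None if it is unusable."""
    # Collapse whitespace and normalize case so name variants match.
    name = " ".join(row["customer"].split()).upper()
    amount_text = row["amount"].replace(",", "").strip()
    order_date = parse_date(row["order_date"].strip())
    if not name or not amount_text or order_date is None:
        return None
    try:
        amount = float(amount_text)
    except ValueError:
        return None
    return {"customer": name, "amount": amount, "order_date": order_date}


def integrate(rows):
    """Split rows into cleaned records and rejects kept for later review."""
    cleaned, rejected = [], []
    for row in rows:
        result = clean_row(row)
        if result is None:
            rejected.append(row)
        else:
            cleaned.append(result)
    return cleaned, rejected


def total_by_customer(cleaned):
    """Sum amounts per customer: the kind of figure a BI report presents."""
    totals = {}
    for row in cleaned:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


cleaned, rejected = integrate(RAW_ROWS)
totals = total_by_customer(cleaned)
```

In this sketch the two differently spelled "Acme Corp" rows are combined under one customer name, while the row with a missing amount is set aside rather than silently dropped, so that someone can decide how to handle it.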

Welcome to the Data Deluge


In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data. It is up to a BI team to gather and manage the data to empower the company’s business groups with the information they need to gain knowledge—knowledge that helps them make informed decisions about every step the company takes.
Enterprises need this information to understand their operations, customers, competitors, suppliers, partners, employees, and stockholders. They need to learn about what is happening in the business, analyze their operations, react to internal and external pressures, and make decisions that will help them manage costs, grow revenues, and increase sales and profits. Forrester Research sums it up perfectly: “Data is the raw material of everything firms do, but too many have been treating it like waste material—something to deal with, something to report on, something that grows like bacteria in a petri dish. No more! Some say that data is the new oil—but we think that comparing data to oil is too limiting. Data is the new sun: it’s limitless and touches everything firms do. Data must flow fast and rich for your organization to serve customers better than your competitors can. Firms must invest heavily in building a next-generation customer data management capability to grow revenue and profits in the age of the customer. Data is an asset that even CFOs will realize should have a line on the balance sheet right alongside property, plant, and equipment” [1].
It can be a problem, however, when there is more data than an enterprise can handle. Enterprises collect massive amounts of data every day, internally and externally, as they interact with customers, partners, and suppliers. They research and track information on their competitors and the marketplace. They put tracking codes on their websites so they can learn exactly how many visitors they get and where those visitors came from. They store and track information required by government regulations and industry initiatives. Now there is the Internet of Things (IoT), in which sensors embedded in physical objects such as pacemakers, thermostats, and dog collars collect data. It is a deluge of data (Figure 1.1).

Data Volume, Variety, and Velocity


Enterprises are not only accumulating data in ever-increasing volumes; the variety and velocity of that data are also increasing. Although the emerging “Big Data” databases can cause an enterprise’s ability to gather data to explode, the volume, velocity, and variety are all expanding no matter how “big” or “small” the data is.
Volume—According to many experts, 90% of the data in the world today was created in the last two years alone. When you hear that statistic you might think that it is coming from all the chatter on social media, but data is being generated by all manner of activities. For just one example, think about the emergence of radio frequency identification (RFID) to track products from manufacturing to purchase. It is a huge category of data that simply did not exist before. Although not all of the data gathered is significant for an enterprise, it still leaves a massive amount of data with which to deal.
Velocity—Much of the data now is time sensitive, and there is greater pressure to decrease the time between when it is captured and when it is used for reporting. We now depend on the speed of some of this data. It is extremely helpful to receive an immediate notification from your bank, for example, when a fraudulent transaction is detected, enabling you to cancel your credit card immediately. Businesses across industry sectors are using current data when interacting with their customers, prospects, suppliers, partners, employees, and other stakeholders.
Variety—The sources of data continue to expand. Receiving data from disparate sources further complicates things. Unstructured data, such as audio, video, and social media, and semistructured data like XML and RSS feeds must be handled differently from traditional structured data. The CIO of the past thought phones were just for talking, not something that collected data. He also thought Twitter was something that birds did. Now that an enterprise can collect data from tweets about its products, how does it handle that data and then what does it do with it? Also, what does it do with the invaluable data that business people create in spreadsheets and Microsoft Word documents and use in decision-making? Formerly, CIOs just had to worry about collecting and analyzing data from back office applications, but now their data can come from people, machines, processes, and applications spread across the world.

FIGURE 1.1 Too much information. www.CartoonStock.com.
Unfortunately, enterprises have not been as good at organizing and understanding the data as they have been at gathering it. Data has no value unless you can understand what you have, analyze it, and then act on the insights from the analysis.
See the book’s companion Website www.BIguidebook.com for links to industry research, templates, and other materials to help you learn more about business intelligence and make your next project a success.
To receive updates on newly posted material, subscribe to the email list on the Website or follow the RSS feed of my blog at www.datadoghouse.com.

Taming the Analytics Deluge


With this flood of data...


