E-book, English, 304 pages
Chen / Miller Big Data Visualization
1st edition 2025
ISBN: 978-1-78528-416-8
Publisher: De Gruyter
Format: PDF
Copy protection: Adobe DRM (system requirements)
Bring scalability and dynamics to your Big Data visualization
Gain valuable insight into big data analytics with this book. Covering the tools you need to analyse data, together with IBM certified expert James Miller's insight, this book is the key to data visualization success.
- Learn the tools & techniques to process big data for efficient data visualization
- Packed with insightful real-world use cases
- Addresses the difficulties faced by professionals in the field of big data analytics
Challenges of big data visualization
We're assuming that you have some background in data visualization, and therefore the earlier discussion was just enough to refresh your memory and whet your appetite for the real purpose of this book.
Big data
Let's take a pause here to define big data.
Phrases such as "a large assemblage of data," "datasets that are so large or complex that traditional data processing applications are inadequate," and "data about every aspect of our lives" have all been used to define or refer to big data.
In 2001, Doug Laney, then an analyst at META Group (which Gartner later acquired), introduced the 3Vs concept (refer to the following link: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf). According to Laney, the 3Vs are volume, variety, and velocity, and together they make up the dimensionality of big data: volume (the measurable amount of data), variety (the number of types of data), and velocity (the speed of processing or dealing with that data).
With this concept in mind, all aspects of big data become increasingly challenging, and as these dimensions increase or expand, they also encumber our ability to visualize the data effectively.
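To make the three dimensions concrete, here is a minimal Python sketch that measures each V on a small sample of records: volume as total payload bytes, variety as the number of distinct data types, and velocity as a rough arrival rate in records per second. The `Record` and `ThreeVProfile` types, the `kind` labels, and the `profile` function are hypothetical illustrations, not part of Laney's definition or of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Record:
    timestamp: datetime  # when the record was captured
    kind: str            # hypothetical data-type label, e.g. "sensor" or "sms"
    payload: bytes       # raw content of the record


@dataclass
class ThreeVProfile:
    volume_bytes: int        # volume: measurable amount of data
    variety: int             # variety: number of distinct data types
    velocity_per_sec: float  # velocity: rough arrival rate of records


def profile(records: List[Record]) -> ThreeVProfile:
    """Summarize a sample of records along the 3Vs (illustrative sketch)."""
    if not records:
        return ThreeVProfile(0, 0, 0.0)
    volume = sum(len(r.payload) for r in records)
    variety = len({r.kind for r in records})
    times = sorted(r.timestamp for r in records)
    span = (times[-1] - times[0]).total_seconds()
    # A zero-length time window gives no usable rate, so report 0.0 in that case
    velocity = len(records) / span if span > 0 else 0.0
    return ThreeVProfile(volume, variety, velocity)


# Hypothetical sample: three records arriving over ten seconds
t0 = datetime(2025, 1, 1, 12, 0, 0)
sample = [
    Record(t0, "sensor", b"21.5C"),
    Record(t0 + timedelta(seconds=5), "sms", b"on my way"),
    Record(t0 + timedelta(seconds=10), "sensor", b"21.7C"),
]
print(profile(sample))  # ThreeVProfile(volume_bytes=19, variety=2, velocity_per_sec=0.3)
```

Each V grows independently: adding new `kind` values raises variety without changing volume, and compressing the same records into a shorter window raises velocity alone.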
Using Excel to gauge your data
Look at the following figure and remember that Excel is not a tool to determine whether your data qualifies as big data:
If your data is too big for Microsoft Excel, it still doesn't necessarily qualify as big data. In fact, gigabytes of data are still manageable with various techniques and with enterprise and even open source tools, especially given today's lower cost of storage. It is important to be able to realistically size the data that you will be using in an analytic or visualization project before selecting an approach or technology (keeping in mind expected data growth rates).
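As a rough illustration of sizing data before choosing a tool, the following sketch checks whether a table fits on a single Excel worksheet and projects storage needs under compound annual growth. The worksheet limits (1,048,576 rows by 16,384 columns) are those of Excel 2007 and later; the dataset shape, the 5 GB starting size, the 40% growth rate, and the function names are hypothetical values chosen only for the example.

```python
EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit, Excel 2007 and later
EXCEL_MAX_COLS = 16_384     # worksheet column limit (columns A to XFD)
GB = 1024 ** 3


def fits_in_excel_sheet(rows: int, cols: int) -> bool:
    """True if a table of this shape fits on one Excel worksheet."""
    return rows <= EXCEL_MAX_ROWS and cols <= EXCEL_MAX_COLS


def projected_size(current_bytes: float, annual_growth: float, years: int) -> float:
    """Compound growth: annual_growth=0.4 means +40% per year."""
    return current_bytes * (1 + annual_growth) ** years


# Hypothetical dataset: 5 million rows x 20 columns, about 5 GB today
print(fits_in_excel_sheet(5_000_000, 20))      # False: too big for one worksheet
future = projected_size(5 * GB, 0.4, 3)        # hypothetical 40% yearly growth
print(f"{future / GB:.1f} GB after 3 years")   # 13.7 GB after 3 years
```

The point of the example matches the text: a dataset can overflow a spreadsheet and still sit comfortably in the gigabyte range, even after several years of growth.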
Pushing big data higher
As the following figure illustrates, the aforementioned volume, variety, and velocity have lifted, and will continue to lift, big data into the future:
The 3Vs
Let's take a moment to further examine the Vs.
Volume
Volume involves determining or calculating how much of something there is, or in the case of big data, how much of something there will be. Here is a thought-provoking example:
How fast does moon dust pile up?
As Megan Gannon wrote in 2014 (http://www.space.com/23694-moon-dust-mystery-apollo-data.html), a revisited trove of data from NASA's Apollo missions more than 40 years ago is helping scientists answer a lingering lunar question: how fast does moon dust build up? The answer: it would take 1,000 years for a layer of moon dust about a millimeter (0.04 inches) thick to accumulate (big data accumulates much more quickly than moon dust!).
With every click of a mouse, big data grows to petabytes (1,024 terabytes) or even exabytes (1,024 petabytes), consisting of billions to trillions of records generated by millions of people and machines.
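The unit ladder above steps by factors of 1,024. A small helper that walks that ladder might look like the following sketch; it follows the text's 1,024-based convention (the IEC standard names these binary units KiB, MiB, and so on up to PiB and EiB), and the function name is a hypothetical choice.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes: int) -> str:
    """Format a byte count, stepping up one unit per factor of 1,024."""
    size = float(num_bytes)
    unit_index = 0
    # Divide by 1,024 until the value is below 1,024 or we run out of units
    while size >= 1024 and unit_index < len(UNITS) - 1:
        size /= 1024
        unit_index += 1
    return f"{size:.2f} {UNITS[unit_index]}"


print(human_readable(1024 ** 5))      # 1.00 PB
print(human_readable(3 * 1024 ** 6))  # 3.00 EB
```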
Although it has been reported (for example, refer to the following link: http://blog.sqlauthority.com/2013/07/21/sql-server-what-is-the-maximum-relational-database-size-supported-by-single-instance/) that structured or relational database technology can support applications scaling up to 1 petabyte of storage, it doesn't take much thought to see that such volumes won't be easy to handle capably, and the accumulation rate of big data isn't slowing any time soon.
It's a case of big, bigger, and biggest yet (and we haven't even begun to determine how big "biggest" will be)!
Velocity
Velocity is the rate or pace at which something occurs. The measured velocity can, and usually does, change over time, and velocity directly affects outcomes.
Previously, we lived and worked in a batch environment: we would formulate a question (perhaps "What is our most popular product?"), submit it (to the information technology group), and wait, perhaps until the nightly sales were processed (maybe 24 hours later), before finally receiving an answer. This business model doesn't hold up now, given the many new sources of data (such as social media or mobile applications) that record and capture data in real time, all of the time. The answers to our questions may actually change within a 24-hour period (as with the "trending now" information you may have noticed online).
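The difference between the two models can be sketched in a few lines of Python, assuming a hypothetical stream of product names, one per sale. The batch function answers only after the whole day's data is in, while the running counter (a hypothetical class, not a library API) can answer at any moment, and its answer may change as new sales arrive.

```python
from collections import Counter
from typing import List, Optional


def batch_most_popular(sales: List[str]) -> str:
    """Batch style: wait for the full day's sales (at least one), then answer once."""
    return Counter(sales).most_common(1)[0][0]


class RunningPopularity:
    """Streaming style: update counts as each sale arrives, answer at any time."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record_sale(self, product: str) -> None:
        self.counts[product] += 1

    def most_popular(self) -> Optional[str]:
        top = self.counts.most_common(1)  # ties go to the product seen first
        return top[0][0] if top else None


# Hypothetical sales stream for one day, in arrival order
sales = ["widget", "gadget", "widget", "gizmo", "gadget", "gadget"]
print(batch_most_popular(sales))  # gadget

live = RunningPopularity()
for product in sales:
    live.record_sale(product)
    print(f"after {product}: {live.most_popular()}")
```

In this sample the running answer stays `widget` until the final sale tips it to `gadget`, which is exactly the kind of intra-day change a nightly batch would report only the next morning.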
Given industry hot topics such as the Internet of Things (IoT), it is safe to say that these pace expectations will only increase.
Variety
Thinking back to our previous mention of relational databases, it is generally accepted that relational databases are highly structured, although they may contain text in , , or fields.
Data today (especially big data) comes from many kinds of data sources, and the degree to which that data is structured varies greatly from source to source. In fact, the growing trend is for data to continue to lose structure and for hundreds (or more?) of new formats and structures to be added all of the time, going beyond pure text, photo, audio, video, web, GPS data, sensor data, relational databases, documents, SMS, PDF, Flash, and so on.
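One way to picture these varying levels of structure is a rough heuristic that tags each raw record. The sketch below is a hypothetical illustration: it assumes a middle "semi-structured" level (JSON objects whose fields can vary from record to record) between fixed-schema delimited rows and free text, and it will misclassify edge cases such as free text that happens to contain the expected number of commas.

```python
import csv
import io
import json


def structure_level(raw: str, expected_columns: int) -> str:
    """Hypothetical heuristic: guess how structured a raw text record is."""
    # Semi-structured (assumed level): a self-describing JSON object
    try:
        if isinstance(json.loads(raw), dict):
            return "semi-structured"
    except json.JSONDecodeError:
        pass
    # Structured: exactly one delimited row matching the expected fixed schema
    rows = list(csv.reader(io.StringIO(raw)))
    if len(rows) == 1 and len(rows[0]) == expected_columns:
        return "structured"
    # Everything else (messages, reviews, free text) is treated as unstructured
    return "unstructured"


for raw in ["1001,widget,19.99", '{"user": "a1", "tags": ["x"]}', "Loved it, would buy again!"]:
    print(structure_level(raw, expected_columns=3), "<-", raw)
```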
Categorization
The process of categorization helps us to gain an understanding of the data source.
The industry commonly categorizes big data into two groups (structured and unstructured), but the categorization doesn't stop there.
A little research reveals some interesting new terms for subcategorizing these two groups:
Structured data includes subcategories such as created, provoked, transactional, compiled, and experimental, while unstructured data includes subcategories such as captured and submitted (user-generated). These are just a few of the currently trending terms for categorizing the types of big data; you may be familiar with, or be able to find, more.
It's worth taking some time here to describe these various data varieties, to drive home the challenges of dealing with so many of them:
- Created data: This is data created for a specific purpose, such as focus group surveys or asking website users to establish an account on the site (rather than allowing anonymous access).
- Provoked data: This is data received after some form of prompting, such as giving people the opportunity to express their personal views on a topic; for example, customers filling out product review forms.
- Transactional data: This is data recorded as database transactions, for example, the record of a sales transaction.
- Compiled data: This is information collected (or compiled) on a particular topic, such as credit scores.
- Experimental data: This is data produced when someone experiments with data and/or data sources to explore potential new insights, for example, combining or relating sales transactions to marketing and promotional information to determine a (potential) correlation.
- Captured data: This is data created passively as a result of a person's behavior (such as when you enter a search term on Google; perhaps the creepiest data of all!).
- User-generated data: This is data generated every second by individuals on platforms such as Twitter, Facebook, YouTube, and so on (unlike captured data, this is data you willingly create or put out there).
To sum up, big data comes with no common or expected format, and the time required to impose a structure on the data has proven to be no longer worth it.
Such are the 3Vs
In addition to the 3Vs, big data brings other challenges to the table, especially for the task of data visualization: for example, dealing effectively with data quality and outliers, and displaying results in a meaningful way, to name a few.
Again, it's worth briefly visiting each of these topics here.
Data quality
The value of almost anything is directly proportional to its quality: higher quality means higher value.
Data is no different. Data (any data) can only prove to be a valuable instrument if its quality is certain.
The general areas of data quality include:
- Accuracy
- Completeness
- Update status
- Relevance
- Consistency (across sources)
- Reliability
- Appropriateness
- Accessibility
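Several of these areas can be turned into simple automated checks. The sketch below is one hypothetical way to measure completeness, an accuracy proxy (values outside a plausible range), consistency across two sources, and update status (staleness); the field names, the 0 to 120 age range, the 365-day threshold, and the sample records are illustrative assumptions, not standard rules.

```python
from datetime import date
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def completeness(records: List[Row], required: List[str]) -> float:
    """Share of records whose required fields are all present and non-empty."""
    if not records:
        return 0.0
    complete = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    return complete / len(records)


def out_of_range(records: List[Row], field: str, low: float, high: float) -> List[Row]:
    """Accuracy proxy: records whose numeric field falls outside [low, high]."""
    return [r for r in records
            if r.get(field) is not None and not low <= r[field] <= high]


def inconsistent_keys(source_a: Dict[Any, Any], source_b: Dict[Any, Any]) -> Set[Any]:
    """Consistency across sources: shared keys whose values disagree."""
    return {k for k in source_a.keys() & source_b.keys() if source_a[k] != source_b[k]}


def stale(records: List[Row], field: str, as_of: date, max_age_days: int) -> List[Row]:
    """Update status: records last updated more than max_age_days before as_of."""
    return [r for r in records if (as_of - r[field]).days > max_age_days]


# Hypothetical customer records and two hypothetical sources of customer cities
customers = [
    {"id": 1, "age": 34, "email": "a@example.com", "updated": date(2025, 1, 10)},
    {"id": 2, "age": 212, "email": "", "updated": date(2023, 6, 1)},
]
crm = {101: "Boston", 102: "Denver"}
billing = {101: "Boston", 102: "Dallas"}

print(completeness(customers, ["id", "age", "email"]))                        # 0.5
print([r["id"] for r in out_of_range(customers, "age", 0, 120)])              # [2]
print(inconsistent_keys(crm, billing))                                         # {102}
print([r["id"] for r in stale(customers, "updated", date(2025, 2, 1), 365)])  # [2]
```

Checks like these only flag suspects; deciding whether an age of 212 is a typo or a sentinel value still requires knowledge of how the data was entered and managed.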
The quality of data can be affected by the way it is entered, stored, and managed and...




