December 1, 2022

Go Big or Go Home: Getting Started with Big Data 

In the 1800s, people were dying of cholera in London. At first, the disease was thought to be airborne, transmitted through bad smells from rotting matter. Then a physician named John Snow took a map, plotted the deaths due to cholera, and analyzed the data points against each household's water source. He demonstrated the link between cholera and contaminated drinking water. Later, he was able to pinpoint a specific pump that needed to be disabled in order to control the spread of the disease.

Image 1. John Snow’s digitized data, overlaid on the original map.

The case was hardly the first example of using data to uncover patterns and trends. Two centuries earlier in England, John Graunt published a study on death and mortality rates in age groups. And centuries before that, the Romans were already studying life tables.

Today, the number of data points that define one person is endless: age, sex, height, weight, blood sugar level, blood pressure, temperature, oxygen saturation. And that is just basic health information. Through government records, we can find out places of work and residence, education, and occupation. Beyond tabular records, we can learn about a person’s mood, whether they traveled extensively, and even their dinner last night as posted on Instagram Stories. Imagine how humankind will look back at this century and draw conclusions about our health, preferences, and life choices. Imagine how big data can be.

 

What is Big Data?

John Snow’s cholera study is an example of structured data. Each subject has a place of residence and a cause of death. Each place of residence can be traced to its nearest water source. In the same way, each one of us has a record of purchases in a shopping app, as well as a customer profile that categorizes us into a certain demographic. Based on these structured data, businesses can slice, dice, and segment us into groups. They can use the data to serve ads or promos for moms. Or send you an e-card with a voucher on your birthday.

However, not all data is structured. Now we have emails, website clicks, call center transcripts, social media posts, TikTok videos, CCTV recordings, and more. These are examples of unstructured or semi-structured data that we already have. We may or may not know how to analyze them today, but storing them now could be a big help to businesses or the government in the future.

On another note, where and how do we store these large amounts of data? Do we have the processing power to sift through terabytes of information just to uncover a trend?

This is where Big Data comes in.

Big data refers to a set of data that is so massive and complex that it cannot be captured, managed, or processed using traditional data processing software. Big data involves collecting and analyzing structured, unstructured, and semi-structured data to be used in machine learning, predictive modeling, and other advanced applications.

 

What makes Big Data big?

If I had petabytes of data and you only had terabytes, would that make my data bigger than yours? Not necessarily. As coined by Gartner analyst Doug Laney in 2001, 3 Vs characterize big data: volume, variety, and velocity.

  • Volume. Big Data typically involves terabytes or more.
  • Variety. An organization with Big Data would have various data types to be captured, stored, and processed in one system.
  • Velocity. Big Data arrives quickly and often continuously, so it has to be captured and processed at a very fast pace.

Some references add 3 more Vs to the picture: Veracity, referring to the accuracy and trustworthiness of data; Value, referring to the business value data provides; and Variability, referring to multiple meanings or formats that the same data can have in different source systems. 

 

What are the types of Big Data?

While Big Data covers transactional data — the kind you can generate through digital worksheets — it goes beyond that. There are 3 types of Big Data.

  • Structured. When you fill out a form, purchase a product, or open a bank account, businesses store your name, address, age, and more. This data can be easily stored in a relational database and analyzed. For example, you can generate a report of the top 10 customers in your online shop based on the total amount of purchases each one has made in the last 6 months.
  • Unstructured. It is difficult to define the form or even the value of unstructured data. It can be a series of images of a person from birth through life. Or video footage of a person showing different facial expressions while trying out different brands of chocolate. Or a log of links a person has clicked on after a Google search. An organization may or may not know yet what to do with the raw file, but can store it in their Big Data platform for later use.
  • Semi-structured. This data has some organizing structure, such as tags or keys, but does not fit neatly into the fixed rows and columns of a relational database. For example: data in an XML file.
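
To make the difference between structured and semi-structured data concrete, here is a minimal Python sketch using only the standard library. Everything in it is an illustrative assumption rather than a real system: the `purchases` table, its columns, the sample customers, and the helper names `build_demo_db`, `top_customers`, and `parse_customers` are all made up for this example. It treats "the last 6 months" as roughly 183 days.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import date, timedelta


def build_demo_db():
    """Create an in-memory table of purchases (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (customer TEXT, amount REAL, purchased_on TEXT)"
    )
    rows = [
        ("Ana", 120.0, "2022-11-05"),
        ("Ben", 80.0, "2022-10-12"),
        ("Ana", 45.5, "2022-09-30"),
        ("Cara", 300.0, "2022-01-15"),  # older than six months, so excluded
        ("Ben", 20.0, "2022-11-20"),
    ]
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    return conn


def top_customers(conn, as_of, limit=10, days=183):
    """Structured data: rank customers by total spend in the last `days` days."""
    cutoff = (as_of - timedelta(days=days)).isoformat()
    query = """
        SELECT customer, SUM(amount) AS total
        FROM purchases
        WHERE purchased_on >= ?
        GROUP BY customer
        ORDER BY total DESC, customer ASC
        LIMIT ?
    """
    return conn.execute(query, (cutoff, limit)).fetchall()


# Semi-structured data: both records are tagged, but they carry different fields.
DEMO_XML = """
<customers>
  <customer id="1"><name>Ana</name><email>ana@example.com</email></customer>
  <customer id="2"><name>Ben</name><phone>555-0100</phone><note>Prefers SMS</note></customer>
</customers>
"""


def parse_customers(xml_text):
    """Turn each <customer> element into a dict of whatever fields it contains."""
    root = ET.fromstring(xml_text.strip())
    return [
        {"id": customer.get("id"), **{field.tag: field.text for field in customer}}
        for customer in root.findall("customer")
    ]


if __name__ == "__main__":
    print(top_customers(build_demo_db(), as_of=date(2022, 12, 1)))
    print(parse_customers(DEMO_XML))
```

The structured report is a single query because every row has the same columns. The XML records need a parser that tolerates fields appearing in one record and not in another, which is the practical meaning of semi-structured here.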

 

Why do you need Big Data?

If you already have millions of rows in your spreadsheets containing your customers’ information, do you still need Big Data? It all depends on the opportunities you are willing to lose.

There are billions of smartphones in the world today. Facebook has 2.96 billion monthly active users. Google processes 3.5 billion searches in a single day. Every scroll, click, or pixel can translate into insights that you can never derive from a two-dimensional table. 

Here are quick examples of Big Data use cases that make a huge impact:

  • Government. Use social media data on terrorist group history, philosophy, and ideology to derive insights on future attacks and help reduce them.
  • Criminal activities. Analyze cryptocurrency transaction data to trace funds that enable criminal operations back to their source.
  • Medicine. Use data of cancer patients from all over the world to determine how cancer mutations and proteins correlate with successful and affordable treatments.
  • Smart cars. Capture data from infrared sensors, radar, GPS, and 3D cameras to make lightning-speed decisions on navigation.
  • Retail. Get to know customers completely and deeply to provide them with hyper-customized offerings based on their activities on your website, app, social media channels, and physical store.
  • Cybersecurity. Recognize suspicious transactions immediately to prevent fraud, attacks, or data breaches.

 

Where do we go from here?

Big Data offers significant benefits to your organization today and in the future. It is also a major initiative that may be overwhelming. So, where do we go from here?

We have put together foundational information on how you can get started with Big Data:

Check out how Stratpoint worked with brands to get started with Big Data:

Need help with your Big Data initiative? Reach out to Stratpoint through the form below.