Big Data Essentials

The Big Four Vs

  • Volume: Extremely large volumes of data
  • Variety: Many forms of data, i.e. structured, semi-structured, and unstructured
  • Velocity: Data arrives and is processed at different speeds, in real time, in batches, or as streams
  • Veracity or Variability: Inconsistent, sometimes inaccurate, varying data
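
To make the batch versus stream distinction under Velocity concrete, here is a minimal Python sketch. The function names and the sample readings are invented for illustration; it only assumes that a batch job sees all values at once while a streaming job sees them one at a time.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(values: Sequence[float]) -> float:
    """Batch: the full data set is available before processing starts."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream: update a running mean as each value arrives, without storing history."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


# Hypothetical sensor readings, used only for this example.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_mean(readings))            # 25.0
print(list(streaming_mean(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

Both approaches end at the same answer, but the streaming version has a usable result after every reading, which is what matters when data arrives continuously.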

Why is it important?

  • Data provides value only when it is analyzed to produce insights.
  • Helps businesses make smarter decisions, which reduces cost and saves time.
  • Used in healthcare to fight diseases and improve preventive care.

Big Data Sources

  1. Social media
  2. Machine-generated data
  3. Business transactions
  4. IoT (Internet of Things)
  5. Sensors

Types of Data

  • Structured: Data has a defined length and format. Examples include numbers, dates, and short strings such as names. This data is easy to store (normally in SQL databases) and analyze.
  • Semi-Structured: Does not conform to a rigid schema but is self-describing, typically through tags or key-value pairs. Examples are EDI, SWIFT, and XML.
  • Unstructured: Data with no specific format. This includes audio, images, text messages, etc.
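
The three categories above can be sketched in a few lines of Python using only the standard library. The sample records, field names, and message text are made up for illustration and are not taken from any real data set.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields (like a SQL table).
structured = "id,name,signup_date\n1,Ada,2024-01-15\n2,Linus,2024-02-03\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: no fixed table layout, but the XML tags describe the data.
semi_structured = (
    "<order id='42'><customer>Ada</customer><item qty='2'>book</item></order>"
)
order = ET.fromstring(semi_structured)

# Unstructured: free text with no fields at all; analysis has to impose structure.
unstructured = "Hi, my order arrived late and the box was damaged."
word_count = len(unstructured.split())

print(rows[0]["name"])                                # Ada
print(order.get("id"), order.find("item").get("qty"))  # 42 2
print(word_count)                                     # 10
```

Note how the structured rows can be queried by column name directly, the XML must be navigated by tag, and the free text only yields something countable after it is split into words.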

Analytics in Big Data

  • Basics: Reporting, dashboards, simple visualizations, slicing and dicing.
  • Advanced: Complex models using machine learning, statistics, text analytics, neural networks, and data mining.
  • Operationalized: Embed big data analytics in a business process to streamline it and increase efficiency.
  • Analytics for business decisions: Implemented for better decision-making that drives revenue.
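
As a small illustration of the "slicing and dicing" mentioned under Basics, the sketch below filters and aggregates a tiny in-memory data set. The sales records and the helper `total_by` are hypothetical; a real system would run the same idea against a database or a data-processing framework.

```python
from collections import defaultdict

# Hypothetical sales records with two dimensions: region and product.
sales = [
    {"region": "North", "product": "A", "amount": 120.0},
    {"region": "North", "product": "B", "amount": 80.0},
    {"region": "South", "product": "A", "amount": 200.0},
    {"region": "South", "product": "B", "amount": 50.0},
    {"region": "East", "product": "A", "amount": 70.0},
]


def total_by(records, key):
    """Sum the 'amount' field, grouped by the given dimension."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# Slice: fix one dimension to a single value.
north_only = [r for r in sales if r["region"] == "North"]

# Dice: restrict several dimensions at once to pick out a sub-block.
diced = [
    r for r in sales
    if r["region"] in {"North", "South"} and r["product"] == "A"
]

print(total_by(sales, "region"))        # {'North': 200.0, 'South': 250.0, 'East': 70.0}
print(total_by(north_only, "product"))  # {'A': 120.0, 'B': 80.0}
print(total_by(diced, "region"))        # {'North': 120.0, 'South': 200.0}
```

The same grouped totals are what a basic report or dashboard would display; the slice and dice steps only change which subset of the records feeds the aggregation.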