+ - 0:00:00
Notes for current slide
Notes for next slide

181U Spring 2020

Persistent Data

1
  • 181U
  • See acknowledgements

Agenda

Your browser does not support the audio element.
  • Types of Data
  • Data Characteristics
  • Relational Data & SQL
  • NoSQL
  • RealTime databases
  • Digital Shadows (Caching)
  • Data Visualization for IoT
2
  • 181U
  • See acknowledgements

Types of data

Your browser does not support the audio element.
  • Configuration
  • Status : e.g. device health
  • Sensors
    • periodic (scheduled)
    • episodic (e.g. on change)
3
  • 181U
  • See acknowledgements

Data Rates

Your browser does not support the audio element.
  • Low
    • configuration
    • status
    • alarms
    • episodic for slow-moving measurements (e.g. temperature)
  • Medium
    • environment measurements
  • High
    • audio
    • images
    • video
4
  • 181U
  • See acknowledgements

Relational Data

Your browser does not support the audio element.

  • The relational model for organizing data defined by Edgar Codd in 1969
  • The basis for most large database systems (Oracle, MySql, Postgress, sqlite)
  • The model provides a structure and language consistent with first-order predicate logic

    • First-order logic uses quantified variables over non-logical objects and allows the use oo sentences that contain variables
    • example: “there exists x such that x is Socrates and x is a man”
  • All data is represented in terms of tuples, grouped into relations

  • The purpose of the relational model is to provide a declarative method for specifying data and queries
    • users directly state what information the database contains and what information they want from it. (https://en.wikipedia.org/wiki/Relational_model)
5
  • 181U
  • See acknowledgements

Relations in a Database

Your browser does not support the audio element.

  • the STORE table contains three columns:
    • Store_key
    • City
    • Region
  • the PRODUCT table contains three columns:
    • Product key
    • Description
    • Brand
  • the SALES_FACT table contains five columns:
    • Store_key
    • Product_key
    • Sales
    • Cost
    • Profit
6
  • 181U
  • See acknowledgements

Relations in a Database

Your browser does not support the audio element.

  • Relational databases are organized as sets for relations (aka tables).
  • The data in these tables are accessed through queries
  • This example database has three tables – stores, products, and sales.
  • Example queries
    • How many copies of toy story were sold in the EAST region ?
    • What product had the highest profit in New York ?
  • Notice that each table has an “key” attribute and within a table, each row (tuple) has a unique value for that key (note, the sales fact table should have a separate key)
  • Notice that the tuples in other tables may reference that key.

https://docs.oracle.com/cd/B14099_19/bi.1012/b13915/i_rel_chapter.htm

7
  • 181U
  • See acknowledgements

Example Design

Your browser does not support the audio element.

  • Customer: (Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email, Sex)
  • Order: (Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
  • Order Line: (Order No, Order Line No, Product Code, Qty)
  • Invoice: (Invoice No, Customer ID, Order No, Date, Status)
  • Invoice Line: (Invoice No, Invoice Line No, Product Code, Qty Shipped)
  • Product: (Product Code, Product Description)

  • The “primary keys” are in bold and “foreign keys” in italic

  • An example query might ask for all invoices to a particular customer containing a particular product
  • Separating data in this way minimizes redundant information, enables maintenance of the database (for example, changing the address of a customer is a localized change), and makes execution of queries more efficient

https://en.wikipedia.org/wiki/Relational_model

8
  • 181U
  • See acknowledgements

Flat vs. Relational Tables

Your browser does not support the audio element.
City,Edition,Sport,Discipline,Athlete,NOC,Gender,Event,Event_gender,Medal
Athens,1896,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100m freestyle,M,Gold
Athens,1896,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100m freestyle,M,Silver
Athens,1896,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100m freestyle for sailors,M,Bronze
Athens,1896,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100m freestyle for sailors,M,Gold

Athletes

id,name,gender,nationality
1,"ÖRSTED, Hans-Henrik",M,DEN
2,"ÖRTEGREN, Ruben",M,SWE
3,"ÖRVIG, Erik",M,NOR

Countries

id,country
AFG,Afghanistan
ALB,Albania
ALG,Algeria

Events

id,name,sport,discipline
1,- 47.63kg (flyweight),Boxing,Boxing
2,- 47.6kg (light-flyweight),Wrestling,Wrestling Free.
3,- 48 kg,Judo,Judo

Editions

id,year,city
1,1896,Athens
2,1900,Paris
3,1904,St Louis
9
  • 181U
  • See acknowledgements

SQL (Structured Query Language)

Your browser does not support the audio element.
  • One of the top five programming languages
  • Based upon E.F. Codd’s relational model
  • Provides a way to manage data that is hardware independent
  • Two language representations
    • Relational algebra , is a family of algebras with a well-founded semantics used for modelling the data stored in relational databases, and defining queries on it.
    • Relational calculus provides declarative language for specifying queries (e.g. SQL)
    • Codd proved that they are equivalent
10
  • 181U
  • See acknowledgements

SQL (Structured Query Language)

Your browser does not support the audio element.

The relational algebra might suggest these steps to retrieve the phone numbers and names of book stores that supply Some Sample Book:

  1. Join book stores and titles over the BookstoreID.
  2. Restrict the result of that join to tuples for the book Some Sample Book.
  3. Project the result of that restriction over StoreName and StorePhone.

The relational calculus would formulate a descriptive, declarative way:

  1. Get StoreName and StorePhone for book stores such that there exists a title BK with the same BookstoreID value and with a BookTitle value of Some Sample Book.
11
  • 181U
  • See acknowledgements

SQL

Your browser does not support the audio element.

Three parts to a Query (only first two are required)

  1. SELECT
  2. FROM
  3. [WHERE]
  • SELECT specifies what you want as output
  • FROM specifies the input
  • WHERE specifies conditions that must be true on output (filter)
12
  • 181U
  • See acknowledgements

SQL Join on Orders, Customers

Your browser does not support the audio element.
OrderID CustomerID OrderDate
10308 2 1996-09-18
10309 37 1996-09-19
10310 77 1996-09-20
CustomerID CustomerName ContactName Country
1 Alfreds Futterkiste Maria Anders Germany
2 Ana Trujillo Emparedados y helados Ana Trujillo Mexico
3 Antonio Moreno Taquería Antonio Moreno Mexico
SELECT Orders.OrderID, Customers. CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
OrderID CustomerName OrderDate
10308 Ana Trujillo Emparedados y helados 9/18/1996

(https://www.w3schools.com/sql/sql_ref_join.asp)

13
  • 181U
  • See acknowledgements

SQL Database Properties (ACID)

Your browser does not support the audio element.
  • Atomicity – transactions are atomic – all or none
  • Consistency – database remains in a consistent state after transation
  • Isolation – transactions are executed as if they are serial
  • Durability – database preserves latest updates even if the system fails or restarts

These properties are expensive to satisfy – especially in a distributed system with high update rate.

14
  • 181U
  • See acknowledgements

NoSQL

Your browser does not support the audio element.
  • Sacrifices consistency for availability, partition tolerance, speed
  • Uses key/value data organization rather than relational tables
  • Scale to clusters of machines

  • Examples

    • Google’s Bigtable
    • Amazon’s DynamoDB
15
  • 181U
  • See acknowledgements

Time Series Data

Your browser does not support the audio element.
  • Scenarios
    1. Smart electric meter records electricity consumption data per our and generates billing data in real time
    2. A windmill generates real-time data of rotational speed and energy production
    3. A cpu monitor produces temperature, workload, and utilization data
  • A time series data set has these characteristics
    1. New data are stored and recorded as entries
    2. Data are stored in chronological order
    3. All data are timestamped

https://dzone.com/articles/time-series-database-vs-common-database-technologi

16
  • 181U
  • See acknowledgements

Time Series Databases

Your browser does not support the audio element.
  • A time series database (TSDB) is a database optimized for time-stamped or time series data.
  • Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time.
  • Features of timeseries databases
    1. Create, Read, update, manage time-value pairs
    2. Calculate, interpolate, and filter time-value pairs.
  • Example: With a time series database, it is common to request a summary of data over a large time period.
    • This requires going over a range of data points to perform some computation like a percentile increase this month of a metric over the same period in the last six months, summarized by month.

https://www.influxdata.com/time-series-database/ https://en.wikipedia.org/wiki/Time_series_database

17
  • 181U
  • See acknowledgements

Example Time Series Data

Your browser does not support the audio element.

Time Series White Paper from InfluxDB

18
  • 181U
  • See acknowledgements

Some Leading Time Series databases

Your browser does not support the audio element.
  • InfluxDB
  • Graphite
  • Prometheus
19
  • 181U
  • See acknowledgements

InfluxDB Data Model

Your browser does not support the audio element.
  • The InfluxData stack organizes time series in a structured format.
  • At the top level is a measurement name,
    • followed by a set of key/value pairs called tags that describe the metadata,
    • followed by key/value pairs of the actual values called fields.
    • Finally, there is a timestamp for the set of values.

InfluxDB has a line protocol for sending time series data

<measurement name>,<tag set> <field set> <timestamp>

For example CPU metrics

cpu,host=serverA,region=uswest idle=23,user=42,system=12 1549063516
20
  • 181U
  • See acknowledgements

Visualization is a key application for Time Series data

Your browser does not support the audio element.

21
  • 181U
  • See acknowledgements

InfluxDB’s web app is Chronograf

Your browser does not support the audio element.

22
  • 181U
  • See acknowledgements

Grafana is another open source visualization tool

Your browser does not support the audio element.

https://play.grafana.org/d/000000012/grafana-play-home?from=1581975730832&to=1581976882123

23
  • 181U
  • See acknowledgements

Grafana

Your browser does not support the audio element.
  • Broad support for various databases
    • graphite
    • influxdb
    • prometheus
    • AWS cloudwatch
    • mysql
    • Amazon Timestream
    • …
  • Graph plugins
    • Graph
    • Heatmap
    • Piechart
    • Radar graphs
    • …
24
  • 181U
  • See acknowledgements

Device Shadows (AWS)

Your browser does not support the audio element.
  • Device shadows – For low frequency events such as configuration or status

    • provide most recent information if the network is partitioned
    • allows system to contiue if there is an intermittent failur
  • At the implementation level, a shadow service uses a JSON document to hold data and metadata associated with each device

  • Uses MQTT topics to facilitate communication between applications and devices

25
  • 181U
  • See acknowledgements

Shadow Example

Your browser does not support the audio element.
{
"state": {
"desired": {
"color": "RED",
"state": "STOP"
},
"reported": {
"color": "GREEN",
"engine": "ON"
},
"delta": {
"color": "RED",
"state": "STOP"
}
},
"metadata": {
"desired": {
"color": {
"timestamp": 12345
},
"state": {
"timestamp": 12345
},
"reported": {
"color": {
"timestamp": 12345
},
"engine": {
"timestamp": 12345
}
},
"delta": {
"color": {
"timestamp": 12345
},
"state": {
"timestamp": 12345
}
}
},
"version": 17,
"timestamp": 123456789
}
}
26
  • 181U
  • See acknowledgements

Azure Device Twin

Your browser does not support the audio element.

  • A device twin contains desired properties, reported properties, and tags.
    • A desired property is set by a back-end application and read by a device.
    • A reported property is set by a device and read by a back-end application.
    • A tag is set by a back-end application and is never sent to a device
27
  • 181U
  • See acknowledgements

Summary

Your browser does not support the audio element.
  • Types of Data
  • Data Frequency
  • Relational Databases + SQL
  • Time-series Databases
    • Visualization
  • Shadows/Twins/Caching
28
  • 181U
  • See acknowledgements

Agenda

Your browser does not support the audio element.
  • Types of Data
  • Data Characteristics
  • Relational Data & SQL
  • NoSQL
  • RealTime databases
  • Digital Shadows (Caching)
  • Data Visualization for IoT
2
Paused

Help

Keyboard shortcuts

↑, ←, Pg Up, k Go to previous slide
↓, →, Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow