Data Driven Decisions
for Business
Data sources and users
Learning outcomes for today
• Evaluate common sources and users of data (financial and operational) in key organisational functions
• Understand how to characterise data according to the 5Vs of big data
• Appreciate and respond to different stakeholder and departmental needs
• Evaluate the role of financial and non-financial information to inform decision-making
• Consider how data impacts on different organisational functions

Roadmap for today
• Recap of last week and review of your Apply activity
• Common data collection and quality issues
• Overview: Data sources and users
• Activity 1 (Individual): Using data to inform business and
stakeholder decision-making
• The five Vs of big data
• Activity 2 (Group): Using the 5Vs to characterise and
understand data
• Your Consolidation homework for today
• Key takeaways, Q&A and next steps

Recap – key points
• Driving factors for DDDB adoption
• Data analytics project frameworks
• DDDB and the five Performance Objectives
• KPIs
• Tesco and university operations case studies

Recap – frameworks
• Stage 1: Business case development
• Stage 2: Data identification and feasibility analysis
• Stage 3: Source data acquisition & filtering
• Stage 4: Data extraction & modelling
• Stage 5: Data validation & cleansing
• Stage 6: Data aggregation & representation
• Stage 7: Data analysis
• Stage 8: Data visualisation
• Stage 9: Actionable results and execution

Review of your Apply activity
If you have not yet posted to the Hub, do
so at the break.

Common data collection issues
• API interfacing
• Expense (time, cost)
• Integration of disparate source data (reference data)
• Storage management
• Governance
• Availability bias

Common data quality issues
• Data definitions: What is a day? A week? A month?
• Calendar year or financial year?
• Which profitability measures to use?
• What do ‘0’ and ‘N/A’ mean in a data file?
• Is data consistent across time?
• Incomplete data records (e.g. partially completed survey responses)?
• Inbuilt data bias
• Sampled data – systematic bias?
• Sampled data – random errors, noise and sample size
• What to do with outliers
• Fraudulent data?
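
Two of the questions above (what ‘0’ and ‘N/A’ mean, and what to do with outliers) can be illustrated with a short sketch. It assumes a hypothetical list of weekly sales records in which missing values arrive as the text 'N/A'. The store names, figures and the 1.5 × IQR outlier rule are illustrative assumptions, not part of the module material.

```python
from statistics import quantiles

# Hypothetical weekly sales records: 'N/A' marks a missing value,
# while '0' is a genuine recorded zero. Both need different treatment.
records = [
    {"store": "A", "sales": "120"},
    {"store": "B", "sales": "0"},
    {"store": "C", "sales": "N/A"},
    {"store": "D", "sales": "135"},
    {"store": "E", "sales": "128"},
    {"store": "F", "sales": "950"},
    {"store": "G", "sales": "110"},
    {"store": "H", "sales": "125"},
]


def parse_sales(raw):
    """Return a float, or None when the value is missing (never turn 'N/A' into 0)."""
    return None if raw.strip().upper() in {"N/A", "NA", ""} else float(raw)


values = [parse_sales(r["sales"]) for r in records]
missing = [r["store"] for r, v in zip(records, values) if v is None]
present = [v for v in values if v is not None]

# Flag values outside 1.5 * IQR of the quartiles (an assumed, commonly used rule).
q1, _, q3 = quantiles(present, n=4)
spread = q3 - q1
outliers = [v for v in present if v < q1 - 1.5 * spread or v > q3 + 1.5 * spread]

print("Stores with missing sales:", missing)
print("Values to investigate:", outliers)
```

Flagged values are candidates for investigation, not automatic deletions: an unusually high or low figure may be an error, a fraud signal or a genuine event.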
Overview: Data sources and users
– Infrastructure components
[Diagram: infrastructure components. Labels shown:]
• Operational systems (three shown)
• External data sources
• Unstructured data
• API
• Data warehouse
• Metadata
• Data summaries
• Specialist query apps (three shown)
• Ad-hoc tools, e.g. spreadsheets
• Costs: + specialist recruitment + training + infrastructure CAPEX

Overview: Data sources and users
– Cloud-based computing as a key infrastructure development
Watch the following video: https://youtu.be/1ERdeg8Sfv4
What are the benefits?

Overview: Data sources and users
– The DIKW* pyramid
Pyramid levels (top to bottom): Wisdom, Knowledge, Information, Data
[Diagram annotations: Elements; Linkages; Contextualisation; Application (People, Experience, Expertise, Institutional Memory); Hindsight; Insight; Foresight; Prediction and decision making; Action & control]
*Data, information, knowledge, wisdom. Source: Ackoff (1989)

Overview: Data sources and users
– Applying DIKW in the business context: Minimising the data-to-update-to-action-to-control cycle
[Diagram: cycle linked by Updates and Control. Labels shown:]
• Events (new data)
• Operational (Metrics)
• Tactical (KPIs)
• Strategic (Options)
• Judgement (Constraints)
• Action(s)
See for example DIKW (Ackoff, 1989), OODA (Boyd, 1987), PDCA (Deming, 1959)

Overview: Data sources and users
– Identifying and classifying data sources
Data is literally everywhere and is being continuously created, distributed, linked, copied and manipulated.
Here is just one classification of data sources:
• Transaction and event data
• Social media data
• Geospatial data
• Sensor data: image, video, sound plus many other data types
• ‘Individual’ data (e.g. physical biometrics)
• Human science and knowledge data
• Environmental and natural world data

Overview: Data sources and users
– Classifying data
A basic classification of data:
• Quantitative (a specific number), continuous: Growth; Profitability; Money (turnover, costs, profit)
• Quantitative (a specific number), discrete: Shoe size
• Qualitative (category data), nominal (categorical data): Colour; Shareholder voting record; Student hometown
• Qualitative (category data), ordinal (ranked data): Customer ranking, e.g. poor to excellent
In order to derive value from data you must understand and classify it!
[Diagram labels: Source or observation; Variables]

Overview: Data sources and users
– Classifying data – sprint exercise
A streaming media company runs a market research exercise (random sample of 250 responses). What kinds of data are the following variables, and how would you display them?
Variable (per observation or response) | Data type | Chart options (examples)
• Age? | Qualitative – ordinal if recorded in age bands (quantitative if recorded in years) | Pie or bar (histogram if recorded in years)
• Sex? | Qualitative – nominal | Pie or bar
• Annual income? | Quantitative – continuous | Histogram
• Number of films watched per month? | Quantitative – discrete | Bar or pie
• Preferred film types? | Qualitative – nominal | Pie or bar
• Satisfaction with service? | Qualitative – ordinal | Bar
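
The link between data type and chart choice in the exercise above can be sketched as a simple lookup. This is a minimal illustration only: the variable names, the `DataType` enum and the default chart for each type are assumptions made for the example, and other chart choices are often equally valid.

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "qualitative - nominal"
    ORDINAL = "qualitative - ordinal"
    DISCRETE = "quantitative - discrete"
    CONTINUOUS = "quantitative - continuous"


# Assumed default chart per data type (examples, not rules).
DEFAULT_CHART = {
    DataType.NOMINAL: "pie or bar",
    DataType.ORDINAL: "bar",
    DataType.DISCRETE: "bar",
    DataType.CONTINUOUS: "histogram",
}

# Some of the survey variables, with hypothetical field names.
survey_variables = {
    "sex": DataType.NOMINAL,
    "annual_income": DataType.CONTINUOUS,
    "films_per_month": DataType.DISCRETE,
    "preferred_film_types": DataType.NOMINAL,
    "satisfaction": DataType.ORDINAL,
}


def suggest_chart(variable):
    """Look up the data type of a variable and return its default chart."""
    return DEFAULT_CHART[survey_variables[variable]]


for name, data_type in survey_variables.items():
    print(f"{name}: {data_type.value} -> {suggest_chart(name)}")
```

Classifying each variable first, and choosing the chart second, keeps the display decision tied to what the data can actually support.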

Overview: Data sources and users
– Typical stakeholders – consumers of data
Some typical stakeholders:
• Customers
• Directors & managers
• Staff
• Interest groups
• Shareholders
• Potential recruits
• Government
• Regulators
• The Media

Sprint exercise – data examples
Give some quick examples of the following types of data you
might see in a business context
1. Financial and non-financial data
2. Quantitative and qualitative data
3. Public and non-public data
4. Structured and unstructured data
Note: Qualitative data can be converted to quasi-quantitative data by applying ordinal or categorical categorisation.
Note: There is no such thing as completely unstructured data (except in pure mathematics!)
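
The first note can be illustrated with a short sketch that turns text ratings into ordinal scores. The five-point scale, its labels and the sample responses are all assumptions made for the example.

```python
from collections import Counter
from statistics import median

# Assumed five-point scale; the labels and scores are illustrative.
SCALE = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}

# Hypothetical customer feedback responses.
responses = ["good", "excellent", "poor", "good", "very good", "fair", "good"]

# Convert each qualitative rating to its ordinal score.
scores = [SCALE[r] for r in responses]

print("Counts per rating:", dict(Counter(responses)))
print("Median score:", median(scores))
```

The median is used rather than the mean because an ordinal scale only guarantees the order of the categories, not equal gaps between them.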

Some data examples
Financial data
• P&L, balance sheet and cash flow
• Average order size
• Amounts owing to individual suppliers
• Amounts owed by individual customers
Non-financial data
• Customer feedback ratings
• Factory error rates
Quantitative data
• Anything money-related
• Order volumes
• Return and reject numbers
Qualitative data
• Customer feedback (again)
Public data
• Stock price
• Tripadvisor reviews
Non-public data
• Quarterly earnings (before publication)
• HR data
Structured data
• Accounting data
Unstructured data
• Twitter commentary on products and services
• Video camera security data

Activity 1 (Individual):
Using data to inform business and stakeholder
decision making
Scenario: You are a newly appointed store manager at an Amazon Fresh store.
You know that Amazon is a data-driven organisation, and it was your discussion
of this at interview that got you the job.
1. List and discuss three different sets of data you would want to review on a
weekly basis. What are the sources? What could be the data collection and
quality issues?
2. List and explain three ways that this data could inform you in running the store.
3. List three store stakeholders besides yourself and explain what data they would
want relating to the store and how they might use the data themselves.
30 minutes

Activity 1 wrap-up
Some data sources:
• Financial data (centrally managed)
• Customer demographics (via their Amazon
profile)
• Staff data (punctuality, hours)
• Security camera data
• Sales volume data by item, line and
category
• Returns data
• Stock reconciliation data
• Out-of-date stock metrics
• Social media data
Some examples of how the data helps you
run your business:
• Stock ordering (but probably automated!)
• Promotions
• Shelf layouts
• Delivery scheduling
• Staff management
• Staff resource scheduling
• Whether to spend less or more on security
• Scheduling of maintenance (cameras,
sensors)
Examples of other stakeholders: needs, interests, concerns:
• Regional management: Store performance and benchmarking
• Suppliers: Stock-line performance, promotions, forward ordering schedule
• Staff: Scheduling
• Local council and police: Level of shoplifting, security camera data
• Local press: e.g. store initiatives

Understanding your data
– The five Vs of big data
Big data:
• Volume
• Velocity
• Variety
• Veracity
• Value
There are at least two weaknesses of this model. Can you think of any?

Characterising data
– Questions to think about
Volume
• Real-time?
• Batched?
• Server capacity?
• Processing capacity?
Velocity
• Can our servers keep up?
• Are there peaks and troughs (e.g.
in the trading day)?
Variety
• Text, image, video?
• How indexed?
• How to add structure?
• Missing key sources?
• Missing key data fields?
• Lack of index integration?
• Analogue vs. digital data
• Does ‘unstructured’ data really exist?
• Financial versus non-financial data
Veracity
• Sample size?
• Bias (unintended and intended)
• Errors, duplicates
• Second-sourced?
• Reputation of the source?
• Data model and definitions?
• Single-data entry?
Value
• How to monetise?
• Who is it valuable to?
• Time-value decay?
Data governance criteria
• Confidentiality?
• Sensitivity and impact if released?
• Who is impacted? Positively? Negatively?
• Accuracy? Risks or errors?
• Provenance?
• Data stakeholders?
• Risk of releasing/not releasing?
• Likelihood and risk of data breaches?
• Existing security control features?
• Machine or human generated?
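
One way to keep a characterisation consistent across data sets is to record the answers in a fixed structure. The sketch below is a minimal illustration, assuming a hypothetical `DataProfile` record with one free-text field per V plus governance. The example entries are invented for illustration and are not model answers.

```python
from dataclasses import dataclass, field


@dataclass
class DataProfile:
    """A five Vs plus governance summary for one data set (illustrative structure)."""
    name: str
    volume: str
    velocity: str
    variety: str
    veracity: str
    value: str
    governance: str
    issues: list = field(default_factory=list)

    def summary(self):
        """Return the profile as indented plain text, one line per criterion."""
        labels = ("volume", "velocity", "variety", "veracity", "value", "governance")
        lines = [self.name]
        lines += [f"  {label.capitalize()}: {getattr(self, label)}" for label in labels]
        lines += [f"  Issue: {issue}" for issue in self.issues]
        return "\n".join(lines)


# Hypothetical example entries.
cctv = DataProfile(
    name="Store security camera footage",
    volume="High: continuous video from several cameras",
    velocity="High: real-time stream",
    variety="Unstructured video, indexed by camera and time",
    veracity="High for what is in frame, but coverage gaps are possible",
    value="Decays quickly unless an incident occurs",
    governance="Personal data, so retention and access need control",
    issues=["Storage cost", "Hard to search without added structure"],
)

print(cctv.summary())
```

Using the same fields for every data set makes gaps visible: an empty or vague entry shows which question has not yet been answered.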

Activity 2 (Group):
Using the 5Vs to characterise and understand data
Scenario (building on your Apply activity): You work in the Business Intelligence Unit of a
multinational oil company. The company operates across the supply chain from upstream
exploration and drilling operations through to retail filling stations. You are responsible for
collecting and managing different types of data from internal and external sources around the
world. Your job today is to use the five Vs model to assess and characterise data and to discuss
related data issues.
Work in three groups, Red, Amber and Green:
1. Red: You are collecting financial data from three regions – USA, Europe and Africa – with the
aim of providing a single view of financial performance for input into statutory reporting.
2. Amber: You are collecting employment performance data relating to retail staff in filling
stations in order to help assess the performance of staff recruitment, management, training and
retention.
3. Green: You are collecting data from Facebook and Twitter. You have been asked to focus on
environmental campaigners and others in order to help assess perception of the company and
thus inform corporate communications.
Use the five Vs (volume, velocity, variety, veracity, value) plus governance to characterise your
data and explain three important issues you might experience with the data.
30 minutes

Activity 2 Wrap-up – Financial data
5 Vs characterisation
Volume: Low. This is summary data
Variety: Different charts of accounts, definitions and accounting rules across regions will all have an impact
Velocity: Low. Monthly or quarterly data
Veracity: Pretty high! (Algorithms can help spot data irregularities and fraud at local level)
Value: This is key performance data
Governance: Highly sensitive (risks include insider trading and leaks to the press)
Potential issues
• The different accounting conventions will require clear and documented conversion to provide a single like-for-like view
• The data is highly confidential so needs to be under secure control
• Staff working on this data would need to have vetting and clearance

Activity 2 Wrap-up – Retail staff data
5 Vs characterisation
Volume: Medium. One record per staff member, maybe 20,000 records worldwide?
Variety: Different regions may have different HR systems and thus different data models. Local laws will also have an impact
Velocity: Low. It is likely that this will be just a quarterly or annual exercise
Veracity: Hopefully high (!)
Value: This is important performance data
Governance: Highly sensitive and confidential
Potential issues
• Comparing like-for-like gives only a partial picture
• Comparing data across countries, jurisdictions and cultures is hard
• The data is highly confidential so needs to be under secure control

Activity 2 Wrap-up – Social media data
5 Vs characterisation
Volume: High. Could be hundreds of thousands of updates a day
Variety: High. NGOs, Greenpeace, individuals, bots
Velocity: High. Real-time updating
Veracity: Highly variable. From accredited sites to individuals/bots
Value: Not valuable data on its own
Governance: Collection will include personal data, for example profiles and tweets
Potential issues
• Text analysis of social media data is hard and error-prone. Sampling may be an option
• There is sensitivity surrounding the collection of social media data
• External personal data is being collected, so local laws, e.g. GDPR, would need to be followed and compliance documented

Your Consolidate Activity – Develop Task Two of your Assessment
Also remember to complete this week’s MCQ (this is monitored)

Takeaways and end-of-session discussion
Successful completion of this topic provides you with the
following valuable understanding and skills for a business
manager:
• A structure for how to characterise, understand and use
data
• An understanding of data users and how they might be
impacted by data
• The different types of data available