Basic concepts, techniques and tools – Hire Academic Expert

Data Driven Decisions
for Business
Statistics and data techniques (1)
– Basic concepts, techniques and tools you need to be
aware of as a business manager

Learning outcomes

Understand key statistical concepts and functions used to
analyse data
Understand

 

Evaluate the differences between qualitative and quantitative
data analysis
Evaluate

 

Appreciate the common formats used in modern data and
applications
Appreciate

 

Know Know some basic Excel statistical and related functions

Session roadmap
1. Recap on last week and review of your Apply
activity
2. Key statistical concepts and functions
3. Activities 1 and 2: Using Excel to investigate data
4. Qualitative analysis – adding structure to
unstructured data
5. Data formats and APIs
6. Your Consolidation homework
7. Key takeaways and Q&A

Recap
• Typical data collection and data quality issues
• Analytics infrastructure components and cloud computing
• DIKW –
speeding and deepening the data-to-update-to-action-tocontrol cycle
• Typical data sources and users
• Classifying data
• Application: Data sources users, uses and stakeholders in a till-less
supermarket
• Application: Use of data for different business projects in an oil
company
You should now have drafted your Assignment Task 2
Review of your Apply activity
If you have not yet posted to the Hub, do
so at the break.

Plotting our course in the analytics project lifecycle
Business case
development
Data aggregation &
representation
Data validation &
cleansing
Data extraction &
modelling
Data identification
and feasibility
analysis
Source data
acquisition & filtering
Data analysis Data visualisation Actionable results and execution
Stage 1 Stage 2 Stage 3
Stage 6 Stage 5 Stage 4
Stage 7 Stage 8 Stage 9

Key statistical concepts
1.Measures of centrality of a data set
2.Measures of spread or dispersion of a data set
3.The central limit theorem
4.The normal distribution
5.Sampling
Statistics, definition:
The practice or science of collecting and
analysing numerical data in large
quantities, especially for the purpose of
inferring proportions in a whole from
those in a representative sample.
What is statistics? What do you understand by
the term ‘Statistical analysis’?
List and explain some key concepts and
statistical measures

Measuring data centrality
What is the ‘centre of gravity’ of the
data?
• Mean (aka average)
• Median
• Mode

Data spread or dispersion
– Your data could be distributed in all sorts of weird ways
But there is good
news…the central limit
theorem
Interesting aside – AI systems
spend a lot of time and computing
effort working out the distribution of
historical data – and the predicted
distribution of future data

The Central Limit Theorem (CLT)
– A fantastic rule
The CLT states that the distribution of a sample variable approximates a normal distribution
(i.e. a ‘bell curve’) as the sample size become larger, regardless of the distribution.
This is important because it means that you can always calculate
mean and standard deviation for your data, regardless of the
underlying shape of the underlying distribution.

Measuring data spread
– The Normal distribution and standard deviation
Source: Wikipedia
Note that the Normal
distribution is one of
many statistical
distributions and
comes with many
assumptions
Two primary
measures of spread:
Standard Deviation
and Range

Sampling
• Why sample?
• Advantages?
• Disadvantages?
• Availability bias?
• Collector-bias?
Some sampling approaches:
• Simple random sampling
• Systematic random sampling
• Stratified random sampling (aka segmentation sampling)
• Quota sampling
• Availability sampling
Source: Wikipedia (Creative Commons)
Basic Excel functions
– SUM
– AVERAGE
– COUNT, COUNTA, COUNTBLANK
– ROUND
– TRIM (and other text cleaning functions)
– VLOOKUP (and the newer XLOOKUP function)
– CONDITIONAL FORMATTING
– PIVOT TABLES
What Excel functions do you already know?
List some examples
NB: Absolute versus
relative cell referencing

Other useful functions
– IF – logical test, value if true, value if false
– COUNTIF, COUNTIFS – count if certain conditions are met
– POWER – compound interest example
– SORT – sorting data by value or alphabetically
– FILTER – filtering data by one or more criteria

Activity 1 (Individual): Using Excel
to investigate data
1. Open Excel 365
2. Go to the hub and download ‘Activity dataset’
3. Calculate the following:
• Average customer age
• Age range, oldest, youngest customer
• Total amount purchased
• Average amount purchased, standard deviation
Extension: Which appears to be the better web site? A or B? Why?
20 minutes
Exploratory Data
Analysis (EDA Part 1)

Activity 1 wrap-up: The power and value of Excel
• Being able to run simple stats on data is really really useful!
• We often call this ‘Exploratory Data Analysis’ (EDA)
• Understand why you are doing the calculations – what does it inform
you and others
• Averages, ranges, spreads
• Already you will have used some useful Excel functions: Sum,
average, count, counta, countif (as a challenge, try using the
countifs function)

Remember: You have access to a good Excel tutorial guide
Available online from the library and there are paper copies in the
Shepherd’s Bush campus library (or from Amazon for £25!)

Qualitative analysis – staging
– Adding structure and knowledge to largely unstructured data
Collect Raw Data
• Interviews
• Surveys
• Observation
Describe Data
Group Themes,
Identify Patterns
& Interpret
Results
Reporting &
Communication
Analyse Data
• Experts
• Literature Reviews
• Theoretical Models
Define the
(business)
objective
Collect raw data
Categorise data
Analyse data
• EDA Statistics
• Experts
• Literature Reviews
• Theoretical Models
Group themes,
identify patterns &
interpret results
Reporting &
Sample applications: Market research, political polling, psychological
research, social science, behavioural science
Set up the
initial
hypothesis or
hypotheses

Qualitative versus qualitative analysis 1
– what is the difference?
Quantitative analysis
Deals explicitly with data
that exists as numbers in
the first instance.
Qualitative analysis
converts non-numeric data
into numbers so it can be
manipulated and analysed
statistically.
Vs.
Team A: Provide examples of
qualitative research
Team B: Provide examples of
quantitative research

Qualitative versus qualitative analysis 2
Quantitative data
• Already exists numerically e.g. financial accounts
• Can often be automatically and continuously collected
(PoS, sensors, APIs)
• ‘Easy’ to analyse and chart
• If continuous (e.g. age, height, sale amount) will need to
be ‘bucketed’
• ‘Easy’ to establish patterns, correlations
• ‘Easy’ to build regression prediction models
• ‘Easy’ to apply in for example physics or quantitative
finance
Qualitative data
• Lacks initial structure
• Structure needs to be added
• Needs to be categorised
• May need to be categorised into ordinal ‘buckets’
• Not all ordinal ranges are the same
• Subject to bias and opinion
• Results might be influenced by the questions asked (!)
• Results might be influenced by the questioner?
Key questions to ask as a customer or user of qualitative analysis: Who collected the data? Have they an open or
hidden agenda? How was the data collected? How was the data categorised? Are the ordinal categories meaningfully
defined and sensible? Have the results been repeated by others? Is the analysis itself repeatable? Are the categories
(buckets) if equal size?
In essence, how robust and credible are the numerical results and the resulting inferences?
A note on structured
and unstructured data
Structured
Semi-structured
• Typical conforms to a data model / schema and is used to capture relationships
and attributes.
• Structured method of storage – typically held in a relational data base. E.g.
PostgreSQL, MySQL
• Standardised tools and database types are widely used
Unstructured
• Often does not conform to a standard data type or model, with inputs varying
widely. These data types might include images, text files, OCR very large
complex data sets such as genome mapping
• Special Purpose Logic or Pattern Recognition Algorithms required to process /
query the data
• Defined level of structure and consistency, but not relational
• XML, JSON – standardised data formats designed to be easy to read and write.
It is common to store text using these formats.
Key conceptual point: Except in pure
mathematics (and the definition of pure noise),
there is no such thing as ‘wholly unstructured’
data. It is a question of degree.

Data Formats
Possible Data Formats:
• XLS – Excel spreadsheet
• CSV – Comma separated value
• TXT – Unformatted text – or words
• XML – Extensible Markup Language which stores in a plain
format
• JSON – JavaScript Object Notation is an open standard file
format
• PDF – portable document format
• GIF – Graphics Interchange Format – an image file format
• JPEG – Joint Photographic Experts Group – a standard image
format
Data access via data
download or API

Private and public APIs
Source: Wikipedia
Activity 2 (Individual): Using Excel
to investigate data
1. Open Excel 365
2. Go to the hub and download ‘Activity 1 dataset’ again.
3. Calculate the following:
• How many added to basket, did not add to basket
• Percentage added to basket? For A? For B?
• What percentage of site visitors made a purchase? For A? For B?
• Which is the largest country market?
Extension: Which appears to be the better web site? A or B? Why?
20 minutes
Exploratory Data
Analysis (EDA Part 2)

Activity 2 wrap-up: Developing your analysis
– To think about
Do you think there is a statistically significant
difference between the performance of the two
websites A and B?

Also remember to complete this week’s MCQ
(this is monitored)
Your Consolidate Activity – Develop Task Three of your Assessment
Key takeaways and Q&A
By completing this session you should:
• Understand the use of key statistical functions
in order to explore and describe data
• Have and overview of different data formats
used in acquiring data
• Have practiced using some key Excel
functions including