Statistics and Data Techniques – Hire Academic Expert

Data Driven Decisions
for Business
Statistics and Data Techniques (2)
– More advanced statistical concepts, methods and tools to
help you address business problems using data

Learning outcomes are largely the same as last week

• Practise using some more Excel statistical and related functions
• Understand some more statistical concepts used to analyse data and how they relate to solving business problems
• Appreciate how statistical methods can help you answer real-world business problems

Session
roadmap
• Recap on last week and review of your Apply activity
• Activity 1: More Excel practice
• More statistical concepts
• Applying statistics in business
• Advanced methods
• Activity 2: A real-life business problem. Helping out your data
analysts
• Your Consolidation homework
• Key takeaways and Q&A

Recap
• Statistical concepts: Measures of centrality and dispersion, the CLT,
normal distribution and sampling
• Basic Excel functions
• Exercises: Using Excel to conduct Exploratory data analysis (EDA) on
customer website data
Qualitative analysis
• Project staging
• Qualitative analysis versus quantitative analysis
• Weaknesses and risks
• Data formats
• Public and private APIs
You should now be drafting your Assignment Task 3
Review of your Apply activity
If you have not yet posted to the Hub, do
so at the break.

Plotting our course in the analytics project lifecycle
Stage 1: Business case development
Stage 2: Data identification and feasibility analysis
Stage 3: Source data acquisition & filtering
Stage 4: Data extraction & modelling
Stage 5: Data validation & cleansing
Stage 6: Data aggregation & representation
Stage 7: Data analysis
Stage 8: Data visualisation
Stage 9: Actionable results and execution

Activity 1 (Individual): Using Excel to
manipulate data
1. Open the Activity 2 exercises file
2. Work through each of the problems (one per tab):
• xlookup*
• Conditional formatting
• Pivot tables
• Transposing data
• Text-to-columns
• Identifying and fixing data errors.
3. Let your tutor know when you have completed each task
30 minutes
*xlookup is Microsoft’s update on the traditional vlookup function
Activity 1 Wrap-up
• Being able to use pivot tables to summarise data is a valuable skill
generally and is essential for your assignment
• Use conditional formatting in your assignment to help identify
suspect data as part of the data cleansing phase
• Use xlookup, for example, to convert month references (1-12) to real
month tags (Jan-Dec)
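For readers who want to see the lookup idea outside Excel, the month conversion can be sketched in Python. This is an illustrative sketch only: the function name `month_tag`, the sample list `month_refs` and the "Unknown" fallback are assumptions made for the example and are not part of the exercise file. The fallback plays the same role as xlookup's optional if-not-found argument.

```python
# Lookup table: month reference (1-12) -> month tag (Jan-Dec).
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTH_TAGS = {number: name for number, name in enumerate(MONTH_NAMES, start=1)}


def month_tag(month_number, default="Unknown"):
    """Return the month tag for a month reference, or a default if it is not found."""
    return MONTH_TAGS.get(month_number, default)


# HYPOTHETICAL column of month references; 13 is a deliberate data error.
month_refs = [1, 4, 12, 13]
tags = [month_tag(m) for m in month_refs]
print(tags)
```

Returning a visible fallback such as "Unknown" rather than failing makes suspect rows easy to spot during data cleansing.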

More statistical concepts 1
– A starting toolkit
1. Hypotheses. Set-up and testing
2. Correlation and regression
We will cover these concepts in outline only. As managers, your role as
customers of this analysis will be to interpret, challenge and act on the results.
Statistics deals with
(optimising) decision making
under uncertainty, i.e. with
incomplete information –
which is always true. Do
you agree or not?

More statistical concepts 2
– What is a hypothesis?
A hypothesis is a mathematically testable, formally written
version of a hunch, an intuition, a guess or an assertion
OK, so give me some examples of what we might want to test
Has Florence’s class
performed better than
Albert’s?
Will our preferred party win
the election?
Is the new ecommerce site
better than the old?
Does our coffee contain the
right specified amount of
caffeine?
Is classroom attendance
linked to exam success?
How does social media
contribute to mental health
problems in teenagers?

More statistical concepts 3
– Setting up the null and alternative hypotheses
Matilda advertises that the coffee in her café contains a robust 330 mg of caffeine.
As quality inspectors we want to test whether Matilda’s claim is true. Does her coffee contain the specified amount of caffeine?
Firstly, we need to take a sample, say 50 cups. (Why can’t we test every cup?)
We find that the average of this sample is 331 mg.
Can we come to a conclusion about whether Matilda’s coffee is up to scratch? Yes we can!
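For Matilda's coffee, one way to set up the test is a null hypothesis that the true mean is 330 mg against an alternative that it is not. The sketch below uses a normal (z) approximation, which is a reasonable simplification for a sample of 50 cups. Note the assumptions: the session material gives the sample size and sample mean but not the spread of the sample, so the standard deviation of 5 mg and the 5% significance level are hypothetical values chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean = 330.0  # mg, Matilda's advertised caffeine content (H0: mean = 330)
sample_mean = 331.0   # mg, average of the sampled cups
n = 50                # number of cups sampled
sample_sd = 5.0       # mg, ASSUMED: not given in the session material
alpha = 0.05          # ASSUMED significance level

# How far the sample mean is from the claim, measured in standard errors.
standard_error = sample_sd / sqrt(n)
z = (sample_mean - claimed_mean) / standard_error

# Two-tailed p-value: probability of a gap at least this large if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

Under these assumed values the 1 mg gap is well within what sampling variation alone could produce, so the claim would not be rejected. A much smaller assumed standard deviation would change that conclusion, which is why the spread of the sample matters as much as its average.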

Applying statistical methods in business 1
– An alphabet soup of toolkit terms. Here are just a few:
• Chi² tests
• t-Tests
• Z-Tests
• F-Tests
• One-tail tests
• Two-tail tests
• ANOVA
• Kruskal-Wallis Test
• Mann-Whitney U-test
• Pearson’s correlation coefficient (r)
• Coefficient of determination (r²)
• Spearman’s Rho
• Signal-to-noise ratio (MST/MSE)
• Bayes
We will look at a real-life business problem that uses one of these.
Excel itself provides some of these. Specialist stats packages like R, SPSS and MATLAB provide more.
As business managers you do not need to understand the inner workings of these black-box
maths toolkit approaches. You just need to understand the business problems you are
aiming to solve and how different tools can help. Specialists do the rest.
Advanced methods 2
– There are two overarching types of statistical test one wants to perform
What do you think they are?
COMPARISON – Is there a difference between groups or categories? Examples?
RELATIONSHIP – Is there a link between data sets or categories of data? Examples?
Applying statistical methods in business 2
– Correlation and regression 1
What can we
infer from this
data and
chart?

Some initial
observations
Regression line
Applying statistical methods in business 3
– Correlation and regression 2

Applying statistical methods in business 4
– Getting Excel to do the heavy lifting
Excel stats will
tell you a
whole lot more
There is a lot here; the key points are:
Correlation coefficient (r) = 0.75
Coefficient of determination (r²) = 0.56
The regression equation is:
TS = 20 + 0.71 × SE
Why is this useful?
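One practical use of the regression equation is prediction: given a value of SE, it returns an estimate of TS. The sketch below shows this, along with how the coefficient of determination follows from r. The input value of 50 is hypothetical and chosen only for illustration. The variables are kept under the names TS and SE as they appear in the equation, since their definitions sit on the chart rather than in the text.

```python
# Figures quoted in the session: r = 0.75 and TS = 20 + 0.71 * SE.
R = 0.75
INTERCEPT = 20.0
SLOPE = 0.71


def predict_ts(se):
    """Predict TS from SE using the fitted line TS = 20 + 0.71 * SE."""
    return INTERCEPT + SLOPE * se


# Share of the variation in TS that is explained by SE.
r_squared = R ** 2

# HYPOTHETICAL input value, used only to show the calculation.
example_se = 50
predicted = predict_ts(example_se)

print(f"r squared = {r_squared:.4f}")
print(f"Predicted TS when SE = {example_se}: {predicted:.1f}")
```

The slope is the part managers usually act on: each extra unit of SE is associated with roughly 0.71 more units of TS. An r² of about 0.56 is a reminder that a little under half of the variation in TS is not explained by SE.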
Advanced methods 1
– A whistle-stop tour
• Simple one-factor regression models often work surprisingly well in predicting future outcomes
• Simple (linear) models have been shown in many instances to be better predictors than human ‘experts’ (Why do you think this is?)
• But in practice there are many independent and dependent variables to consider. For example, what variables might you use to predict the success of a new coffee shop?
• These variables in turn often correlate with each other
• So models with lots of inputs often don’t improve prediction much over simpler models
• But neither simple nor complex models can account for the ‘broken leg’ outlier problem. This is where human ‘inside information’ is important
Source: Noise: A Flaw in Human Judgment (Kahneman et al., 2021)
In many situations AI and machine learning deliver demonstrably better results
than simple or complex rules-based models and human judgement. And they
can spot the broken-leg outliers. But they need a lot of data to work. And
remember that all models are imperfect (!)

Activity 2 (group): Helping your data analysts 1
– The scenario
30 minutes
You are a product manager at WitterNook, a well-known social media platform. As part of this, you work
with a team of (quite geeky) data analysts who analyse user engagement on behalf of the various product
teams. The trouble is that, whilst they are great at data, they are not very good at communicating and
turning data into meaning (that is your job).
As product manager you have been testing some new advertising approaches. Your analysts have been
running some data research on user response rates and have come up with initial ‘results’.
You need to make sense of the results and present findings to the Board tomorrow. The key stakeholders
on the Board are the Director of Innovation (it was her idea), Sales Director (he has to sell it to users) and
the Finance Director (she has to sign-off on the funding and ensure a return on investment).

Activity 2 (group): Helping your data analysts 2
– Board reporting tasks
30 minutes
All three teams: Open the Activity2 Research xls. Follow the guidelines in
the ‘Instructions’ tab.
Team red: What do the results tell us about the correlation between age and
time spent online?
Team amber: Review the dataset and recommend five further investigations
that would be useful to conduct.
Team green: List and describe three weaknesses with this kind of analysis
and suggest improvements.

Activity 2 Wrap-up
Correlation analysis

                     Age             Time spent online
Age                  1
Time spent online    -0.705021685    1

Linear regression using ANOVA

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.705021685
R Square             0.497055577
Adjusted R Square    0.491923491
Standard Error       8.046277102
Observations         100

ANOVA
              df    SS             MS          F           Significance F
Regression    1     6270.483065    6270.483    96.85254    2.67427E-16
Residual      98    6344.772369    64.74258
Total         99    12615.25543

              Coefficients     Standard Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept     92.60560964      5.146478149       17.99398    7.98E-33    82.39259125     102.8186
Age           -1.572635638     0.159798463       -9.84137    2.67E-16    -1.889750487    -1.25552

Charting
[Chart: Time spent online versus age. Horizontal axis: Age (years), 0 to 45. Vertical axis: Time online (minutes), 0 to 90.]
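To help read the regression output, the sketch below recomputes the headline statistics from the sums of squares and coefficients reported by Excel, then uses the fitted line to make a prediction. All input numbers are copied from the output above. The age of 30 used for the prediction is a hypothetical example.

```python
from math import sqrt

# Values copied from the Excel regression output.
ss_regression = 6270.483065
ss_residual = 6344.772369
ss_total = 12615.25543
df_regression = 1
df_residual = 98
intercept = 92.60560964
slope = -1.572635638       # change in minutes online per extra year of age
slope_std_error = 0.159798463

# R Square: share of the variation in time online explained by age.
r_square = ss_regression / ss_total

# Multiple R is the size of the correlation; Excel reports it without the sign.
multiple_r = sqrt(r_square)

# F statistic: explained variance per degree of freedom over unexplained.
ms_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / ms_residual

# t statistic for the Age coefficient: estimate divided by its standard error.
t_slope = slope / slope_std_error


def predicted_minutes(age):
    """Predicted time online (minutes) for a given age, from the fitted line."""
    return intercept + slope * age


print(f"R Square = {r_square:.4f}, Multiple R = {multiple_r:.4f}")
print(f"F = {f_stat:.2f}, t (Age) = {t_slope:.2f}")
print(f"Predicted minutes online at age 30: {predicted_minutes(30):.1f}")
```

For the Board, the plain-language reading is that each additional year of age is associated with about 1.6 fewer minutes online, and age accounts for roughly half of the variation in time spent online. The very small Significance F indicates the relationship is unlikely to be a chance result in this sample, but it does not show that age causes the difference.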

Your Consolidate Activity
– Continue to develop Task Three of your Assessment
Also remember to complete this week’s MCQ (this is monitored)

Key takeaways and Q&A
By completing this session you should:
• Have practised using some key Excel
functions including xlookup, pivot tables and
conditional formatting
• Understand the value of setting up testable
business hypotheses
• Have an appreciation as to how to interpret a
correlation and regression analysis
