# Basic concepts, techniques and tools – Hire Academic Expert

Data Driven Decisions

for Business

Statistics and data techniques (1)

– Basic concepts, techniques and tools you need to be

aware of as a business manager

Learning outcomes

Understand key statistical concepts and functions used to analyse data |
Understand |

Evaluate the differences between qualitative and quantitative data analysis |
Evaluate |

Appreciate the common formats used in modern data and applications |
Appreciate |

Know | Know some basic Excel statistical and related functions |

Session roadmap

1. Recap on last week and review of your Apply

activity

2. Key statistical concepts and functions

3. Activities 1 and 2: Using Excel to investigate data

4. Qualitative analysis – adding structure to

unstructured data

5. Data formats and APIs

6. Your Consolidation homework

7. Key takeaways and Q&A

Recap

• Typical data collection and data quality issues

• Analytics infrastructure components and cloud computing

• DIKW – speeding and deepening the data-to-update-to-action-tocontrol cycle

• Typical data sources and users

• Classifying data

• Application: Data sources users, uses and stakeholders in a till-less

supermarket

• Application: Use of data for different business projects in an oil

company

• You should now have drafted your Assignment Task 2

Review of your Apply activity

If you have not yet posted to the Hub, do

so at the break.

Plotting our course in the analytics project lifecycle

Business case

development

Data aggregation &

representation

Data validation &

cleansing

Data extraction &

modelling

Data identification

and feasibility

analysis

Source data

acquisition & filtering

Data analysis Data visualisation Actionable results and execution

Stage 1 Stage 2 Stage 3

Stage 6 Stage 5 Stage 4

Stage 7 Stage 8 Stage 9

Key statistical concepts

1.Measures of centrality of a data set

2.Measures of spread or dispersion of a data set

3.The central limit theorem

4.The normal distribution

5.Sampling

Statistics, definition:

The practice or science of collecting and

analysing numerical data in large

quantities, especially for the purpose of

inferring proportions in a whole from

those in a representative sample.

What is statistics? What do you understand by

the term ‘Statistical analysis’?

List and explain some key concepts and

statistical measures

Measuring data centrality

What is the ‘centre of gravity’ of the

data?

• Mean (aka average)

• Median

• Mode

Data spread or dispersion

– Your data could be distributed in all sorts of weird ways

But there is good

news…the central limit

theorem

Interesting aside – AI systems

spend a lot of time and computing

effort working out the distribution of

historical data – and the predicted

distribution of future data

The Central Limit Theorem (CLT)

– A fantastic rule

The CLT states that the distribution of a sample variable approximates a normal distribution

(i.e. a ‘bell curve’) as the sample size become larger, regardless of the distribution.

This is important because it means that you can always calculate

mean and standard deviation for your data, regardless of the

underlying shape of the underlying distribution.

Measuring data spread

– The Normal distribution and standard deviation

Source: Wikipedia

Note that the Normal

distribution is one of

many statistical

distributions and

comes with many

assumptions

Two primary

measures of spread:

Standard Deviation

and Range

Sampling

• Why sample?

• Advantages?

• Disadvantages?

• Availability bias?

• Collector-bias?

Some sampling approaches:

• Simple random sampling

• Systematic random sampling

• Stratified random sampling (aka segmentation sampling)

• Quota sampling

• Availability sampling

Source: Wikipedia (Creative Commons)

Basic Excel functions

– SUM

– AVERAGE

– COUNT, COUNTA, COUNTBLANK

– ROUND

– TRIM (and other text cleaning functions)

– VLOOKUP (and the newer XLOOKUP function)

– CONDITIONAL FORMATTING

– PIVOT TABLES

What Excel functions do you already know?

List some examples

NB: Absolute versus

relative cell referencing

Other useful functions

– IF – logical test, value if true, value if false

– COUNTIF, COUNTIFS – count if certain conditions are met

– POWER – compound interest example

– SORT – sorting data by value or alphabetically

– FILTER – filtering data by one or more criteria

Activity 1 (Individual): Using Excel

to investigate data

1. Open Excel 365

2. Go to the hub and download ‘Activity dataset’

3. Calculate the following:

• Average customer age

• Age range, oldest, youngest customer

• Total amount purchased

• Average amount purchased, standard deviation

Extension: Which appears to be the better web site? A or B? Why?

20 minutes

Exploratory Data

Analysis (EDA Part 1)

Activity 1 wrap-up: The power and value of Excel

• Being able to run simple stats on data is really really useful!

• We often call this ‘Exploratory Data Analysis’ (EDA)

• Understand why you are doing the calculations – what does it inform

you and others

• Averages, ranges, spreads

• Already you will have used some useful Excel functions: Sum,

average, count, counta, countif (as a challenge, try using the

countifs function)

Remember: You have access to a good Excel tutorial guide

Available online from the library and there are paper copies in the

Shepherd’s Bush campus library (or from Amazon for £25!)

Qualitative analysis – staging

– Adding structure and knowledge to largely unstructured data

Collect Raw Data

• Interviews

• Surveys

• Observation

Describe Data

Group Themes,

Identify Patterns

& Interpret

Results

Reporting &

Communication

Analyse Data

• Experts

• Literature Reviews

• Theoretical Models

Define the

(business)

objective

Collect raw data

Categorise data

Analyse data

• EDA Statistics

• Experts

• Literature Reviews

• Theoretical Models

Group themes,

identify patterns &

interpret results

Reporting &

Sample applications: Market research, political polling, psychological

research, social science, behavioural science

Set up the

initial

hypothesis or

hypotheses

Qualitative versus qualitative analysis 1

– what is the difference?

Quantitative analysis

Deals explicitly with data

that exists as numbers in

the first instance.

Qualitative analysis

converts non-numeric data

into numbers so it can be

manipulated and analysed

statistically.

Vs.

Team A: Provide examples of

qualitative research

Team B: Provide examples of

quantitative research

Qualitative versus qualitative analysis 2

Quantitative data

• Already exists numerically e.g. financial accounts

• Can often be automatically and continuously collected

(PoS, sensors, APIs)

• ‘Easy’ to analyse and chart

• If continuous (e.g. age, height, sale amount) will need to

be ‘bucketed’

• ‘Easy’ to establish patterns, correlations

• ‘Easy’ to build regression prediction models

• ‘Easy’ to apply in for example physics or quantitative

finance

Qualitative data

• Lacks initial structure

• Structure needs to be added

• Needs to be categorised

• May need to be categorised into ordinal ‘buckets’

• Not all ordinal ranges are the same

• Subject to bias and opinion

• Results might be influenced by the questions asked (!)

• Results might be influenced by the questioner?

Key questions to ask as a customer or user of qualitative analysis: Who collected the data? Have they an open or

hidden agenda? How was the data collected? How was the data categorised? Are the ordinal categories meaningfully

defined and sensible? Have the results been repeated by others? Is the analysis itself repeatable? Are the categories

(buckets) if equal size?

In essence, how robust and credible are the numerical results and the resulting inferences?

A note on structured

and unstructured data

Structured

Semi-structured

• Typical conforms to a data model / schema and is used to capture relationships

and attributes.

• Structured method of storage – typically held in a relational data base. E.g.

PostgreSQL, MySQL

• Standardised tools and database types are widely used

Unstructured

• Often does not conform to a standard data type or model, with inputs varying

widely. These data types might include images, text files, OCR very large

complex data sets such as genome mapping

• Special Purpose Logic or Pattern Recognition Algorithms required to process /

query the data

• Defined level of structure and consistency, but not relational

• XML, JSON – standardised data formats designed to be easy to read and write.

It is common to store text using these formats.

Key conceptual point: Except in pure

mathematics (and the definition of pure noise),

there is no such thing as ‘wholly unstructured’

data. It is a question of degree.

Data Formats

Possible Data Formats:

• XLS – Excel spreadsheet

• CSV – Comma separated value

• TXT – Unformatted text – or words

• XML – Extensible Markup Language which stores in a plain

format

• JSON – JavaScript Object Notation is an open standard file

format

• PDF – portable document format

• GIF – Graphics Interchange Format – an image file format

• JPEG – Joint Photographic Experts Group – a standard image

format

Data access via data

download or API

Private and public APIs

Source: Wikipedia

Activity 2 (Individual): Using Excel

to investigate data

1. Open Excel 365

2. Go to the hub and download ‘Activity 1 dataset’ again.

3. Calculate the following:

• How many added to basket, did not add to basket

• Percentage added to basket? For A? For B?

• What percentage of site visitors made a purchase? For A? For B?

• Which is the largest country market?

Extension: Which appears to be the better web site? A or B? Why?

20 minutes

Exploratory Data

Analysis (EDA Part 2)

Activity 2 wrap-up: Developing your analysis

– To think about

Do you think there is a statistically significant

difference between the performance of the two

websites A and B?

Also remember to complete this week’s MCQ

(this is monitored)

Your Consolidate Activity – Develop Task Three of your Assessment

Key takeaways and Q&A

By completing this session you should:

• Understand the use of key statistical functions

in order to explore and describe data

• Have and overview of different data formats

used in acquiring data

• Have practiced using some key Excel

functions including