Consider your work environment, domain of interest, or the world around you, and discuss when it might be more appropriate to use the mean, median, or mode for measures of central tendency. I am currently in Human Resources/Recruiting.


UNIT IV STUDY GUIDE

Data Analysis: Descriptive Statistics

Course Learning Outcomes for Unit IV

Upon completion of this unit, students should be able to:

6. Differentiate between various research-based tools commonly used in businesses.

6.1 Describe various forms of descriptive statistics, including frequency distribution tables, histograms, descriptive statistics tables, Kolmogorov-Smirnov tests, measurement scales, and measures of central tendency.

7. Test data for a business research project.

7.1 Establish whether assumptions are met to use parametric statistical procedures by applying descriptive statistics.

Course/Unit Learning Outcomes and Learning Activities

6.1
Unit Lesson
Video: Kolmogorov-Smirnov Test of Normality in Excel
Video: Parametric and Nonparametric Statistical Tests
Video: Checking that Data Is Normally Distributed Using Excel
Video: 3. Choosing Between Parametric & Non-Parametric Tests
Article: "Difference Between Parametric and Nonparametric"
Article: "Deciphering the Dilemma of Parametric and Nonparametric Tests"
Unit IV Scholarly Activity

7.1
Unit Lesson
Unit IV Scholarly Activity

Reading Assignment

In order to access the following resources, click the links below:

Fields, H. (2018). Difference between parametric and nonparametric. Retrieved from

Dominguez, V. (2016, April 16). Make a histogram using Excel's histogram tool in the Data Analysis ToolPak [Video file]. Retrieved from https://www.youtube.com/watch?v=xekiDJzajYk


Grande, T. (2017, August 19). Kolmogorov-Smirnov test of normality in Excel [Video file]. Retrieved from


Grande, T. (2015, July 30). Parametric and nonparametric statistical tests [Video file]. Retrieved from


MBA 5652, Research Methods


Macarty, M. (2015, September 21). Get descriptive statistics in Excel with Data Analysis Toolpak [Video file]. Retrieved from https://www.youtube.com/watch?v=h-RzBhBzJOQ

Oxford Academic (Oxford University Press). (2016, November 17). Checking that data is normally distributed using Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=EG8AF2B_dps

Rana, R., Singhal, R., & Dua, P. (2016). Deciphering the dilemma of parametric and nonparametric tests. Journal of the Practice of Cardiovascular Sciences, 2(2), 95. Retrieved from http://link.galegroup.com.libraryresources.columbiasouthern.edu/apps/doc/A488649197/AONE?u=oran95108&sid=AONE&xid=c54eaf34

The Roslin Institute – Training. (2016, May 9). 3. Choosing between parametric & non-parametric tests [Video file]. Retrieved from https://www.youtube.com/watch?v=_1mH6CnXKfM

Unit Lesson

Data Analysis: Descriptive Statistics

The course is now entering the data analysis stage of research design. This is where the methodological fork in the road goes decisively down the quantitative path. The first topic under data analysis is descriptive statistics. As the name suggests, the researcher describes the data that have been collected. During this stage, the data are described both visually and statistically. Visual displays, such as graphs, histograms, tables, plots, and other diagrams, can reveal the distribution of the data, trends, anomalies, outliers, and so on. This stage comes before any statistical procedures are used to test the research hypotheses, which raises the question of why the researcher should not simply jump in and start testing hypotheses right away. The following section explains how descriptive statistics are used to check that the assumptions of a parametric test are met before that test is used.


Assumptions: The Importance of Describing Data


There are various benefits of describing the data. One of the most important is determining whether the data meet the assumptions required for parametric statistical procedures. Parametric procedures include, but are not limited to, correlation, regression, t test, and ANOVA. Different parametric tests have different assumptions, but most require that the assumption of normality be met. Normality refers to a normal distribution of data which, when graphed as frequencies, resembles a bell shape (as in the figure). Other common assumptions that must be met, depending on the statistical procedure used, include sample size, levels of measurement, homogeneity of variance, independence, absence of outliers, linearity, etc. (Field, 2005). It is critical that the researcher understands the assumptions of any parametric statistical procedure being considered and confirms that they are met before employing the procedure in a research study. An Internet search for any parametric test will quickly return results that list its required assumptions. If the assumptions are not met, parametric statistical procedures cannot be used; using them would produce invalid results.

[Figure: Normal distribution graph with a bell curve]

Fortunately, there are corresponding non-parametric tests that can be used when the data do not meet the assumptions for parametric tests. Non-parametric tests also have assumptions that must be met, but they are fewer and less rigid. An example of a parametric procedure for correlation would be Pearson's correlation coefficient (Pearson's r), while a corresponding non-parametric test for correlation would be Spearman's rank correlation coefficient (Spearman's rho). An example of a causal-comparative parametric procedure would be ANOVA, while a corresponding non-parametric causal-comparative test would be Kruskal-Wallis.
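To make the Pearson/Spearman pairing concrete, here is a minimal Python sketch, not taken from the lesson, that computes both coefficients from scratch so that no statistics library is needed. Spearman's rho is computed as Pearson's r applied to ranks, with tied values given their average rank. The function names are illustrative, and the experience and interview-score figures are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired interval/ratio data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: years of experience vs. interview score.
experience = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 64, 70, 71, 75, 98]
print(round(pearson_r(experience, score), 3))
print(round(spearman_rho(experience, score), 3))  # 1.0
```

Because the hypothetical scores always rise but not along a straight line, rho is exactly 1 while r is lower: rho responds only to rank order, which is why it suits ordinal data.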

Since non-parametric tests require fewer assumptions, some students wonder why non-parametric tests are not always used. The reason is that, when their assumptions are met, parametric tests are more powerful than non-parametric tests and should be preferred. A parametric test is more likely than a non-parametric test to find a true effect when one exists, and therefore to correctly reject the null hypothesis (Norusis, 2008). In other words, a parametric test is less likely to commit a Type II error. Norusis (2008) recommends that researchers conduct both parametric and non-parametric tests if they are unsure which is more appropriate. If both tests lead to the same conclusion, the choice of test does not affect the finding. If the parametric test is statistically significant and the non-parametric test is not, the researcher should take a closer look at whether the assumptions were met.

Assumption of Normality

Assumptions are evaluated both visually and statistically. As mentioned previously, a normal distribution of data is the most commonly required assumption for parametric statistical tests. The following explains how the assumption of normality can be described and tested.

A normal distribution of data exhibits the characteristics of a bell-shaped curve, as shown below. In a perfect normal curve, the frequency distribution is symmetrical about the center; the mean, median, and mode are all equal; and the tails of the curve approach but do not touch the x-axis (Salkind, 2009). These are all preliminary indicators that a curve may represent a normal distribution, but there are additional factors to consider.

Distribution curves can be short and wide, tall and thin, and anywhere in between. In the figure, each of the colored bell-shaped curves has a mean (μ) of zero. Their standard deviations (σ), which measure how widely the data disperse around the mean, differ for each curve. The orange curve has a relatively small standard deviation because the data are closely clustered around the mean. The red curve has a relatively large standard deviation because the data are loosely clustered around the mean.

[Figure: Distribution curves]

Kurtosis describes the shape of a curve's peak and tails. A platykurtic curve is short and squatty (think plateau), like the wide red curve, while a leptokurtic curve is tall and thin (think leapt for the sky), like the narrow orange curve (Field, 2005). Strictly speaking, kurtosis measures how heavy the tails are relative to a normal curve with the same standard deviation: a leptokurtic distribution has relatively more scores in its tails, and a platykurtic distribution has relatively fewer. Simply widening or narrowing a normal curve changes its standard deviation but not its kurtosis. Platykurtic and leptokurtic curves can challenge the assumption of normality, even when the curve is bell-shaped.

The data may also be asymmetrical, with more of the data distributed to one side of the curve than the other. This asymmetry is referred to as skewness. Below are examples of negative skewness (a longer left tail) and positive skewness (a longer right tail). Like platykurtic and leptokurtic curves, skewed curves also threaten the assumption of normality.

[Figure: Left-skewed and right-skewed graphs (Sundberg, 2014)]

The assumption of normality can be evaluated visually by describing the frequency of responses in a data set. The frequency table below shows the results of a 120-point safety test administered to 500 employees. For example, two employees scored in the range of 50–54, 90 employees scored in the range of 85–89, and three employees scored in the range of 110–114.
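A frequency table like the one described above can be built with a few lines of Python. This is a sketch under stated assumptions: scores are whole numbers, ranges are five points wide as in the lesson's example (50–54, 55–59, and so on), and the scores below are hypothetical rather than the 500-employee data set. Empty ranges are kept so that gaps in the distribution stay visible, and a row of asterisks per range gives a rough text histogram.

```python
from collections import Counter

def frequency_table(scores, width=5):
    """Count whole-number scores in fixed-width ranges such as 50-54, 55-59, ..."""
    counts = Counter((s // width) * width for s in scores)
    low, high = min(counts), max(counts)
    # Include empty ranges so gaps in the distribution remain visible.
    return [(start, start + width - 1, counts.get(start, 0))
            for start in range(low, high + 1, width)]

# Hypothetical scores (not the lesson's 500-employee data set).
scores = [52, 63, 67, 71, 74, 78, 81, 83, 85, 86, 88, 92, 97, 111]
for lo, hi, n in frequency_table(scores):
    print(f"{lo}-{hi}: {'*' * n} ({n})")
```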


When the frequency data are plotted in a histogram, the shape of the distribution can be observed. To create a histogram, the data values (test score ranges) are plotted on the x-axis, and the frequency of each value is plotted on the y-axis. Using the same example as the frequency table, the histogram shows that two employees scored in the range of 50–54, 90 employees scored in the range of 85–89, and three employees scored in the range of 110–114.


Based on the histogram below, the data appear approximately normally distributed, with no visible outliers. No skewness is apparent, although the kurtosis suggests a slightly leptokurtic curve. Skewness and kurtosis can be confirmed by generating descriptive statistics, a routine function in statistical packages, including the Excel Data Analysis ToolPak. Researchers debate what levels of skewness and kurtosis are acceptable. George and Mallery (2010) suggest that skewness and kurtosis values between -2 and +2 are satisfactory for accepting a normal distribution. Researchers generally agree that the closer skewness and kurtosis are to 0, the better; the further they deviate from 0, the greater the chance that the data are not normally distributed (Field, 2005). As shown in the descriptive statistics table, both skewness and kurtosis are close to 0.
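The skewness and kurtosis values in a descriptive statistics table can be computed directly. The sketch below uses the adjusted sample formulas that Microsoft documents for Excel's SKEW and KURT worksheet functions; it assumes, without verifying, that the ToolPak reports the same values. KURT reports excess kurtosis, so a normal distribution scores about 0, which matches the "closer to 0 is better" guideline. The small data sets are hypothetical, and within_plus_minus_two is an illustrative helper that applies the George and Mallery (2010) range.

```python
import math

def _mean_and_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean, sd

def sample_skewness(xs):
    """Adjusted sample skewness, as documented for Excel's SKEW (needs n >= 3)."""
    n = len(xs)
    mean, sd = _mean_and_sd(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in xs)

def sample_kurtosis(xs):
    """Adjusted excess kurtosis, as documented for Excel's KURT (needs n >= 4).
    A normal distribution scores about 0."""
    n = len(xs)
    mean, sd = _mean_and_sd(xs)
    fourth = sum(((x - mean) / sd) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def within_plus_minus_two(xs, limit=2.0):
    """George and Mallery's (2010) guideline: both statistics between -2 and +2."""
    return abs(sample_skewness(xs)) <= limit and abs(sample_kurtosis(xs)) <= limit

symmetric = [1, 2, 3, 4, 5]
lopsided = [1, 1, 1, 10]
print(sample_skewness(symmetric), sample_kurtosis(symmetric))  # about 0 and -1.2
print(sample_skewness(lopsided), sample_kurtosis(lopsided))    # about 2 and 4
print(within_plus_minus_two(symmetric), within_plus_minus_two(lopsided))  # True False
```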

Note also that the mean, median, and mode are similar in the descriptive statistics table below. As noted above, the mean, median, and mode are identical in a perfect normal distribution. Taken together, these results suggest that the data are approximately normally distributed.


Descriptive Statistics

Mean                  80.546
Standard Error        0.446621439
Median                81
Mode                  75
Standard Deviation    9.986758969
Sample Variance       99.73535471
Kurtosis              0.095314585
Skewness              0.065078019
Range                 64
Minimum               53
Maximum               117
Sum                   40273
Count                 500
Largest(1)            117
Smallest(1)           53
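Most fields of the descriptive statistics table above can be reproduced with Python's standard statistics module. This sketch approximates the ToolPak output rather than copying it: the sample values are hypothetical, skewness and kurtosis are left out for brevity, and statistics.mode returns the first mode it encounters when several values tie, which may not match Excel in every case. The last two lines cross-check the lesson's own table: Sum divided by Count reproduces the reported Mean, and Standard Deviation divided by the square root of Count reproduces the reported Standard Error.

```python
import math
import statistics

def descriptive_summary(xs):
    """Most fields of a ToolPak-style descriptive statistics table."""
    n = len(xs)
    sd = statistics.stdev(xs)  # sample standard deviation, n - 1 denominator
    return {
        "Mean": statistics.mean(xs),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(xs),
        "Mode": statistics.mode(xs),  # first mode encountered if values tie
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(xs),
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }

# Hypothetical safety-test scores (not the lesson's 500-employee data set).
scores = [70, 75, 75, 80, 85, 90, 95]
for label, value in descriptive_summary(scores).items():
    print(f"{label:<20}{value}")

# Cross-check the lesson's reported table.
print(40273 / 500)                   # 80.546, the reported Mean
print(9.986758969 / math.sqrt(500))  # about 0.4466, the reported Standard Error
```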

The frequency distribution should also be examined for outliers. Outliers are extreme scores far from the mean in the left or right tail of the curve. Because their values are extreme, outliers can bias the mean. There are different recommendations for how to treat outliers, such as removing the outlier from the data set, but the ramifications should be understood before taking any such action. This is a case where consulting the literature is strongly recommended.
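The lesson does not prescribe a rule for identifying outliers, so the sketch below adopts one common convention as an assumption: Tukey's fences, which flag values more than 1.5 interquartile ranges below the first quartile or above the third. It only flags candidates for review; whether to remove them is a separate decision. The salary figures are hypothetical, and tukey_outliers is an illustrative name.

```python
import statistics

def tukey_outliers(xs, k=1.5):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]

# Hypothetical annual salaries for one job family.
salaries = [52_000, 54_000, 55_000, 57_000, 58_000, 60_000, 61_000, 250_000]
print(tukey_outliers(salaries))  # [250000]
```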

Finally, normality can be tested statistically. Several tests can be used to objectively test for normality, including Kolmogorov-Smirnov, Shapiro-Wilk, chi-square, Jarque-Bera, Anderson-Darling, and others. Each test has advantages and disadvantages. Once again, the researcher is well served to consult the literature to determine the most appropriate test for his or her project.

The Kolmogorov-Smirnov (KS) test is often used to test for normality. It compares the cumulative frequency distribution of the sample data with a model of normally distributed data that has the same mean and standard deviation as the sample. Like any other statistical test, the KS test evaluates a null and an alternative hypothesis:

Ho1: There is no difference between the distribution of the sample data and the normal model.

Ha1: The distribution of the sample data differs from the normal model.

If the result is statistically significant (p < .05), the null hypothesis is rejected in favor of the alternative: the sample data differ significantly from the normal model. We would therefore conclude that the assumption of normality is not met and that a non-parametric test is required to analyze our data.

If the result is not statistically significant (p ≥ .05), we fail to reject the null hypothesis: there is no significant evidence that the sample data differ from the normal model. We would therefore treat the assumption of normality as met, and a parametric test would be acceptable for analyzing our data.
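The KS comparison described above can be sketched in Python. The statistic D is the largest vertical gap between the sample's cumulative distribution and a normal cumulative distribution that uses the sample's own mean and standard deviation. Because those two parameters are estimated from the sample, the decision rule assumes the large-sample Lilliefors critical value of roughly 0.886 divided by the square root of n at the .05 level. That coefficient is an assumption of this sketch: published tables should be used for small samples, and Excel templates or statistical packages may apply a different correction. The function names are illustrative.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(xs):
    """Largest gap D between the sample's empirical CDF and a normal CDF
    with the sample's own mean and standard deviation."""
    data = sorted(xs)
    n = len(data)
    mu, sigma = statistics.mean(data), statistics.stdev(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def looks_normal(xs, coefficient=0.886):
    """Fail to reject normality when D is below an approximate .05 critical value.
    0.886 / sqrt(n) is the assumed large-sample Lilliefors value."""
    return ks_statistic(xs) < coefficient / math.sqrt(len(xs))
```

A result of True corresponds to failing to reject Ho1; False corresponds to rejecting it.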

It is important to note that evaluating the assumption of normality requires a holistic view. No single description of the data is sufficient to decide whether the data are normal. For example, with large samples the KS test is sensitive to small departures from normality, so it can flag differences that are statistically significant but too small to matter in practice. The researcher should therefore consider all the available information, from both visual inspection and statistical analysis, before making a decision about normality (Field, 2005). If, after following the steps above, the assumption of normality does not appear to be met, non-parametric statistical procedures should be considered in lieu of parametric tests.

Assumptions Other Than Normality

Two additional assumptions should be checked for any statistical test: the measurement scale of the data and the measure of central tendency.

Measurement scales: Statistical procedures used to test hypotheses have unique assumptions about the scales on which the data are measured. Data are measured on nominal, ordinal, interval, or ratio scales. It is important to determine the measurement scale assumed by any statistical procedure being considered to test the data. For example, an assumption of Pearson's r is that the data are measured at the interval or ratio level, so Pearson's r should not be used to analyze ordinal data. The non-parametric Spearman's rho would be required to analyze ordinal data for correlation.

Rules for Measurement Scales

Nominal: Nominal data can be classified but not ordered and have no meaningful distance between values or unique origin (true zero). This is also referred to as categorical data. Examples include names or categories, like gender and marital status. An example of a statistical procedure that uses nominal data is chi-square (Cooper & Schindler, 2014).

Ordinal: Ordinal data can be classified and ordered but have no meaningful distance between data values or unique origin (true zero). Examples include surveys with responses ranked on a five-point Likert scale, such as strongly agree to strongly disagree. Examples of statistical procedures that use ordinal data include Spearman's rho, Mann-Whitney test, Wilcoxon test, Kruskal-Wallis test, and Friedman test (Cooper & Schindler, 2014).

Interval: Interval data can be classified and ordered and have meaningful distance between data values but no unique origin (true zero). A classic example of an interval level of measurement is temperature measured in degrees. The data are ordered and the differences between measures are meaningful, but there is no true zero. Because there is no true zero, it would be improper to say that 40 degrees is twice as warm as 20 degrees. Examples of statistical procedures that use interval data include Pearson's r, regression analysis, t test, and ANOVA (Cooper & Schindler, 2014).
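The temperature example can be checked numerically. The lesson does not say which temperature scale it means; this sketch assumes Celsius and converts to Fahrenheit. Because the two scales place zero at different points, the ratio between the same two temperatures changes with the scale, while a difference converts consistently (it is simply multiplied by 9/5).

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

warm_c, mild_c = 40, 20
warm_f, mild_f = celsius_to_fahrenheit(warm_c), celsius_to_fahrenheit(mild_c)  # 104.0, 68.0

# Ratios are not meaningful on an interval scale: they depend on where zero sits.
print(warm_c / mild_c)  # 2.0
print(warm_f / mild_f)  # about 1.53

# Differences are meaningful: 20 Celsius degrees is always 36 Fahrenheit degrees.
print(warm_f - mild_f)  # 36.0
```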

Ratio: Ratio data can be classified and ordered, have meaningful distance between data values, and have a unique origin (true zero). Examples include age in years and income in dollars. Examples of statistical procedures that use ratio data include Pearson's r, regression analysis, t test, and ANOVA (Cooper & Schindler, 2014). Note that parametric tests are used to analyze data measured at the interval and ratio levels but cannot be used to analyze data measured at the nominal and ordinal levels.

Measures of central tendency: The histogram and the discussion of normality may have made it evident that there is interest in how the data points are distributed around the midpoint of the curve. That midpoint is called the central tendency, and it is the foundation for statistical analysis using linear models. In short, our statistical procedures evaluate how much our data vary from that midpoint when a straight line is fit to the data (Field, 2005). The important takeaway is that central tendency can be measured in three different ways: a) mean, b) median, and c) mode. As seen in the descriptive statistics output above, the mean, median, and mode are usually included in the descriptive statistics generated by software. As with normality and levels of measurement, it is important to determine the measure of central tendency assumed by any statistical procedure being considered to test the data.

Mean: The arithmetic mean is the most commonly used measure of central tendency. It is calculated by adding the data scores and dividing by the number of cases. The mean is the measure of central tendency used with interval and ratio data and is used for statistical procedures like correlation, regression analysis, t test, and ANOVA (Salkind, 2009).
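The mean calculation described above, and the point made earlier that outliers can bias the mean, can be illustrated with a short Python sketch. The time-to-fill figures (days to fill a job requisition) are hypothetical and chosen for an HR setting; arithmetic_mean is an illustrative helper. A single unusually slow requisition pulls the mean up by roughly 20 days, while the median moves by only 1.5.

```python
import statistics

def arithmetic_mean(xs):
    """Add the data scores and divide by the number of cases."""
    return sum(xs) / len(xs)

# Hypothetical time-to-fill values (days) for six requisitions.
days_to_fill = [30, 32, 35, 38, 40, 41]
print(arithmetic_mean(days_to_fill))    # 36.0
print(statistics.median(days_to_fill))  # 36.5

# One extreme requisition moves the mean far more than the median.
with_outlier = days_to_fill + [180]
print(round(arithmetic_mean(with_outlier), 1))  # 56.6
print(statistics.median(with_outlier))          # 38
```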

Median: The median is the middle score of the distribution when the data are ordered from highest to lowest, so that half of the data points fall above it and half fall below it. In the data set 1, 3, 5, 7, and 9, the median would be 5, since half of the remaining values fall above it and half below. The median is the measure of central tendency used with ordinal data (Salkind, 2009).

Mode: The mode is the data value that occurs most frequently in the data set, regardless of order. In a data set of 5, 5, 5, 3, 3, 9, 9, 9, 9, 1, 1, 1, 7, 7, 7, 7, 7, the mode would be 7 because it is the value that occurs most frequently in the data set. The mode is the measure of central tendency used with nominal levels of measurement (Salkind, 2009).
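The three rules above (mode for nominal data, median for ordinal data, mean for interval and ratio data) can be expressed as a lookup table. This is a sketch with hypothetical recruiting data; MEASURE_BY_SCALE and central_tendency are illustrative names, not part of any library.

```python
import statistics

# The lesson's defaults (Salkind, 2009): nominal -> mode, ordinal -> median,
# interval/ratio -> mean.
MEASURE_BY_SCALE = {
    "nominal": statistics.mode,
    "ordinal": statistics.median,
    "interval": statistics.mean,
    "ratio": statistics.mean,
}

def central_tendency(values, scale):
    """Return (measure name, value) for the scale's default measure."""
    func = MEASURE_BY_SCALE[scale]
    return func.__name__, func(values)

# Hypothetical recruiting data.
sources = ["referral", "job board", "job board", "agency", "job board"]  # nominal
ratings = [2, 4, 5, 3, 4, 4, 5]            # ordinal: 1-5 candidate-experience survey
offers = [61_000, 58_500, 64_000, 60_500]  # ratio: salary offers in dollars

print(central_tendency(sources, "nominal"))  # ('mode', 'job board')
print(central_tendency(ratings, "ordinal"))  # ('median', 4)
print(central_tendency(offers, "ratio"))     # mean of the four offers
```

The mapping encodes only the lesson's defaults. For ratio data that are strongly skewed or contain outliers, such as salaries, the median often describes a typical case better than the mean, which is one situation where the defaults deserve a second look.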

In Closing—A Word About Validity and Reliability

Although some of the most important and common assumptions of statistical testing have been discussed in

this lesson, there are still more. This may seem like a very taxing and laborious process to partake in before

even getting to the point of testing the research hypothe …
