COMPARISON OF SPURIOUS CORRELATION METHODS USING PROBABILITY DISTRIBUTIONS AND PROPORTION OF REJECTING A TRUE NULL HYPOTHESIS

ABSTRACT
A limitation of spurious correlation analysis, e.g. the Pearson product-moment correlation test, is that the data need to be normally distributed. This research compares spurious correlation methods under several probability distributions, including non-normal ones, in order to identify the method with the best degree of association. The methods were compared using the proportions of rejecting a true null hypothesis obtained from the t and z test statistics for testing correlation coefficients. Data from normal, log-normal, exponential and contaminated normal distributions were generated by simulation with different sample sizes. The results indicate that, when the data follow normal, exponential or contaminated normal distributions, Pearson's and Spearman's rank correlation coefficients have the best proportion of rejecting the true null hypothesis. When the data follow a log-normal distribution, however, only Spearman's rank correlation coefficient has the best proportion of rejecting the true null hypothesis. Thus, Pearson's and Spearman's rank coefficients have the best degree of association under normal, exponential and contaminated normal distributions, while for the log-normal distribution only Spearman's rank coefficient has the best degree of association.
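The simulation design summarised in the abstract can be illustrated with a minimal sketch. It is not the study's own code: the sample size, number of replications, distribution parameters, contamination settings (`eps`, `scale`) and all function names are assumptions chosen for illustration. Only the z test is shown, using the Fisher transformation, and applying it unchanged to Spearman's coefficient is a rough approximation. Two independent samples are drawn so that the null hypothesis of zero correlation is true, and the proportion of rejections is recorded.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; ties are not averaged (continuous data assumed)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def rejects_null(r, n, alpha=0.05):
    """Two-sided z test of H0: rho = 0 via the Fisher transformation."""
    r = max(-0.999999, min(0.999999, r))  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def contaminated_normal(rng, eps=0.1, scale=3.0):
    """Assumed form: N(0, 1) with probability 1 - eps, else N(0, scale^2)."""
    sd = scale if rng.random() < eps else 1.0
    return rng.gauss(0.0, sd)


# Distribution parameters below are illustrative assumptions.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "log-normal": lambda rng: rng.lognormvariate(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
    "contaminated normal": contaminated_normal,
}


def rejection_proportions(dist, n=30, reps=2000, alpha=0.05, seed=1):
    """Proportion of replications in which a true H0 is rejected."""
    rng = random.Random(seed)
    draw = SAMPLERS[dist]
    counts = {"pearson": 0, "spearman": 0}
    for _ in range(reps):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]  # independent of x, so H0 holds
        counts["pearson"] += rejects_null(pearson_r(x, y), n, alpha)
        counts["spearman"] += rejects_null(spearman_r(x, y), n, alpha)
    return {name: c / reps for name, c in counts.items()}


if __name__ == "__main__":
    for name in SAMPLERS:
        print(name, rejection_proportions(name))
```

Under this reading, a method behaves well when its rejection proportion stays close to the nominal significance level `alpha`; the criterion actually used in the study is the one set out in its Section 3.11.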

TABLE OF CONTENTS
TITLE PAGE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
ABSTRACT

CHAPTER ONE
1.0       INTRODUCTION
1.1       Background to the Study
1.2       Statement of the Problems
1.3       Aim and Objectives
1.4       Significance of Study
1.5       Scope and Limitations
1.6       Definition of Terms

CHAPTER TWO
2.0       LITERATURE REVIEW
2.1       Introduction
2.2       Theory of Spurious Correlation Coefficients
2.2.1    The Pearson Correlation Coefficient
2.2.2    The Spearman Rank Correlation Coefficients
2.2.3    Kendall Rank Correlation Coefficients

CHAPTER THREE
3.0       MATERIALS AND METHODOLOGY
3.1       Data Used for the Study
3.2       Simulation Study
3.3       Probability Distribution Used for Simulation
3.3.1    Normal Distribution
3.3.2    Log-normal Distribution
3.3.3    Exponential Distribution
3.3.4    Contaminated Normal Distribution
3.4       Software Used
3.5       Level of Significance
3.6       The Pearson Correlation Coefficient
3.7       The Spearman Rank Correlation Coefficients
3.8       Kendall Rank Correlation Coefficients
3.9       Testing a Single Correlation Coefficient
3.10     Testing Two Correlation Coefficients
3.11     Criteria for Identifying Best Proportion of Rejecting True Null Hypothesis

CHAPTER FOUR
4.0       RESULTS AND DISCUSSION
4.1       Introduction
4.2       Spurious Correlation Test for Poverty Levels in Nigeria
4.3       Comparison of Correlation Coefficient Tests
4.4       Discussion of Findings

CHAPTER FIVE
5.0       SUMMARY, CONCLUSION AND SUGGESTION FOR FURTHER STUDIES
5.1       Summary
5.2       Conclusion
5.3       Suggestion for Further Studies
REFERENCES
APPENDICES

CHAPTER ONE
1.0          INTRODUCTION
1.1         Background to the Study
Awareness of the problems of statistical analysis related to spurious correlation began as early as 1897, when Karl Pearson published his seminal paper on the subject, whose title began significantly with the words “On a form of spurious correlation”; the issue was later raised repeatedly by the geologist Chayes (1960).

The history of the spurious correlation test shows that Pearson used the term spurious correlation to “distinguish the correlations of scientific importance from those that were not.” The problem, according to Pearson, was that some correlations did not indicate an “organic relationship.” Although this term is never defined, the examples used suggest that a spurious correlation was a correlation between two variables that were not causally connected. The correlation coefficient itself measures only the strength of linear relationships (Johnson and Kotz, 1992). Simplicity and interpretability should be the main considerations when selecting measures of association. Historically, the Pearson correlation has been the main association measure in multivariate analysis. It is simple, as it relates only two variables of a random vector, and it involves only linear transformations in R^n, i.e. a change of scale plus a shift. Its interpretation relies on the ideas of linear regression, which in turn are related to the geometry of R^n, where covariance appears as a Euclidean inner product in the space of samples (Lovell et al., 2013). All these desirable properties are obtained when the Pearson correlation is used to study association. Correlations between variables can be measured with different indices (coefficients). The three most popular are: Pearson’s coefficient r, Spearman’s rho coefficient r_s and Kendall’s....
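The geometric reading of the Pearson coefficient described above can be illustrated with a short sketch. The data values and function names are hypothetical and chosen only for illustration. The coefficient is computed as the Euclidean inner product of the two centred sample vectors divided by the product of their lengths, and the example checks that a change of scale plus a shift applied to one variable leaves it unchanged.

```python
import math


def center(v):
    """Subtract the sample mean from every component."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def inner(u, v):
    """Euclidean inner product in R^n."""
    return sum(a * b for a, b in zip(u, v))


def pearson_r(x, y):
    """Cosine of the angle between the centred sample vectors."""
    cx, cy = center(x), center(y)
    return inner(cx, cy) / math.sqrt(inner(cx, cx) * inner(cy, cy))


# Hypothetical sample values, used only to demonstrate the property.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]

r = pearson_r(x, y)
# Positive change of scale (factor 3) plus a shift (+10) applied to x.
r_transformed = pearson_r([3.0 * a + 10.0 for a in x], y)

print(r, r_transformed)
```

The two printed values agree up to floating-point rounding, because centring removes the shift and the positive scale factor cancels between numerator and denominator.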

================================================================
Item Type: Postgraduate Material  |  Attribute: 73 pages  |  Chapters: 1-5
================================================================
