Exploratory data analysis and regression using Python
This post is an overview of a project submitted for the Fundamentals of Data Analysis module at GMIT as part of the Higher Diploma in Computing and Data Analytics.
The project involved exploratory data analysis and regression on the Tips dataset.
The aim of the project was to put into practice the core concepts of the module with the Python pandas and seaborn libraries using the Jupyter notebook environment.
There were 3 distinct tasks involved in this project.
- Describing the Tips dataset using descriptive statistics and plots
- Regression: discussing and analysing whether there is a relationship between total bill and tip amount
- An analysis of the relationship between the variables in the dataset
The project itself can be downloaded or cloned from the project repository at https://github.com/angela1C.
The Tips dataset is very small, widely available online, and can easily be read into a pandas DataFrame directly from a URL.
It has only 244 rows and 7 variables and represents tipping data: one waiter recorded information about each tip he received over a period of a few months working in one restaurant.
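Reading the data into a DataFrame might look like the sketch below. The URL is an assumption (a public copy of the data in the seaborn-data repository, not a location given in this post), and `load_tips` is a hypothetical helper name. The two sample rows are made up so that the example can run without a network connection.

```python
import io

import pandas as pd

# Assumed location of a public copy of the Tips data (not stated in this post).
TIPS_URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"


def load_tips(source=TIPS_URL):
    """Read the Tips data from a URL, a file path, or an open file-like object."""
    return pd.read_csv(source)


# Offline check: two made-up rows using the seven column names of the seaborn copy.
sample_csv = io.StringIO(
    "total_bill,tip,sex,smoker,day,time,size\n"
    "10.0,2.0,Female,No,Sun,Dinner,2\n"
    "20.0,3.0,Male,No,Sun,Dinner,3\n"
)
sample = load_tips(sample_csv)
print(sample.shape)
```

Calling `load_tips()` with no argument would fetch the file from the assumed URL, which needs network access.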
Downloading and running the project code
This project was developed using the seaborn, pandas and matplotlib.pyplot packages.
Part 1: Describe the Tips dataset using descriptive statistics and plots
The goal for part 1 is to begin the exploratory data analysis by providing a summary of the main characteristics of the Tips dataset using statistics and plots and to see what the data tells us.
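As a rough illustration of the kind of summary involved, the sketch below applies two standard pandas methods to a tiny made-up DataFrame. The values are invented for the example and are not taken from the Tips dataset, and this is not necessarily how the project notebook does it.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the Tips dataset.
toy = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [2.0, 3.0, 4.0, 5.0],
        "day": ["Sat", "Sat", "Sun", "Sun"],
    }
)

# Count, mean, standard deviation, min, quartiles and max of the numeric columns.
summary = toy.describe()

# Frequency of each category in a categorical column.
day_counts = toy["day"].value_counts()

print(summary.loc[["mean", "min", "max"]])
print(day_counts)
```

The same two calls would work unchanged on the full dataset once it is loaded into a DataFrame.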
Part 2: Regression: Discuss and analyse whether there is a relationship between the total bill and tip amount
Part 2 looks at the relationship between the total bill amount and the tip using regression, and at how other variables interact with the total bill in determining the tip amount.
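A simple linear regression of tip on total bill can be sketched with NumPy's least-squares polynomial fit. This is one possible approach, not necessarily the one used in the project, and the numbers are made up so that they lie exactly on a known line.

```python
import numpy as np

# Made-up values chosen to lie exactly on the line tip = 1 + 0.1 * total_bill.
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = np.array([2.0, 3.0, 4.0, 5.0])

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns the coefficients from the highest degree down.
slope, intercept = np.polyfit(total_bill, tip, 1)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(total_bill, tip)[0, 1]

print(f"tip = {intercept:.2f} + {slope:.2f} * total_bill, r = {r:.2f}")
```

On real data the points will not fall exactly on a line, so the correlation coefficient gives a first indication of how strong the linear relationship is.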
Part 3: Analysis: Analyse the relationship between the variables within the dataset
Part 3 looks at some of the relationships between the different variables in the dataset. The focus is on multivariate analysis using both non-graphical and graphical means, particularly the latter.
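On the non-graphical side, one way to compare several variables at once is a pivot table of group means. The sketch below uses made-up rows, and `tip_pct` is a hypothetical derived column introduced only for this illustration.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the Tips dataset.
toy = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [2.0, 3.0, 3.0, 6.0],
        "smoker": ["No", "Yes", "No", "Yes"],
        "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
    }
)

# Hypothetical derived variable: tip as a percentage of the total bill.
toy["tip_pct"] = 100 * toy["tip"] / toy["total_bill"]

# Mean tip percentage for each combination of two categorical variables.
mean_tip_pct = toy.pivot_table(
    values="tip_pct", index="time", columns="smoker", aggfunc="mean"
)

print(mean_tip_pct)
```

A table like this pairs naturally with a plot that splits the same data by category.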