There are six processes in data science:
- Problem identification and definition
- Collection of data
- Data processing
- Data exploration
- Model building
- Communication of results
Problem identification and definition
The first step in any data science project is to identify the problem you want to solve and then define it. A problem definition is a statement that clearly explains the identified problem. This process is also referred to as "framing the problem."
Collection of data
Once the problem is clearly defined, the data scientist knows what data is needed to solve it.
Collection of data is the process of finding and collecting relevant and validated data. This data has to come from a trusted source, because inaccurate data leads to inaccurate results. As the saying goes, "garbage in, garbage out." The source could be an organization, a website, social media, or even a survey.
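As a rough illustration of "garbage in, garbage out", the sketch below runs a basic sanity check on collected records before they are used. The CSV text, the column names, and the rule that `units` must be a non-negative whole number are all assumptions made up for this example; real validation rules depend on the problem and the source.

```python
import csv
import io

# Hypothetical raw data, standing in for a file obtained from a trusted source.
RAW_CSV = """date,product,salesperson,units
2024-01-05,Widget,Ada,12
2024-01-05,Gadget,Ben,-3
2024-01-06,Widget,Ada,
2024-01-06,Gadget,Ben,7
"""

def load_valid_rows(text):
    """Return (valid_rows, rejected_rows) after a basic sanity check."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        units = row["units"].strip()
        # Assumed rule: units must be a non-negative whole number.
        # Missing, negative, or non-numeric values are rejected.
        if units.isdigit():
            row["units"] = int(units)
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected

valid_rows, rejected_rows = load_valid_rows(RAW_CSV)
print(len(valid_rows), "valid;", len(rejected_rows), "rejected")
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to go back to the source and ask why they were wrong.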
Data exploration
In this process, ideas and methods are developed to find hidden trends and patterns. For instance, in a sales dataset, you may find that a particular product had huge sales on particular days, or that a particular product was sold mostly by a particular salesperson. It is a very important process, and it is also referred to as exploratory data analysis.
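The sales example can be sketched in a few lines of Python. The records below are invented for illustration, and grouping by simple totals is only one of many ways to look for patterns.

```python
from collections import defaultdict

# Hypothetical sales records: (day, product, salesperson, units sold).
sales = [
    ("Mon", "Widget", "Ada", 5),
    ("Mon", "Gadget", "Ben", 2),
    ("Fri", "Widget", "Ada", 20),
    ("Fri", "Widget", "Ben", 3),
    ("Fri", "Gadget", "Ben", 4),
]

# Total units per (product, day) and per (product, salesperson).
units_by_product_day = defaultdict(int)
units_by_product_person = defaultdict(int)
for day, product, person, units in sales:
    units_by_product_day[(product, day)] += units
    units_by_product_person[(product, person)] += units

# Pick the combination with the highest total in each grouping.
best_day = max(units_by_product_day, key=units_by_product_day.get)
top_seller = max(units_by_product_person, key=units_by_product_person.get)

print("Busiest product/day:", best_day, units_by_product_day[best_day])
print("Top product/salesperson:", top_seller, units_by_product_person[top_seller])
```

On real data the same grouping idea applies, although a library built for tabular data is usually more convenient than plain dictionaries.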
Model building
In this step, you have to determine and choose the right type of machine learning model for the problem, either supervised or unsupervised.
Supervised learning is a process in which input data that has been mapped or labelled to a particular output is used in training a computer algorithm. In this process, the algorithm is fed many problems that have a solution attached to them, so that it can solve similar problems by itself.
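A minimal sketch of the idea, assuming a tiny made-up dataset and using a 1-nearest-neighbour rule (chosen here only because it is short; it is one of many supervised methods): each training input comes with a known answer, and a new input gets the answer of the most similar training example.

```python
# Hypothetical labelled data: (hours studied, hours slept) mapped to a known outcome.
training = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

def predict(point):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, point))

    _features, label = min(training, key=squared_distance)
    return label

print(predict((7.0, 6.5)))
print(predict((1.5, 4.5)))
```

The labels attached to the training examples are what make this supervised: without them, `predict` would have nothing to return.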
Unsupervised learning is a process in which input data that has not been mapped or labelled to a particular output is used in training a computer algorithm. The algorithm has to find structure in the data on its own.
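For contrast, here is a small sketch of an unsupervised method: a simplified k-means loop that splits unlabelled numbers into two groups. The data, the function name `two_means`, and the choice to start from the minimum and maximum values are assumptions for illustration only.

```python
def two_means(values, iterations=10):
    """Split unlabelled numbers into two clusters with a simple k-means loop (k = 2)."""
    centres = [min(values), max(values)]  # simple starting guess
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer centre.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centres

# Hypothetical unlabelled data: daily order totals with no categories attached.
orders = [12, 15, 14, 90, 95, 13, 88]
clusters, centres = two_means(orders)
print(clusters)
print(centres)
```

No labels are supplied anywhere; the grouping into "small" and "large" orders emerges from the data alone.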
Communication of results
The final process involves reporting your findings and a possible solution to the identified problem. It is the result of the entire process. This can be done with the aid of slides showing a visual representation of your findings, and it is best done using a storytelling approach.