[Figure: The Data Science Process]

We do data science...

Data science is an often misunderstood term. Put simply, we use data to solve problems by building an inferential or predictive mathematical model. That model is then used to provide insights, power a dashboard, automate a process or become part of an app. Our five-step graphic illustrates our data science process.

Problem Definition

The data science process starts with defining the problem we are trying to solve. As a client, what are you looking for in your data? What are you trying to predict, anticipate or relate? From there, what data do you have, and what data still needs to be collected? Last, how do you want to receive your analysis? Do you want a write-up of the results and findings, or do you want an interactive tool to use in your business? The problem definition sets up the entire process for success. The surest way to miss out on the benefits of data science is to leave the problem unscoped. We will never let that happen.

Data Collection

Once the problem is scoped, we must collect data to solve the problem. Often clients have this in spreadsheets or in their internal databases. We can extract from traditional relational databases or from NoSQL databases and data lakes. There is no need to be concerned about your data format or the perceived quality of your data. Some clients may not have data to solve their problem. In these cases, we will either search for a public data set or help the client set up a data collection process and facilitate the collection of data for analysis. 
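To make the extraction step concrete, here is a minimal sketch of pulling rows out of a relational database with Python's built-in sqlite3 module. The orders table, its columns and the load_rows helper are hypothetical stand-ins for illustration; a real engagement would connect to the client's own database with the appropriate driver.

```python
import sqlite3

def load_rows(conn, query, params=()):
    """Run a query and return each row as a plain dictionary."""
    conn.row_factory = sqlite3.Row  # lets us read columns by name
    cursor = conn.execute(query, params)
    return [dict(row) for row in cursor.fetchall()]

# Hypothetical in-memory table standing in for a client database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", None)],
)

# Parameterized query: values are passed separately from the SQL text.
rows = load_rows(
    conn,
    "SELECT region, amount FROM orders WHERE region = ? ORDER BY id",
    ("north",),
)
```

Note that the missing amount comes back as None rather than being dropped, so it can be dealt with deliberately in the preparation step.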

Exploratory Data Analysis and Prep

In this step, we take the data gathered during data collection, run statistical tests and create visualizations to understand the nature of the data. We work to understand what the response variable looks like, which statistical distribution it resembles, and whether the input variables are approximately normally distributed or need to be transformed or rescaled. Next, we prepare and format the data set for modeling, which includes handling missing or erroneous values.
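As a rough illustration of this step, the sketch below summarizes a variable, fills in missing values and rescales the result using only Python's standard library. The sample values, the choice of median imputation and the helper names (summarize, impute_median, standardize) are assumptions made for the example, not a fixed recipe; the right treatment depends on the data.

```python
import statistics

def summarize(values):
    """Descriptive statistics for a variable, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    stdev = statistics.stdev(present)
    # Simple moment-based skewness: values far from 0 suggest an asymmetric
    # distribution that may need a transformation before modeling.
    skew = sum(((v - mean) / stdev) ** 3 for v in present) / len(present)
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": mean,
        "median": statistics.median(present),
        "stdev": stdev,
        "skew": skew,
    }

def impute_median(values):
    """Replace missing entries with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

# Hypothetical measurements with one missing value and one outlier.
raw = [12.0, 15.5, None, 14.0, 90.0, 13.5]
summary = summarize(raw)
cleaned = impute_median(raw)
scaled = standardize(cleaned)
```

Here the median is used for imputation because it is less sensitive to the outlier than the mean would be.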

Data Modeling

Depending on the nature of the problem, we model the relationship between the response variable and the explanatory variables. This is done either through traditional statistical modeling, which estimates the effect of each variable, or through machine learning techniques. This step often takes the longest and is the most complex. We need to ensure that the methods' underlying assumptions are not violated and that the conclusions drawn from them are valid.
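The simplest version of this idea is a straight-line fit between one explanatory variable and the response. The sketch below fits that line by ordinary least squares and computes the residuals, which are the starting point for checking model assumptions. The data points and helper names (fit_line, residuals, r_squared) are hypothetical; real projects typically involve more variables and more thorough diagnostics.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_mean = statistics.fmean(x)
    y_mean = statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def residuals(x, y, slope, intercept):
    """Observed minus fitted values; patterns here hint at violated assumptions."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def r_squared(y, resid):
    """Share of the variance in y explained by the fitted line."""
    y_mean = statistics.fmean(y)
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_resid = sum(r ** 2 for r in resid)
    return 1 - ss_resid / ss_total

# Hypothetical data: one explanatory variable and one response.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
resid = residuals(x, y, slope, intercept)
fit_quality = r_squared(y, resid)
```

With an intercept in the model, least squares residuals always sum to zero, so what matters is their pattern: a trend or a funnel shape when plotted against x suggests the linear model or the constant-variance assumption does not hold.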

Data Product

The output of the data science project is the data product. Some clients may just want a write-up of the analysis in the form of a white paper. Others may want the results integrated into a dashboard, and some may want the model built into an app or a process. Any of these is a valid data product; the right choice depends on the customer's needs. The data product is then delivered to the customer, and training begins.