In this post, we will discuss the scientific method and how it is applied in business, process science and data science. This fundamental idea drives much of the thinking about improvement, optimization, formalizing problems and using evidence and data to make decisions.
What is the Scientific Method?
Think back to elementary school. Every year most schools have something called the “Science Fair”. You would have to get one of those big tri-fold display boards and decorate it with some type of experiment you were supposed to carry out. Basically, this is a way for children to come up with a problem or idea to test, test it and then present the data and conclusions. The whole point is to instill the scientific method in children. The principles of this exercise are used by adults in research and industry, although they have traded the tri-fold displays for PowerPoints and white papers. Let’s explore the scientific method in a little more detail.
There are many variants and wordings for the scientific method. For the purpose of this writing, we will stick to five major steps.
1. Identify the problem or question – This step is the beginning of the entire process. What are you trying to solve? These questions or problems usually come from observed phenomena. Over the years people have asked questions such as: What are clouds made of? How do birds fly? Why did the apple fall from the tree and hit me in the head? That last question, as the popular story goes, led Newton to his theory of gravity, all from asking a question. A clear, well-formulated problem or question needs to be specific enough to be tested and either supported or refuted.
2. Hypothesis – Now that there is a question or problem to solve, this step seeks to address the question and formulate a hypothesis to test. Basically, the experimenter must come up with a potential reason for the observed phenomenon. This is usually driven by the evidence on hand so far. Forming the hypothesis usually requires some thinking, knowledge or possibly existing data. The hypothesis must be explicitly stated and testable. If it cannot be tested, the evidence can neither support nor refute it.
3. Prediction / Research – Depending on the word you use, this step usually relates to studying your question and collecting more information and data. Basically, the experimenter thinks through the consequences of the hypothesis and uses them to describe what the test should show: "If my hypothesis is true, then I would see this." This step could require gaining some knowledge and researching information for the prediction.
4. Experiment / Testing – In this step, a controlled experiment is put in place to test the prediction and hypothesis. Almost any experiment has a control and a test subject. The control is the baseline case, left untouched by the factor under study, while the test subject receives the change that the prediction says should produce the effect. A key example is testing medications for side effects. Suppose the hypothesis is that a medication causes a skin rash. The prediction is that a rash will appear after taking the medication. In the experiment, a control group is given a placebo while a test group is given the medication, and the two groups are then compared to see whether the medication leads to more cases of rash in the sample. If both groups show the reaction at similar rates, or neither group shows it, the data does not support the hypothesis. Measurable variables need to be defined up front so the experiment generates data that can be analyzed later.
5. Analysis / Presentation – This last step takes the data from the experiment and analyzes it to either support or reject the hypothesis. It is common here to use statistical testing to confirm that the data shows a statistically significant difference and is not just the result of random variation. The results of the experiment could also drive the creation of some type of model explaining the phenomenon being studied. This step often visualizes the data in a graph for presentation. Once a conclusion is reached, the answer is presented, often in the form of a paper or publication. If no conclusion is reached or more experimentation is needed, prior steps may be revisited. The hypothesis may need to be refined or changed, the experiment may not have been well controlled, or the sample may have been too small to show a statistically significant result. Either way, there are multiple possibilities at this step.
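To make the statistical testing in step 5 concrete, here is a minimal sketch that compares how often a reaction shows up in a control group and a test group, as in the medication example. It uses a two-proportion z-test, which is only one reasonable choice of test and is an assumption of this sketch. The function name and the counts are hypothetical, made up purely for illustration.

```python
import math


def two_proportion_z_test(events_test, n_test, events_control, n_control):
    """Two-sided z-test for a difference between two proportions."""
    p_test = events_test / n_test
    p_control = events_control / n_control
    # Pooled proportion under the null hypothesis of "no difference".
    p_pool = (events_test + events_control) / (n_test + n_control)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_control))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_test - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (illustration only): people showing the reaction
# out of 200 in the test group and 200 in the control group.
z, p_value = two_proportion_z_test(
    events_test=30, n_test=200, events_control=12, n_control=200
)

alpha = 0.05  # significance level chosen before looking at the data
reject_null = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject null: {reject_null}")
```

A small p-value says the difference between the groups is unlikely to be random variation alone. It does not prove the hypothesis, and with small samples an exact test would usually be a safer choice than this normal approximation.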
The purpose of all of this is to make a discovery or come up with an answer that is backed by data. This process is the foundation for innovation and discovery. Interestingly enough, it is more common than many people think. It shows up in many different forms in business and engineering under different names.
Applications in Business
1. DMAIC – This process is used in Six Sigma and stands for Define, Measure, Analyze, Improve and Control. In this process, a problem is formulated, research is performed and data is collected about the problem. The name of the Analyze phase is a little misleading. It is really the hypothesis stage, in which a hypothesis is formulated and tested using statistical methods such as Student's t-test, ANOVA, Design of Experiments, etc. Once the data supports the hypothesis, a solution is generated during the Improve phase. Last, the Control stage documents the findings and puts monitoring in place so the resulting solution holds. It is easy to see the parallel with the scientific method.
2. PDCA / A3 – Lean philosophy has something similar called PDCA. It stands for Plan, Do, Check and Act. It is a simplified version of DMAIC in which a problem, question or mission is identified in the Plan stage. Next, solutions or changes are generated and experiments or pilots are carried out. The results are checked, and in the Act stage either the solution is implemented or the cycle repeats. The A3 method is also from lean and is similar to PDCA, but more formal. An A3 is a piece of paper that is roughly 11 x 17 inches and walks through a process that identifies the problem, captures current conditions, sets a goal, develops a hypothesis, analyzes the outcome of an experiment, and lays out a process proposal, an implementation plan and tracking of the solution.
3. Operations Research Problem Solving Process – In operations research (OR), this process leads to the development of a model that helps guide the answer for implementation. The process begins with the situation, which describes a problem to be solved. This flows into the problem statement, which takes the situation and identifies constraints, objectives, data requirements, etc. The next step is to build a model to try to reach a solution or answer to the question or problem. This could be done through OR techniques such as a linear programming model, or by using statistical testing to answer a question. OR tends to use the model to help formulate the hypothesis. Next, a solution is derived, which plays the role of the hypothesis. The solution is based on evidence from the developed model. Next, the solution is tested or experiments are performed. The solution could prove to be effective, or a more complex model may need to be developed. Last, the controls and procedures are developed and the solution is implemented. Again, this process drives the experimenter through a similar sequence in which a problem is defined, a hypothesis is formed and tested, and the results are communicated or used to make a decision.
4. Data Science Process – This process is newer and less formal than the earlier methods, but important nonetheless. It revolves around data, but it gives the data scientist a purpose. The first step is to define the problem or objective to be solved. The next is to determine data requirements and availability, and then either refine the problem or collect the data. Then the data is cleaned to eliminate empty entries and formatted for use in modeling. The data scientist then explores the data to understand its distribution, find any possible patterns and work out how to use the data to solve the problem. Essentially, the user is forming a hypothesis at this step. Next, a model is created to either solve the problem or answer the question (prediction vs inference). The results are analyzed to ensure the problem or question has been addressed. Finally, the conclusions are written up and presented, or the model is developed into a data product and implemented.
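The data science steps above can be sketched end to end in a few lines. This is a minimal illustration, not a prescribed workflow: the records, the function names and the choice of a straight-line least squares fit as the "model" are all assumptions made for brevity.

```python
# Hypothetical raw records of (input, outcome); None marks an empty entry.
raw_records = [
    (1.0, 2.1), (2.0, 3.9), (None, 5.0),
    (3.0, 6.2), (4.0, None), (5.0, 9.8),
]


def clean(records):
    """Cleaning step: drop any record that has an empty (None) entry."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def mean(values):
    return sum(values) / len(values)


def fit_line(records):
    """Modeling step: least squares fit of y = intercept + slope * x."""
    x_bar = mean([x for x, _ in records])
    y_bar = mean([y for _, y in records])
    sxx = sum((x - x_bar) ** 2 for x, _ in records)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in records)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(records, slope, intercept):
    """Analysis step: share of the variation in y explained by the line."""
    y_bar = mean([y for _, y in records])
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in records)
    ss_tot = sum((y - y_bar) ** 2 for _, y in records)
    return 1 - ss_res / ss_tot


records = clean(raw_records)                  # clean
slope, intercept = fit_line(records)          # model
fit_quality = r_squared(records, slope, intercept)  # analyze
print(f"kept {len(records)} of {len(raw_records)} records")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {fit_quality:.3f}")
```

The working hypothesis here is that the outcome rises roughly linearly with the input. The fit and its R² are the evidence used to judge that hypothesis before anything is written up or turned into a data product.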
While this has been a whirlwind tour of methodologies, they are all pretty similar and stem from the scientific method. These methodologies are used to solve real-life problems, find optimal solutions and generally make things better. In later posts, we will review each methodology in more depth, but it is important to see how they stem from and connect back to the scientific method, where problems are identified and hypotheses are generated and then tested to see whether the evidence supports them. This avoids the costly mistakes of gut-driven decision making and helps maximize the revenue that can be generated. Our next post will discuss the linear regression model as it applies to machine learning.