You can find the first parts of the article here:
What makes the conclusions from DOE work in practice?
Once we have the experimental matrix (our recipe for conducting the DOE), we need to decide how to carry it out. There will always be factors that we don't manipulate during the DOE, so it's worth deciding what to do with them before the experiment starts.
For example, one of the untested factors in this DOE is egg size. We need to decide whether to conduct the experiment with a single egg size or with eggs of varying sizes. Other such factors could be the type and size of the pot or the amount of water used for cooking. Do we always want to use the same pot, precisely measure the same amount of water, and cook on the same cooking zone? Or will we use multiple pots (different materials and sizes), leave the water volume uncontrolled, and switch between burners?
How would you approach this? Would you prefer to keep the untested factors constant, or would you allow for this variation during the DOE?
Below is scenario 1, in which the untested factors in the DOE are kept constant:
We conduct the experiment with eggs of the same size, purchased from a single supplier and from a single shipment. We cook them on a single induction cooktop, using the same pot and measuring the same amount of water each time. Every run of the experiment is conducted on the same cooking zone, and whenever salt is added, we always use the same amount.
For comparison, consider scenario 2:
We also conduct the experiment with the same egg size and on a single hotplate (because we only have one). Unlike scenario 1, we purchase eggs from different suppliers, so they vary in freshness, expiration date, and storage conditions. We use pots of different materials and sizes. We adjust the amount of water so that the egg is completely covered, which means the amount varies with the pot. We add salt "by eye."
In both scenarios, we boil 16 eggs according to the same experimental matrix. In each scenario, we obtain 16 results, one for each egg tested.
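To make the "same experimental matrix" concrete, here is a minimal sketch assuming the 16 runs form a two-level full factorial in four factors. The factor names A to D are hypothetical placeholders (the real factors come from the earlier parts of the article), and `full_factorial` and `main_effect` are illustrative helpers, not part of any DOE library.

```python
from itertools import product

# Hypothetical factor names; the article's real factors are defined in earlier parts.
FACTORS = ["A", "B", "C", "D"]

def full_factorial(factors):
    """Return all 2**k runs of a two-level full factorial in coded units (-1 = low, +1 = high)."""
    return [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=len(factors))]

def main_effect(runs, responses, factor):
    """Estimate a main effect: mean response at the high level minus mean at the low level."""
    high = [y for run, y in zip(runs, responses) if run[factor] == 1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

design = full_factorial(FACTORS)
print(len(design))  # 16
```

The matrix only fixes the settings of the tested factors. Scenarios 1 and 2 use exactly this list of runs; what differs is everything the matrix does not mention.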
Which scenario is better? Which would you choose?
Most people answer: scenario 1. In this case, we feel more confident about the consistency of the results. Scenario 2 raises concerns that differences between pots, hotplates, and eggs could “mask” the effects of the studied factors and make drawing conclusions difficult.
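The masking concern can be illustrated with a small simulation. This is a sketch under stated assumptions: a hypothetical additive response model, a 16-run two-level design in four factors, and normally distributed noise whose size stands in for all the untested factors. The noise levels 0.2 and 2.0 are made-up numbers chosen only to contrast the two scenarios.

```python
import random
import statistics
from itertools import product

def simulate_effect_estimates(noise_sd, true_effect=1.0, n_experiments=2000, seed=1):
    """Repeat a 16-run, four-factor, two-level experiment many times and return the
    estimated main effect of factor A from each repetition.

    Hypothetical model: y = 10 + (true_effect / 2) * A + noise, where noise_sd lumps
    together everything left uncontrolled (pot, water, supplier, salt...).
    """
    rng = random.Random(seed)
    runs = list(product((-1, 1), repeat=4))  # 16 runs; run[0] is factor A
    estimates = []
    for _ in range(n_experiments):
        ys = [10.0 + true_effect / 2 * run[0] + rng.gauss(0.0, noise_sd) for run in runs]
        high = [y for run, y in zip(runs, ys) if run[0] == 1]
        low = [y for run, y in zip(runs, ys) if run[0] == -1]
        estimates.append(statistics.mean(high) - statistics.mean(low))
    return estimates

for label, sd in [("scenario 1 (low noise)", 0.2), ("scenario 2 (high noise)", 2.0)]:
    est = simulate_effect_estimates(sd)
    print(f"{label}: mean={statistics.mean(est):.2f}, spread={statistics.stdev(est):.2f}")
```

Each estimate is a difference of two means of 8 runs, so its spread should come out close to noise_sd / 2. An effect of 1.0 stands out clearly against a spread of about 0.1, but can easily be hidden by a spread of about 1.0.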
However, which scenario is better is not obvious. It all depends on the conditions under which we want to use the DOE results. If we are experimenting for our own needs, in our own kitchen, and know that we can always use the same pot and burner, add the same amount of water (and salt, if any), and buy eggs of the same size from a single supplier, the chances are very high that the conclusions drawn from a DOE conducted according to scenario 1 will hold true in practice.
If, on the other hand, we are working on a general recipe for hard-boiling the perfect egg and want tips that will work for a wide range of egg lovers, the chances that the conclusions from a scenario 1 DOE will hold true in practice are very low. Every cook uses different hobs, different pots, eggs from different sources, and so on. In such a case, the conclusions from a DOE conducted according to scenario 2 will be more universal, provided, of course, that the eggs are of the size tested (in scenario 2, egg size was held constant).
Every good analysis, regardless of how the data was collected (whether from process observation or experiment), should indicate the limitations of the inferences. They always exist and depend on decisions made during the planning phase. Inference constraints are simply the conditions under which the resulting conclusions will be valid in practice. It’s naive to believe that something tested on one machine will work identically on others – even those of the same type. Similarly, if something has been tested on one prototype, it won’t necessarily translate into mass production.
In short: the experiment should be run under the same conditions in which we intend to apply its conclusions.
Scenario 1 seems safer and allows us to detect even small effects of the tested factors. However, if these effects are unlikely to be replicated in practice, their value is negligible. This is a common source of frustration for experimenters: the DOE analysis indicates ideal process settings, but after implementation, the expected improvement never appears. The problem isn't the DOE itself, but the range of conditions tested, which was too narrow because all untested factors were held constant, as in scenario 1.
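A hedged sketch of why a precise scenario 1 result can still fail in practice: assume, hypothetically, that the effect of factor A depends on the pot (an interaction between a tested factor and an untested one). The pot names and effect sizes below are invented for illustration. Scenario 1 runs every trial in one pot; scenario 2 picks a pot at random for each run.

```python
import random
import statistics
from itertools import product

# Hypothetical: the true effect of factor A differs by pot (an A x pot interaction).
EFFECT_BY_POT = {"thin steel": 2.0, "thick cast iron": -0.5, "aluminium": 0.3}

def run_experiment(choose_pot, rng, noise_sd=0.3):
    """Run one 16-run, two-level experiment and return the estimated main effect of A.

    choose_pot(rng) decides which pot is used for each run; run[0] is factor A.
    """
    high, low = [], []
    for run in product((-1, 1), repeat=4):
        pot = choose_pot(rng)
        y = 10.0 + EFFECT_BY_POT[pot] / 2 * run[0] + rng.gauss(0.0, noise_sd)
        (high if run[0] == 1 else low).append(y)
    return statistics.mean(high) - statistics.mean(low)

rng = random.Random(7)
pots = list(EFFECT_BY_POT)
fixed = [run_experiment(lambda r: "thin steel", rng) for _ in range(2000)]  # scenario 1
varied = [run_experiment(lambda r: r.choice(pots), rng) for _ in range(2000)]  # scenario 2
average_kitchen_effect = statistics.mean(EFFECT_BY_POT.values())

print(f"scenario 1: {statistics.mean(fixed):.2f} +/- {statistics.stdev(fixed):.2f}")
print(f"scenario 2: {statistics.mean(varied):.2f} +/- {statistics.stdev(varied):.2f}")
print(f"average effect across these pots: {average_kitchen_effect:.2f}")
```

Under these assumptions, scenario 1 estimates the effect tightly, but only for the one pot it used. Scenario 2 is noisier, yet its estimate centers on the effect averaged over the pots, which is what a cook with an arbitrary pot would actually experience.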
To design experiments effectively and efficiently, it's important to remember that a DOE is not just about the Design Structure (what the experiment matrix shows), but also about the Unit Structure: how we handle disturbances, that is, the untested factors, during the DOE. The more of these disturbances we allow to vary during the DOE, the fewer the constraints on inference. Are scenarios 1 and 2 the same experiment? No. They are two different experiments, differing precisely in how universal the conclusions drawn from them are.
Ultimately, the decision on how to conduct an experiment should be conscious and well-considered.
