A quasi-experiment is an empirical study used to estimate the causal impact of an intervention on its target population without random assignment. Quasi-experimental research shares similarities with the traditional experimental design or randomized controlled trial, but it specifically lacks random assignment to treatment or control. Instead, quasi-experimental designs typically allow the researcher to control assignment to the treatment condition, but using some criterion other than random assignment (e.g., an eligibility cutoff score). Quasi-experiments are subject to concerns regarding internal validity, because the treatment and control groups may not be comparable at baseline. With random assignment, study participants have the same chance of being assigned to the intervention group or the comparison group. As a result, any differences between groups on both observed and unobserved characteristics are due to chance, rather than to a systematic factor related to treatment (e.g., illness severity); randomization does not guarantee that groups will be identical at baseline in any single study, but it ensures that any imbalance is attributable to chance alone. Any change in characteristics post-intervention is then likely attributable to the intervention. With quasi-experimental studies, by contrast, it may not be possible to convincingly demonstrate a causal link between the treatment condition and observed outcomes, particularly if there are confounding variables that cannot be controlled or accounted for.
The first step in creating a quasi-experimental design is to identify the variables. The quasi-independent variable is the x-variable, the variable that is manipulated in order to affect a dependent variable. "X" is generally a grouping variable with two or more levels, such as two groups receiving alternative treatments, or a treatment group and a no-treatment group (which may be given a placebo; placebos are more frequently used in medical or physiological experiments). The predicted outcome is the dependent variable, the y-variable. In a time series analysis, the dependent variable is observed over time for any changes that may take place. Once the variables have been identified and defined, a procedure can be implemented and group differences examined.
In an experiment with random assignment, study units have the same chance of being assigned to a given treatment condition. As such, random assignment ensures that the experimental and control groups are comparable on average. In a quasi-experimental design, assignment to a given treatment condition is based on something other than random assignment. Depending on the type of quasi-experimental design, the researcher might have control over assignment to the treatment condition but use some criterion other than random assignment (e.g., a cutoff score) to determine which participants receive the treatment, or the researcher may have no control over the treatment assignment and the criteria used for assignment may be unknown. Factors such as cost, feasibility, political concerns, or convenience may influence how or whether participants are assigned to a given treatment condition, and as such, quasi-experiments are subject to concerns regarding internal validity (i.e., can the results of the experiment be used to make a causal inference?).
Quasi-experiments are also effective because they often use pre-post testing: tests are administered before any treatment data are collected, to check for person confounds and to identify any tendencies among participants. The actual experiment is then run and the post-test results recorded. These data can be compared as part of the study, or the pre-test data can be included in the explanation of the experimental data. Quasi-experiments also use independent variables that already exist, such as age, gender, or eye color. These variables can be either continuous (age) or categorical (gender). In short, naturally occurring variables are measured within quasi-experiments.
There are several types of quasi-experimental designs, each with different strengths, weaknesses and applications. These designs include (but are not limited to):
Of all of these designs, the regression discontinuity design comes closest to the experimental design, as the experimenter maintains control of the treatment assignment and it is known to "yield an unbiased estimate of the treatment effects". It does, however, require large numbers of study participants and precise modeling of the functional form between the assignment and the outcome variable in order to yield the same power as a traditional experimental design.
Though quasi-experiments are sometimes shunned by those who consider themselves to be experimental purists (leading Donald T. Campbell to coin the term "queasy experiments" for them), they are exceptionally useful in areas where it is not feasible or desirable to conduct an experiment or randomized controlled trial. Such instances include evaluating the impact of public policy changes, educational interventions, or large-scale health interventions. The primary drawback of quasi-experimental designs is that they cannot eliminate the possibility of confounding bias, which can hinder one's ability to draw causal inferences. This drawback is often used to discount quasi-experimental results. However, such bias can be controlled for using various statistical techniques such as multiple regression, if one can identify and measure the confounding variable(s). Such techniques can be used to model and partial out the effects of confounding variables, thereby improving the accuracy of the results obtained from quasi-experiments. Moreover, the growing use of propensity score matching to match participants on variables important to the treatment selection process can also improve the accuracy of quasi-experimental results. In fact, data derived from quasi-experimental analyses have been shown to closely match experimental data in certain cases, even when different criteria were used. In sum, quasi-experiments are a valuable tool, especially for the applied researcher. On their own, quasi-experimental designs do not allow one to make definitive causal inferences; however, they provide necessary and valuable information that cannot be obtained by experimental methods alone. Researchers, especially those interested in investigating applied research questions, should move beyond the traditional experimental design and avail themselves of the possibilities inherent in quasi-experimental designs.
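The idea of statistically controlling a measured confounder can be sketched in miniature. The snippet below uses stratification, the simplest such adjustment (multiple regression and propensity score methods generalize the same idea): the treated-vs-control difference is computed within each level of the confounder and then averaged. All numbers are invented for illustration.

```python
from collections import defaultdict

# Each record: (treated, confounder_stratum, outcome). Invented data in which
# treated units happen to fall more often in the "high" confounder stratum.
records = [
    (1, "high", 8.0), (1, "high", 9.0), (1, "low", 5.0),
    (0, "high", 7.0), (0, "low", 4.0), (0, "low", 3.0),
]

def stratified_effect(records):
    """Within-stratum treated-vs-control differences, weighted by stratum size."""
    by_stratum = defaultdict(lambda: {0: [], 1: []})
    for treated, stratum, y in records:
        by_stratum[stratum][treated].append(y)
    weighted, total = 0.0, 0
    for groups in by_stratum.values():
        if groups[0] and groups[1]:  # need both arms present in the stratum
            diff = (sum(groups[1]) / len(groups[1])
                    - sum(groups[0]) / len(groups[0]))
            w = len(groups[0]) + len(groups[1])
            weighted += diff * w
            total += w
    return weighted / total

raw_diff = (sum(y for t, _, y in records if t) / 3
            - sum(y for t, _, y in records if not t) / 3)
adjusted_diff = stratified_effect(records)  # smaller once the confounder is held fixed
```

In this toy example the raw difference overstates the treatment effect because the confounder is unevenly distributed across arms; holding it fixed shrinks the estimate.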
Quasi-experiments are commonly used in social sciences, public health, education, and policy analysis, especially when it is not practical or reasonable to randomize study participants to the treatment condition. A true experiment would, for example, randomly assign children to a scholarship, in order to control for all other variables.
As an example, suppose we divide households into two categories: Households in which the parents spank their children, and households in which the parents do not spank their children. We can run a linear regression to determine if there is a positive correlation between parents' spanking and their children's aggressive behavior. However, to simply randomize parents to spank or to not spank their children may not be practical or ethical, because some parents may believe it is morally wrong to spank their children and refuse to participate.
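With a binary grouping variable like this, the slope of a simple linear regression is just the difference between the two group means. A minimal sketch, using invented aggression scores (the variable names and numbers are hypothetical):

```python
def ols_slope(x, y):
    """Ordinary least squares slope: covariance(x, y) / variance(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# x = 1 if parents spank, 0 otherwise; y = child aggression score (invented)
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2.0, 3.0, 2.5, 2.5, 4.0, 5.0, 4.5, 4.5]

slope = ols_slope(x, y)
mean_diff = sum(y[4:]) / 4 - sum(y[:4]) / 4  # spank-group mean minus no-spank mean
```

The two quantities coincide exactly, which is why a regression on a 0/1 grouping variable is equivalent to a comparison of group means. The regression still says nothing about causation, for the confounding reasons discussed above.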
Some authors distinguish between a natural experiment and a "quasi-experiment". The difference is that in a quasi-experiment the criterion for assignment is selected by the researcher, while in a natural experiment the assignment occurs 'naturally,' without the researcher's intervention.
Quasi-experiments have outcome measures, treatments, and experimental units, but do not use random assignment. Researchers often choose quasi-experimental designs over true experiments for a practical reason: quasi-experiments can usually be conducted in settings where true experiments cannot. Quasi-experiments are interesting because they combine features of experimental and non-experimental designs: measured variables can be included alongside manipulated variables. Experimenters often choose quasi-experimental designs because they offer a workable balance between internal and external validity.
Since quasi-experimental designs are used when randomization is impractical and/or unethical, they are typically easier to set up than true experimental designs, which require random assignment of subjects. Additionally, utilizing quasi-experimental designs minimizes threats to ecological validity, as natural environments do not suffer the problems of artificiality found in a well-controlled laboratory setting. Since quasi-experiments often take place in natural settings, findings from one study may be applied to other subjects and settings, allowing for some generalizations to be made about the population. This method of experimentation is also efficient for longitudinal research involving longer time periods that can be followed up in different environments.
Other advantages of quasi-experiments include the flexibility to apply whatever manipulations the experimenter chooses. In natural experiments, researchers must let manipulations occur on their own and have no control over them whatsoever. Also, using self-selected groups in quasi-experiments reduces the chance of ethical and related concerns arising while conducting the study.
Quasi-experimental estimates of impact are subject to contamination by confounding variables. In the example above, variation in the children's responses to spanking is plausibly influenced by factors that cannot be easily measured and controlled, for example the child's intrinsic wildness or the parent's irritability. The lack of random assignment in the quasi-experimental method makes studies more feasible, but it also poses many challenges for the investigator in terms of internal validity. The absence of randomization makes it harder to rule out confounding variables and introduces new threats to internal validity. Because randomization is absent, some knowledge about the data can be approximated, but conclusions about causal relationships are difficult to draw, due to the variety of extraneous and confounding variables that exist in a social environment. Moreover, even if these threats to internal validity are assessed, causation still cannot be fully established because the experimenter does not have total control over extraneous variables.
Disadvantages also include that the study groups may provide weaker evidence because of the lack of randomness. Randomization strengthens a study because it yields groups that better represent the population as a whole. Using unequal groups can also be a threat to internal validity: if groups are not equivalent, as is sometimes the case in quasi-experiments, the experimenter cannot be certain of the causes of the results.
Internal validity is the approximate truth of inferences regarding cause-effect or causal relationships. Validity matters greatly for quasi-experiments because they are all about causal relationships. Internal validity is pursued when the experimenter tries to control all variables that could affect the results of the experiment. Statistical regression, history, and the participants themselves are all possible threats to internal validity. The question to ask while trying to keep internal validity high is: "Are there any other possible reasons for the outcome besides the one I am proposing?" If so, internal validity might not be as strong.
External validity is the extent to which results obtained from a study sample can be generalized to the population of interest. When external validity is high, the generalization is accurate and the results can be said to represent the world outside the experiment. External validity is very important in statistical research because you want to be sure that you have a correct depiction of the population. When external validity is low, the credibility of your research comes into doubt. Threats to external validity can be reduced by using random sampling of participants as well as random assignment.
"Person-by-treatment" designs are the most common type of quasi-experimental design. In this design, the experimenter measures at least one independent variable and, alongside the measured variable, manipulates a different independent variable. Because different independent variables are both manipulated and measured, the research is mostly done in laboratories. An important consideration in person-by-treatment designs is that random assignment must be used for the manipulated variable, to ensure that the experimenter retains complete control over the manipulation.
An example of this type of design was a study performed at the University of Notre Dame, conducted to see whether being mentored for one's job led to increased job satisfaction. The results showed that many people who had a mentor reported very high job satisfaction. However, the study also showed a high number of satisfied employees among those who did not receive a mentor. Seibert concluded that although the workers who had mentors were happy, he could not assume that the mentors themselves were the reason, given the high number of non-mentored employees who said they were satisfied. This is why prescreening is very important: it allows flaws in the study to be minimized before they appear.
"Natural experiments" are a different type of quasi-experimental design used by researchers. They differ from person-by-treatment designs in that no variable is manipulated by the experimenter. Instead of controlling at least one variable as in the person-by-treatment design, experimenters do not use random assignment and leave the experimental control up to chance; this is where the name "natural" experiment comes from. The manipulations occur naturally, and although this may seem like an imprecise technique, it has actually proven useful in many cases. These are studies done on people to whom something sudden has happened, whether good or bad, traumatic or euphoric. An example could be studies comparing those who have been in a car accident with those who have not. Car accidents occur naturally, and it would not be ethical to stage experiments that traumatize subjects. Such naturally occurring events have proven useful for studying posttraumatic stress disorder.
- Dinardo, J. (2008). "Natural experiments and quasi-natural experiments". The New Palgrave Dictionary of Economics. pp. 856–859. doi:10.1057/9780230226203.1162. ISBN 978-0-333-78676-5.
- Rossi, Peter Henry; Lipsey, Mark W.; Freeman, Howard E. (2004). Evaluation: A Systematic Approach (7th ed.). SAGE. p. 237. ISBN 978-0-7619-0894-4.
- Gribbons, Barry; Herman, Joan (1997). "True and quasi-experimental designs". Practical Assessment, Research & Evaluation. 5 (14).
- Morgan, G. A. (2000). "Quasi-experimental designs". Journal of the American Academy of Child & Adolescent Psychiatry. 39. pp. 794–796. doi:10.1097/00004583-200006000-00020.
- Shadish, William R.; Cook, Thomas D.; Campbell, Donald T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin. ISBN 0-395-61556-9.
- Campbell, D. T. (1988). Methodology and Epistemology for Social Science: Selected Papers. University of Chicago Press. ISBN 0-226-09248-8.
- Armstrong, J. Scott; Patnaik, Sandeep (2009). "Using Quasi-Experimental Data To Develop Empirical Generalizations For Persuasive Advertising" (PDF). Journal of Advertising Research. 49 (2): 170–175. doi:10.2501/s0021849909090230. ISSN 0021-8499.
- DeRue, Scott (September 2012). "A Quasi Experimental Study of After-Event Reviews". Journal of Applied Psychology. 97 (5): 997–1015. doi:10.1037/a0028244. PMID 22506721.
- CHARM-Controlled Experiments
- http://www.osulb.edu/~msaintg/ppa696/696quasi.htm [permanent dead link]
- Robson, Lynda S.; Shannon, Harry S.; Goldenhar, Linda M.; Hale, Andrew R. (2001). "Quasi-experimental and experimental designs: more powerful evaluation designs". Chapter 4 of Guide to Evaluating the Effectiveness of Strategies for Preventing Work Injuries: How to Show Whether a Safety Intervention Really Works. Institute for Work & Health, Canada.
- Research Methods: Planning: Quasi-Exper. Designs
- Calder, Bobby (1982). "The Concept of External Validity". Journal of Consumer Research. 9 (3): 240–244. doi:10.1086/208920.
- Meyer, Bruce (April 1995). "Quasi & Natural Experiments in Economics". Journal of Business and Economic Statistics. 13 (2): 151–161. doi:10.1080/07350015.1995.10524589.
- Seibert, Scott (1999). "The Effectiveness of Facilitated Mentoring: A Longitudinal Quasi Experiment". Journal of Vocational Behavior. 54 (3): 483–502. doi:10.1006/jvbe.1998.1676.
Objectives: Writing the Critique Statement on Design
Objectives: For any research article, be able to determine whether the study was a true experiment, quasi-experimental, or observational in design.
Experimental Research
Experimental (predictive) research requires that the researchers
- establish two or more equivalent groups by randomly assigning subjects to experimental and control groups,
- impose one (or more) active intervention/s (the independent variable/s) on some groups, and
- establish and impose a comparison intervention (placebo, untreated control, or usual care control) on one or more control groups, and
- observe and document outcome measures on all subjects to compare outcomes between the experimental and control groups.
Any change which is observed in the dependent variable is termed the "experimental effect". Since the groups were equivalent prior to the imposition of the independent variable, any statistically significant difference between the groups following the intervention can be said to be attributed to, or caused by, the imposition of the independent variable. (back to top)
Before beginning an experiment there needs to be a clear experimental hypothesis.
Example 1: in a study where the research hypothesis is "symmetric pulsed current neuromuscular electrical stimulation delivered at amplitudes above the threshold for contraction at 2 pulses per second continuously for 10 days will increase the mitochondrial density in the quadriceps femoris muscles of rats", there would be an equal number of rats (or quadriceps muscles opposite the stimulated side) that would serve as experimental and control subjects. The experimental (or alternate) hypothesis would state that there would be a statistically significant difference in the mitochondrial density in the stimulated quadriceps muscles compared to the unstimulated control muscles. Note that the comparison, or control group, is specified in the experimental hypothesis.
The type of control group that is appropriate for the study is determined by the experimental hypothesis. If the hypothesis states that there will be significantly faster wound recovery in patients treated with theravac compared with those treated with duoderm, then the two comparison groups must use those two treatments (and preferably no others). No conclusions can be made with regard to the effectiveness of either compared to no treatment, since that wasn't studied.
If the experimental hypothesis states that use of a band-aid speeds recovery of a cut compared to non-use of a bandage, then the appropriate control group would be a no intervention control group (actually, they would undoubtedly still clean the wound, just not use a bandage).
Of statistical interest is the null hypothesis. Whereas the experimental hypothesis states that there is a difference between the experimental and the control groups, the null hypothesis states that there is no difference between them: that the true difference between the groups is zero.
In the experiment, actual data will be collected for each subject, and a mean of the experimental group and control group will be determined. There will almost always be a difference between the groups - the question is whether that difference is a true difference, or if it just represents measurement error. Statistical analysis determines the likelihood, or probability, that the difference observed between groups occurred by chance. If the probability is low that the difference between groups occurred by chance (due to random variability), the difference is said to be "significant", and the null hypothesis is rejected. If not, the null hypothesis cannot be rejected, and you must conclude that there is no evidence from this study that the two groups are different. (Back to top)
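The logic of "how likely is this difference by chance?" can be illustrated with a permutation test, one simple way of estimating such a probability: shuffle the group labels many times and count how often a difference at least as large as the observed one arises. The data below are invented for illustration.

```python
import random

def perm_test(a, b, n_iter=10000, seed=0):
    """Approximate two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # seeded only to make this sketch reproducible
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = a + b
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign labels at random
        d = (sum(pooled[:len(a)]) / len(a)
             - sum(pooled[len(a):]) / len(b))
        if abs(d) >= abs(observed):
            count += 1
    return count / n_iter

experimental = [12.1, 13.4, 11.8, 12.9, 13.1]  # invented outcome scores
control = [10.2, 10.9, 11.1, 10.5, 10.8]
p = perm_test(experimental, control)  # small p: difference unlikely by chance
```

A small p-value means few random relabelings reproduce a difference that large, so the null hypothesis of zero true difference is rejected.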
A true experimental design requires three conditions:
- The manipulation of one or more variables
- Random assignment of subjects to groups
- Control of extraneous variables by control groups
In Experimental research, the investigator begins by establishing two (or more) groups which are indistinguishable from each other on all variables of concern. This is usually accomplished by randomly assigning study participants to the groups. Then the investigator exposes one group to the independent variable (treatment) while exposing the other group to the comparison treatment, to no treatment or to a placebo or sham treatment. A placebo (or sham) treatment is given to simulate the effect of treatment without the critical ingredient (the most common example would be a sugar pill instead of the actual drug; the pill would be identical to the actual drug). The measurement (dependent variable) is then taken of each subject in both groups. Any difference in the dependent variable between the two groups is attributable to (caused by) the independent variable, since exposure to the independent variable is the only difference between the two groups. This is the ONLY way in which we can determine whether one thing CAUSES another! (back to top)
The conclusions that can be drawn depend upon the type of control group used. If an experimental treatment is compared with a "usual care" control group, the conclusion must be limited, for example, to a statement that the experimental treatment significantly reduced the recurrence of infection as compared with the usual care control. It is more difficult to achieve a significant difference when comparing two treatments, because there is the expectation that both treatments will have some positive effect, rendering the difference between the two smaller than the difference between an untreated control and a group receiving active treatment.
The other consideration in evaluating a study that has used a "usual care" or comparison group control rather than an untreated control group is that there is no way to compare the outcome of the intervention with the outcome of no intervention. This should be weighed on a study-by-study basis - it may be critical in some cases, and trivial in others. In evaluating the effectiveness of antibiotics on ear infections in children it is critical to compare to an untreated control group, because the problem spontaneously gets better in a matter of days in most cases. In other situations, spontaneous change is very unlikely - e.g., obese adults are not likely to spontaneously lose weight, and sedentary adults are not likely to spontaneously get stronger or fitter. In this latter case, it would be less critical to have a control group that had no exercises, for example.
In a true experimental design, the attempt is made to eliminate or control for all other factors which might affect the dependent variable. These additional factors are termed "extraneous", "intervening" or "confounding" variables. To control for these confounding variables, research subjects are randomly assigned to the experimental group or control group. Random assignment eliminates systematic bias introduced by the experimenter, and is critical to ensure that the two groups are as similar as possible in every characteristic, except the experimental treatment. In that way, any differences that exist between the two groups can be attributed to the experimental treatment.
However, random assignment sometimes produces dissimilar groups purely by chance. If this occurs, it can destroy the logic (threaten the internal validity) of the experiment and preclude any conclusions drawn about the effect of the treatment. For example, a study of different stroke treatments randomly assigned subjects to two different treatment groups. They then compared functional outcome measures to determine if either treatment was superior to the other. However, they later discovered that most of one group had left-sided strokes and most of the other group had right-sided strokes. People with left sided strokes have difficulty understanding language, while people with right sided strokes do not. So any differences in outcomes between the two groups could be due to differences in the response of people with right vs left sided strokes rather than differences in the effectiveness of the two therapies. Since people with the two types of strokes are likely to respond differently to treatment, this consequence of "random assignment" made useless an enormous amount of work. In cases where other variables (such as side of the stroke, age, health status, time since the stroke, severity) may influence the outcome, it is wise to stratify or match the samples to equalize the effects. (Back to top)
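The stratification mentioned above can be sketched in a few lines: randomize separately within each stratum so that both arms receive equal numbers from each. The subject labels, arm names, and stroke-side strata below are hypothetical.

```python
import random

def stratified_assign(subjects, stratum_of, seed=1):
    """Randomize within each stratum so both arms get equal numbers per stratum."""
    rng = random.Random(seed)  # seeded only so this sketch is reproducible
    by_stratum = {}
    for s in subjects:
        by_stratum.setdefault(stratum_of(s), []).append(s)
    arms = {}
    for members in by_stratum.values():
        rng.shuffle(members)           # random order within the stratum
        half = len(members) // 2
        for s in members[:half]:
            arms[s] = "therapy A"
        for s in members[half:]:
            arms[s] = "therapy B"
    return arms

# 10 hypothetical subjects with left-sided strokes, 10 with right-sided strokes
subjects = {"p%02d" % i: ("left" if i < 10 else "right") for i in range(20)}
arms = stratified_assign(list(subjects), stratum_of=subjects.get)
```

Unlike simple randomization, which could by chance put most left-sided strokes in one arm, this scheme guarantees five left-sided and five right-sided subjects per arm.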
In an experimental design, research subjects are randomly assigned to an experimental group or a control group. A pretest is then carried out where the dependent variable is measured in all subjects within both groups. The researcher then manipulates the independent variable in the experimental group, for example a specific form of treatment may be given. At the same time the control group is given a placebo or a different form of treatment. The experimental and the control group are then tested again to ascertain whether the dependent variable has changed. This is termed the posttest. If there is more change in the experimental group than the control group, and if that change is found to be statistically significant, then it is concluded that the independent variable brought about the change. (Back to top)
Extraneous variables (also called confounding or intervening variables) are variables other than the independent variables that differ between groups. These extraneous variables confound the independent variables, because they, too, may contribute to any differences between the groups. For example, a group of teachers are conducting a fitness promotion study in first graders, with the experimental group receiving special fitness classes and the control group receiving the standard health classes. If the experimental group children teach the fitness exercises to the control group children, the comparison between the groups will be muddy, because some of the controls have been exposed to the experimental treatment. In a medical setting, a similar problem arises when patients in a study take medications in addition to those prescribed. If differences arise, you can't tell whether they are due to the intervention or something else the subjects were taking. In studies with uncontrolled extraneous variables, it cannot be concluded that the differences seen in the study were due to the intervention.
Experimental or Procedural Controls
In addition to having control groups, studies must also be sure that experimental conditions are the same for both groups. In order to attribute any differences in outcome to the treatment, there must be no other differences between the groups. If you test all of the control group subjects in the morning and all the experimental subjects in the afternoon, you may be introducing systematic bias in the data: The morning group might be sleepier, or the afternoon group might have more time to practice, or the tester may be fatigued in the afternoon, altering the accuracy of the results. If any of these occur, there are differences between control and experimental groups in addition to the treatment, so if the groups differ in outcomes, you can't know how much of the difference is due to the treatment, and how much was due to the rater or the subject's fatigue. Experimental Control is a way of eliminating or minimizing such extraneous variables. It is prudent to ask subjects in a study not to change their behavior, or not to take other medications during the study. When such restrictions are placed on subjects, this is an example of experimental control. Another example of experimental control would be requiring that participants not change their diet in a study of exercise induced weight control. If subjects in the experimental group (the exercisers) ate less, there could be a greater weight loss, but some of it could be due to diet changes. Similarly, if subjects in the control group ate less, you might see no differences between the groups at the end because of the weight reduction caused by the dieting control group subjects.
If a study repeats testing on the same subjects, it is advisable to counterbalance the testing order. For example, a study comparing the effectiveness of 3 different drugs on lowering cholesterol could use a repeated measures design: a group of 20 people with high cholesterol would take drug 1 for 6 weeks, drug 2 for 6 weeks, and drug 3 for 6 weeks, with 6-week wash-out periods in between. If it were done this way, however, there could be some carry-over effects of the first 2 drugs on the effectiveness of the last drug. To prevent this possibility, you would need 21 subjects, with each group of 7 taking the drugs in a different order: A: 1,2,3; B: 3,1,2; C: 2,3,1. In this way, any order effects would be balanced between the drugs, so there would be no systematic bias. This is another example of experimental control.
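The counterbalancing scheme described above, with each subgroup of 7 taking the drugs in a rotated order, is a small Latin square and can be sketched as follows (the drug names are placeholders):

```python
def latin_square_orders(treatments):
    """Rotated orders: every treatment appears in every position exactly once."""
    n = len(treatments)
    return [[treatments[(start + i) % n] for i in range(n)]
            for start in range(n)]

orders = latin_square_orders(["drug 1", "drug 2", "drug 3"])
# orders[0] = [drug 1, drug 2, drug 3]
# orders[1] = [drug 2, drug 3, drug 1]
# orders[2] = [drug 3, drug 1, drug 2]

# Assign 21 subjects, cycling through the three orders (7 subjects per order)
assignments = [orders[i % 3] for i in range(21)]
```

Because each drug occupies each testing position equally often, any carry-over or practice effect is spread evenly across the three drugs rather than systematically favoring one.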
If experimental conditions do introduce systematic bias and experimental controls are not used or are not effective, it cannot be concluded that differences in outcome measures were due to the intervention: they could be due to the extraneous variables. For example, in the last example of 3 drugs tested, if all groups got the same drug order (1,2,3) and subjects were significantly better after drug 3 than after drugs 1 and 2, this improvement could be due to the combined effects of the 3 drugs (or just to drugs 1 & 3, or 2 & 3), just due to drug 3, or just due to natural spontaneous improvement. Since there was no control group, we can't know what would have happened to participants had they not had any drug intervention. So, the conclusion from this study would be quite limited: the participants improved over time, but the improvement cannot be attributed to any of the drugs tested. This is why it is critical to have experimental control.(Back to top)
There is a special form of controlled experimental research termed a double-blind experimental design. This is the gold standard of experimental research. In the double-blind experimental study neither the research participants nor the researchers who will analyze the data know who has received the experimental treatment and who has not. In this way the research cannot be biased by either the perceptions of the investigators or the subjects.
In addition to these three basic designs, there are multifactorial designs in which there is more than one experimental independent variable. Factorial designs permit the analysis of more than one independent variable and any interaction effects between those independent variables. A typical example would be the effects of a diet on cholesterol levels in which men and women are also compared. Diet and Gender would both be independent variables, of the non-paired, independent kind. If there is a pre-test and post-test, as is likely, that would be a third independent variable, a paired or repeated measures variable "time of testing."
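A 2x2 factorial layout like the diet-by-gender example can be summarized by its cell means, from which a main effect and an interaction are computed. The numbers below are invented solely to show the arithmetic.

```python
from collections import defaultdict

# Each record: (diet, gender, change in cholesterol). All values invented.
data = [
    ("low-fat", "M", -20), ("low-fat", "M", -24),
    ("low-fat", "F", -10), ("low-fat", "F", -14),
    ("usual",   "M",  -2), ("usual",   "M",  -6),
    ("usual",   "F",  -1), ("usual",   "F",  -3),
]

cells = defaultdict(list)
for diet, gender, y in data:
    cells[(diet, gender)].append(y)
cell_means = {k: sum(v) / len(v) for k, v in cells.items()}

# Main effect of diet: low-fat vs usual, averaged over gender
diet_effect = (sum(cell_means[("low-fat", g)] for g in "MF") / 2
               - sum(cell_means[("usual", g)] for g in "MF") / 2)

# Interaction: does the diet effect differ for men vs women?
interaction = ((cell_means[("low-fat", "M")] - cell_means[("usual", "M")])
               - (cell_means[("low-fat", "F")] - cell_means[("usual", "F")]))
```

A full analysis would test these effects with a two-way ANOVA; the sketch only shows how the factorial structure decomposes the cell means into main effects and an interaction.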
A single article may also describe several studies. Each dependent variable measured is a separate study in univariate analysis. There are multivariate analyses which encompass many dependent variables simultaneously, but these are beyond the scope of this course. An individual study may also report several distinct analyses on the same data, using different grouping or independent variables.
A true experimental design requires three conditions: an active intervention (a manipulated independent variable), random assignment of subjects to groups, and a control group. Quasi-experimental designs have an active intervention, but lack one or both of the other two criteria for a true experiment (random assignment of subjects to groups or a control group).
Frequently the conditions required for a true experiment cannot be met by the researchers, and alternative research designs are employed. Recall that unless all three of these criteria are met, the conclusions drawn from the study are limited to establishing or negating the existence of an association between or among variables. Only true experiments can determine whether one variable causes changes in another.
Designs that have an active intervention (active independent variable) but lack random assignment, a control group, or both are called quasi-experimental designs. Having an active intervention is an important feature of an experimental design, because it addresses questions that involve change: what happens when people change what they are doing (to X), compared with those who do not change, or who change to Y?
The limitations of quasi-experimental designs result from lack of random assignment of subjects to groups or lack of a control group. We'll take each of these in turn, and discuss the consequences of each.
Quasi-experimental designs have active interventions, but lack randomization of subjects to groups, a control group, or both. In these studies, statistically significant differences between groups in outcome measures cannot be attributed to the intervention, or independent variable, due to the potential for confounding or intervening variables.
One Group Pre-test, Post-test Design
This variant lacks a control group (and so cannot randomly assign subjects): a single group is measured at two or more different times. The subjects are measured on the dependent variable, then exposed to the intervention (the independent variable), and then the dependent variable is measured again.
O1 X O2
Differences between O1 and O2 could be due to many factors in addition to the investigator's intervention (X). There could be natural development (maturation), spontaneous recovery, or an event or situation (history) that occurred between the pre-test and post-test. The influence of these confounding variables increases with the time between the tests, and can be reduced by scheduling the tests closer together. The possibility of test-retest confounding exists to the extent that the pre-test would influence post-test performance. Differences between O1 and O2 could also be caused by regression to the mean if the subjects were selected for participation based on a high or low score on some measure: high scorers would be expected to score lower on retest, while low scorers would be expected to score higher on retest, even in the absence of an intervention. There is also the possibility of confounding by instrumentation decay, to the extent that measurement at the post-test would differ, i.e., raters may be more experienced, more fatigued, more demanding, or biased by expectations.
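Regression to the mean, in particular, is easy to demonstrate with a short simulation (hypothetical numbers): subjects selected for low pre-test scores "improve" at post-test with no intervention at all, simply because part of a low score is measurement noise that does not recur.

```python
import random
import statistics

random.seed(3)

# Regression-to-the-mean sketch (invented numbers): each subject has a
# stable true score; each test adds independent measurement noise.
# Selecting subjects for LOW pre-test scores makes them "improve" at
# post-test even though no intervention is given.
population = [random.gauss(100, 10) for _ in range(1000)]

def observe(true_score):
    return true_score + random.gauss(0, 10)   # one noisy measurement

pretests = [(t, observe(t)) for t in population]
selected = [t for t, obs in pretests if obs < 85]   # apparent low scorers

pre_mean = statistics.mean(obs for t, obs in pretests if obs < 85)
post_mean = statistics.mean(observe(t) for t in selected)

print(f"selected group pre-test mean:  {pre_mean:.1f}")
print(f"selected group post-test mean: {post_mean:.1f}")
# The post-test mean rises back toward 100 with no treatment at all.
```

This is why an untreated comparison group selected by the same criterion is so valuable: it would show the same "improvement," exposing it as regression rather than a treatment effect.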
Lack of Random Assignment to Groups
These studies have an active intervention (active independent variable), but compare groups that are not randomly assigned. Because they lack random assignment, we cannot be certain that the groups are comparable prior to the intervention. Therefore, any observed difference or change after the intervention cannot be attributed to the intervention. We can't conclude that the intervention caused the change or difference.
An example of this type of design would be a study that compared a group of step exercisers to a group of computer users - two groups at a local YMCA. The outcome measure was balance; the subjects self-selected to participate and chose which group they joined. The researchers simply requested permission to test the balance of the participants of each group. The results showed that the balance scores of the step exercise group were higher than those of the computer group. Can we conclude that the higher balance scores were due to the step exercise participation?
Think of the kind of people who would choose to participate in each of the two groups, and who would NOT choose to participate. Certainly anyone with difficulty in balance would not be likely to sign up for step exercise, and those who are not physically active or not interested in physical activity would not be as likely to join that group. The computer group may be more attractive to a sedentary population. So, the difference in balance scores could reflect the overall activity level of the participants rather than participation in this one activity. Or there could be other systematic differences between the groups. The point is that we can't be certain that our chosen independent variable - the factor the study would like to attribute the differences to - is the only source of the differences actually observed. So, we must be cautious in our interpretation of the results. In science, when we can't be certain, we must acknowledge that we don't know. Science isn't about what we'd like to believe!
(This study could be improved by doing a pre-test. If the groups are equivalent on balance scores prior to beginning the classes, but only the exercise group improves significantly, then they would have good evidence to support the hypothesis that step exercise improves balance.)
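The self-selection problem in the YMCA example can be simulated with invented numbers: a latent "activity level" drives both the choice of class and the balance scores, and the step group scores higher even though the class itself is given zero effect.

```python
import random
import statistics

random.seed(4)

# Self-selection sketch (invented numbers): a latent activity level
# drives BOTH the choice of class AND balance scores. Step exercise
# itself is given NO effect, yet the step group scores higher.
def make_subject():
    activity = random.gauss(0, 1)                      # latent trait
    joins_step = activity + random.gauss(0, 0.5) > 0   # active people pick step
    balance = 50 + 8 * activity + random.gauss(0, 3)   # no class effect at all
    return joins_step, balance

subjects = [make_subject() for _ in range(400)]
step = [b for joins, b in subjects if joins]
computer = [b for joins, b in subjects if not joins]

print(f"step group mean balance:     {statistics.mean(step):.1f}")
print(f"computer group mean balance: {statistics.mean(computer):.1f}")
# The difference reflects who chose each class, not the class itself.
```

Random assignment would break the link between the latent trait and group membership, which is precisely what this design cannot do.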
Observational designs include what epidemiologists refer to as cohort and case-control studies. The feature they share is the lack of an active intervention: hence the label "observational." These studies do not impose an active intervention on subjects; instead, they observe, using medical records or self-reports (by questionnaire or interview). Case-control studies compare a group of subjects with a particular disorder to a different group of subjects who do not have the disorder. Typically many dependent variables are studied to determine whether any of them differ. Cohort studies frequently follow entire national populations or large longitudinal samples. Data are gathered on many variables, and relationships (associations) between and among variables are investigated.
A study lacking an active intervention is one in which the groups compared are formed by assigning subjects to groups based on a characteristic of the subject. Some examples of such characteristics would be: ethnicity, presence/absence of disease, habits, exposure to or presence of a risk factor, or score on a screening test. These studies have a passive or attribute independent variable. They also lack random assignment to groups, because assignment based on a feature or score precludes random assignment.
In this design, a group of subjects that has been exposed to an independent variable (e.g., a risk factor or treatment) is measured on the dependent variable. These results are compared to another group that has not been exposed to the independent variable (e.g., the risk factor). This design is frequently used to compare groups of individuals with and without a particular disease, injury, or risk factor on related measures, such as longevity or presence of comorbidities. The weakness of the design is that the groups likely vary in many other attributes in addition to X, so that outcome differences are not attributable to X.
This design is frequently found in studies which compare a group exposed to a risk factor for some illness to a group not exposed: a case-control study. For example, a group of mine workers compared with a group of non-mine workers, with several outcome measures including respiratory illness records and age at death. Other examples: a group of smokers compared with a group of nonsmokers; people with diabetes compared with a group of non-diabetics. This design lacks internal validity because there is no way of demonstrating the equivalency of the two groups on other factors; that is, people who work in mines are likely to resemble other mine workers more than they resemble non-mine workers in other ways as well, and that makes the groups non-equivalent. Any differences between the two groups on the outcome measure - the dependent variable - cannot be attributed to the independent variable (here, working in a mine), because that group may share other common features that could contribute to, or completely cause, the difference in outcome.
Differences between groups that produce confounding can arise from the method of recruiting subjects and from the purposive criteria used (selection bias). There is also the possibility of differential dropout - subjects exposed to X who are not included in the sample because of the severity of their condition, death, lack of willingness to participate, etc. (the mortality threat to internal validity). These differences could also contribute to the observed differences between O1 and O2. To the extent that exposure to X alters the maturation or development that would affect the outcome measures, maturation also has the potential to threaten the internal validity of this design.
Such studies are valuable in predicting the incidence of disease in a population and in identifying variables for further study. However, they are not useful for predicting what would happen if a variable is changed - since they don't study what would happen given a change.
For example, a study could use an extensive database of medical records and correlate longevity (age at death) with reported frequency of flossing the teeth. A high correlation between the two indicates an association, but does not provide evidence that flossing prevents death, or prolongs life. It is more likely that those people who report flossing frequently also have other healthy habits, access to medical care, and so on. This seems obvious. However, the desire to conclude that a causal relationship exists increases with the plausibility and popularity of the variables: e.g., a study of longevity and length of time of practice of Tai Chi shows a high positive correlation. Those who are long-term practitioners of Tai Chi lived longer than those who practiced for a shorter time or not at all. It may be very tempting to conclude that the Tai Chi practice contributed to the longer lives, but there is no more justification in this example than in the first. There are likely numerous ways in which the long-term practitioners of Tai Chi differ from those who did not continue or participate at all that could also influence longevity. Those with chronic or debilitating illness would not be able to practice Tai Chi. Perhaps only those healthy enough continued the Tai Chi, while the less healthy did not, and also died earlier.
Another useful feature of observational studies is that, while they cannot be used to establish a causal relationship, a negative finding in this type of study can provide evidence AGAINST a causal relationship. For example if the hypothesis was that Tai Chi gives long life, we would predict that long term practitioners of Tai Chi would live longer than those who did not continue or participate at all in Tai Chi. If the study found that long term Tai Chi practitioners did NOT live longer than non-practitioners, this would provide evidence AGAINST the hypothesis that Tai Chi promotes longevity.
Another example: if family history does in fact contribute to deaths from first heart attacks, we would expect this study to find significant differences between the groups. Unfortunately, even a significant difference between groups in this study would not provide such evidence. If, however, this study did NOT find differences between the groups, and the samples were reasonably large, this would provide evidence AGAINST the role of family history in deaths from first heart attacks.
Observational Studies May Be Prospective or Retrospective (Ex Post Facto)
Ex post facto or retrospective designs are observational studies in which the grouping and matching are done retrospectively, either from records or from memory. In some cases, later data may be collected from the records or from the individuals themselves. An example would be a study in which deaths from first heart attacks are compared in a group with a family history of CVD and a group without such a family history. There would be other risk factors that would likely differ between the two groups, such as lifestyle, habits, and risky behaviors that may also run in families (e.g., obesity, eating habits, exercise habits, hypertension, hyperlipidemia). Including more matching factors reduces the sample size of the "cases" and "controls." Specifically, in this example, few "controls" will be found that match "cases" on obesity, hyperlipidemia, hypertension, and eating and exercise habits while also having no family history of CVD. This reduced sample has been so crafted that it is far from a representative sample of people without a family history of CVD; instead, it is a nonequivalent comparison group. So, even if those with a family history have a greater incidence of deaths from first heart attacks, the difference may be due to factors other than the family history.
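How quickly matching shrinks the pool of eligible controls can be illustrated with invented prevalences: each additional matching factor filters the candidate controls, and a few factors together leave only a handful of matches.

```python
import random

random.seed(6)

# Matching sketch (invented prevalences): each additional matching
# factor shrinks the pool of eligible "controls", leaving a small,
# unrepresentative comparison group.
def make_control():
    return {
        "obese": random.random() < 0.3,
        "hyperlipidemia": random.random() < 0.25,
        "hypertension": random.random() < 0.3,
        "sedentary": random.random() < 0.4,
    }

controls = [make_control() for _ in range(1000)]

# A hypothetical "case" profile that controls must match on all four factors:
case = {"obese": True, "hyperlipidemia": True, "hypertension": True,
        "sedentary": True}

pool = controls
for factor in case:
    pool = [c for c in pool if c[factor] == case[factor]]
    print(f"after matching on {factor}: {len(pool)} controls remain")
# Roughly 0.3 * 0.25 * 0.3 * 0.4 * 1000, i.e. on the order of 10 matches.
```

The surviving controls are, by construction, an unusual subset of the no-family-history population, which is why heavy matching can produce a nonequivalent comparison group rather than a representative one.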
All observational (and quasi-experimental) studies lack internal validity: selection bias is intrinsic to the design, since subjects are put into groups based on some attribute, and they may also be susceptible to regression to the mean, because the matching process tends to select extreme scores from the "comparison" group to match the case group.