On Pygmalion and sensory integration research
Occupational therapists have been attempting to improve research on sensory integration by adopting stricter fidelity standards and by using Goal Attainment Scaling (GAS) as an outcome measure. Three years ago I blogged about an SI effectiveness study and expressed some concerns about the research design - you can read about that at http://abctherapeutics.blogspot.com/2011/01/new-study-on-si-effectiveness-but.html.
A new study has been published by Schaaf et al (2013) and can be accessed online in full text at http://link.springer.com/article/10.1007%2Fs10803-013-1983-8/fulltext.html. One major difference from the earlier study is that the new study compared an intervention group with a 'usual care' group.
Use of control groups in this manner can help to correct for potential Hawthorne effects - but only if the study is designed properly. In the previous study there was an OT/SI group and a fine motor training group. The fine motor group in the first study was probably not exactly a 'sham intervention' but at least offered a parallel point of comparison because it controlled for some Hawthorne effects. Both groups made progress but the OT/SI group made more progress. It was difficult to know if the progress was due to the intervention, the attention (since both groups made progress), or expectancy bias, because the outcome measurements were not adequately blinded.
In the new study the researchers used a 'usual care' comparison, which is quite different from a sham design. 'Usual care' designs are valuable if you are using psychometrically sound measurements, but you run into potential problems with Hawthorne effects again if you fail to adequately match the treatment experiences. So, the new study replaced the sham-like comparison with a 'usual care' approach when the most appropriate design would have been a three-group study with sham, usual care, and intervention groups. That would have left the nature of the 'usual care' group, and what therapies they were receiving, as the only potential point of weakness.
I don't understand why this study had a usual care (non OT/SI) group and used GAS as a primary measure. From a design perspective that is confusing. The concept of GAS is that you are developing meaningful outcome measures based on parent input - so they developed goals for both groups but then only one group got the intervention and the other did not. Hawthorne effects would dictate that the parents' expectations for the intervention group could have influenced their GAS outcome reporting. After all, if you take a group of families who have children with autism, ask them to develop goals, and then do nothing, it is not likely they will connect a 'no treatment' condition to any kind of goal attainment! The fact that the evaluators are 'blinded' is a design canard and really does not address this important issue. I am also not sure that 'usual care' mitigates the design flaw.
If you are asking the 'treatment' families to participate in a high-intensity, three-day-per-week program - of course they will have a lot of sunk cost bias toward reporting positive treatment outcomes. In order to equalize for this effect you have to have the non-treatment group investing equally in some other activity, and then you would more fully control for the Hawthorne effect and see if the intervention itself accomplished the GAS objectives.
This is why a three group design would be most appropriate, particularly in the case where you can't possibly blind the families who are self-reporting on progress with GAS.
This new study also measured adaptive behavior and autism behaviors; there were no differences between the intervention and usual care groups. Actually, this is the most concerning finding in the whole study. So, the only potentially interesting findings were in the parent-reported GAS measures.
For those who are more interested in statistics, there is also the issue of using parametric statistics on GAS measures - which may or may not be interval level data. It looks like the researchers took pains to use "equally spaced probability intervals" and they did this with indirect time measures (in the example they gave). However, given the large number of goals we don't really know how they operationalized this overall.
I understand that they probably tried to address the issue about use of parametric statistics with the way they scaled the goals but there are some big questions about whether this is an appropriate measurement strategy for these kinds of goals that are notoriously difficult to scale. As an example, just because you scale the amount of time to complete a toothbrushing task does that mean that the difference hypothesized to be caused by intervention is equivalently distributed across the scaling of time expectation to complete a task? In order to validate the findings there would first need to be a validation of the scaling - and this is something that needs more description and scrutiny in order to understand the findings.
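For readers who want to see what the parametric concern looks like in practice, here is a minimal Python sketch. It assumes the commonly used Kiresuk-Sherman T-score formula for summarizing GAS levels (with the conventional inter-goal correlation of 0.3); I am not claiming this is exactly how the study computed its scores. The goal levels below are made-up illustrative numbers, not data from the study, and the function names are my own.

```python
from math import sqrt
from statistics import median


def gas_t_score(levels, weights=None, rho=0.3):
    """Kiresuk-Sherman GAS T-score for one child.

    levels  : attainment level per goal, each from -2 to +2
    weights : optional goal weights (defaults to equal weights)
    rho     : assumed correlation between goals (0.3 by convention)

    Note that the formula adds the levels together, which only makes
    sense if the steps between levels are equally spaced.
    """
    if weights is None:
        weights = [1.0] * len(levels)
    weighted_sum = sum(w * x for w, x in zip(weights, levels))
    denom = sqrt((1 - rho) * sum(w * w for w in weights)
                 + rho * sum(weights) ** 2)
    return 50 + 10 * weighted_sum / denom


def mann_whitney_u(a, b):
    """U statistic for group a versus group b.

    Counts the pairs in which the value from a exceeds the value from b
    (ties count as one half). Only the ordering of values is used.
    """
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


# Hypothetical GAS levels, one inner list per child (NOT study data).
treatment = [[1, 0, 2], [0, 0, 1], [2, 1, 1]]
usual_care = [[0, -1, 0], [0, 0, 0], [-1, 0, 1]]

# Parametric route: arithmetic on the levels, then compare means.
t_treatment = [gas_t_score(child) for child in treatment]
t_usual = [gas_t_score(child) for child in usual_care]
print("Mean T, treatment:", sum(t_treatment) / len(t_treatment))
print("Mean T, usual care:", sum(t_usual) / len(t_usual))

# Ordinal route: per-child median level, then a rank-based comparison.
med_treatment = [median(child) for child in treatment]
med_usual = [median(child) for child in usual_care]
print("Mann-Whitney U:", mann_whitney_u(med_treatment, med_usual))
```

The point of the comparison is that the T-score route depends on the levels being equally spaced, while the rank-based route only depends on their order. If the scaling of a goal (like time to complete toothbrushing) is not truly equal-interval, the first route can mislead in ways the second cannot.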
It appears that there are both research design and statistical questions about this study. Given the small sample sizes, the lack of difference on adaptive behavior scales and autism behaviors, and the questions listed above re: GAS designs and measurement concerns, this study will probably not be accepted as game changing. It is disappointing that this study substituted one limited design for another. That substitution, together with the lack of functional progress on true gold standard testing like the Vineland, will be the basis for criticism of these findings.
I believe that we need to carefully reflect on GAS designs, because if I attempt to take an outside view it appears as though multiple research studies failed to reach accepted thresholds of significance, and now we are like Pygmalion asking Venus (played by self report of parents) to breathe life into our sensory integration statue.
There are even larger issues, of course, and these were mentioned in my blog post three years ago. Specifically, a three-times-per-week intensity is often not feasible (given either insurance restrictions or school district authorizations). In many localities, the ship sailed on this kind of high-intensity direct intervention model years ago.
Three years ago I thought that the Pfeiffer study was a good step forward and it gave us a lot of useful information on how to design future studies. Some of those issues were addressed in this new study but many were not and in fact new problems were introduced.
If there are true differences to be measured because of sensory integration treatment we will find them after we design studies that are not so vulnerable to criticism.
References:
Kerckhofs, E. (2010). Letter to the editor: Ordinal goal attainment scores are not suited to arithmetic operations or parametric statistics. Comment on GAS in rehabilitation: A practical guide. Clinical Rehabilitation, 24, p. 479.
Pfeiffer, B.A., Koenig, K., et al. (2011). Effectiveness of sensory integration interventions in children with autism spectrum disorders: A pilot study. American Journal of Occupational Therapy, 65, 76-85.
Schaaf, R. et al. (2013). An intervention for sensory difficulties in children with autism: A randomized trial. Journal of Autism and Developmental Disorders, published online at http://link.springer.com/article/10.1007%2Fs10803-013-1983-8/fulltext.html
Tennant, A. (2007). Goal attainment scaling: Current methodological challenges. Disability and Rehabilitation, 29, 1583-1588.
Comments
I have yet to see any SI treatment work, especially with cognitively delayed children, past the novelty stage. Consistent structure and behavior modification techniques produce results.
I also have wondered how diagnoses like ADHD and ADD suddenly became part of SI territory?
I feel OTs have stuck their feet into too many areas. When we say we address the "occupation of living", or address "sensory systems", what does that mean? What isn't a life skill, what doesn't involve some sensory system - it's all-encompassing. We claim these infinite areas yet don't have the research / evidence to back up what we do for them; most times we are not even properly trained to address them.
SI and its treatment are a good idea with a lot of common sense; however, good ideas and common sense do not make them true.
(Sorry for the soapbox)