Denosumab indication now includes multiple myeloma, Amgen announces
The Food and Drug Administration has expanded the indications for denosumab (Xgeva), previously indicated for the prevention of skeletal-related events in patients with bone metastases from solid tumors, to include patients with multiple myeloma, according to a press release from Amgen, the manufacturer of Xgeva.
“Up to 40% of [multiple myeloma] patients remain untreated for the prevention of bone complications, and the percentage is highest among patients with renal impairment at the time of diagnosis. Denosumab, which is not cleared through the kidneys, offers multiple myeloma patients bone protection with a convenient subcutaneous administration, providing patients with a novel treatment option,” Dr. Noopur Raje, director of the Center for Multiple Myeloma, Massachusetts General Hospital Cancer Center, Boston, said in the press release.
Adverse events in multiple myeloma patients were broadly similar to the known safety profile of denosumab. The most common adverse events were diarrhea, nausea, anemia, back pain, thrombocytopenia, peripheral edema, hypocalcemia, upper respiratory tract infection, rash, and headache. The most common adverse event resulting in discontinuation of treatment was osteonecrosis of the jaw.
Find the full press release on the Amgen website.
Home Monitoring of Cystic Fibrosis
Study Overview
Objective. To determine if an intervention directed toward early detection of pulmonary exacerbations using electronic home monitoring of spirometry and symptoms would result in slower decline in lung function.
Design. Multicenter, randomized, nonblinded 2-arm clinical trial.
Setting and participants. The study was conducted at 14 cystic fibrosis centers in the United States between 2011 and 2015. Cystic fibrosis patients (stable at baseline, FEV1 > 25% predicted) at least 14 years old (adolescents and adults) were included and randomized 1:1 to either an early intervention arm or a usual care arm.
Intervention. The intervention arm used home-based spirometers and patient-reported respiratory symptoms using the Cystic Fibrosis Respiratory Symptoms Diary (CFRSD), which was to be completed twice weekly and collected by the central AM2 system. This AM2 system alerted sites to contact patients for an acute pulmonary exacerbation evaluation when FEV1 values fell by greater than 10% from baseline or CFRSD worsened from baseline in two or more of eight respiratory symptoms. The usual care arm patients had quarterly CF visits and/or acute visits based on their need.
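The alert rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the AM2 system's actual implementation: the `Reading` type, the `should_alert` function, and the convention that a higher symptom score means a worse symptom are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

FEV1_DROP_THRESHOLD = 0.10  # alert when FEV1 falls by more than 10% from baseline
WORSENED_SYMPTOMS_THRESHOLD = 2  # alert when 2 or more of the 8 symptoms worsen
N_SYMPTOMS = 8


@dataclass
class Reading:
    """One home measurement (hypothetical structure for illustration)."""
    fev1_liters: float
    symptom_scores: List[int]  # assumed: one score per symptom, higher = worse


def should_alert(baseline: Reading, current: Reading) -> bool:
    """Return True if the site should contact the patient for evaluation."""
    if len(baseline.symptom_scores) != N_SYMPTOMS or len(current.symptom_scores) != N_SYMPTOMS:
        raise ValueError("expected scores for all eight respiratory symptoms")

    # Relative fall in FEV1 compared with the patient's own baseline.
    fev1_drop = (baseline.fev1_liters - current.fev1_liters) / baseline.fev1_liters

    # Number of symptoms that are worse than at baseline.
    worsened = sum(
        now > before
        for before, now in zip(baseline.symptom_scores, current.symptom_scores)
    )

    return fev1_drop > FEV1_DROP_THRESHOLD or worsened >= WORSENED_SYMPTOMS_THRESHOLD
```

Because either criterion alone triggers contact, a rule of this shape favors sensitivity over specificity, which is relevant to the number of acute visits discussed in the commentary.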
Main outcome measures. The primary outcome variable was the 52-week change in FEV1 volume in liters. Secondary outcome variables were changes in CFQ-R (Cystic Fibrosis Questionnaire, revised), CFRSD, FEV1 % predicted, FVC in liters, FEF25-75%, time to first acute pulmonary exacerbation, time from first pulmonary exacerbation to subsequent pulmonary exacerbation, number of hospitalization days, number of hospitalizations, percent change in prevalence of Pseudomonas or Staphylococcus aureus and global assessment of protocol burden score.
Main results. A total of 267 patients were randomized. The results were analyzed using intention-to-treat analysis. There was no significant difference between study arms in 52-week mean change in FEV1 slope (mean slope difference, 0.00 L; 95% confidence interval, –0.07 to 0.07; P = 0.99). The early intervention arm subjects detected exacerbations sooner and more frequently than usual care arm subjects (time to first exacerbation hazard ratio, 1.45; 95% confidence interval, 1.09 to 1.93; P = 0.01). Adverse events were not significantly different between treatment arms.
Conclusion. An intervention of electronic home monitoring of patients with CF was able to detect more exacerbations than usual care, but this did not result in slower decline in lung function.
Commentary
Establishing efficacy and safety of home monitoring is a popular research topic in the current era of information technology. Most data to date have come from chronic adult diseases such as heart failure, diabetes, or COPD [1]. While relatively rare, CF is a chronic lung disease that could potentially benefit from home monitoring. This is supported by previous evidence suggesting that up to a quarter of pulmonary exacerbations in CF patients result in worsened baseline lung function [2]. Close monitoring of symptoms and FEV1 using home monitoring was hypothesized to improve management and long-term function in this population. Indeed, in children with CF, electronic home monitoring of symptoms and lung function was able to detect pulmonary exacerbations early [3]. Frequency of monitoring varies widely between centers, and some suggest aggressive monitoring of CF provides better clinical outcomes [4]. Current CF guidelines do not make specific recommendations regarding frequency of monitoring.
In this study, Lechtzin et al attempted to determine if the early detection of acute pulmonary exacerbations in CF patients by home monitoring and treatment would prevent progressive decline in lung function. This multicenter randomized trial was conducted at large CF centers in the US with a total cohort of 267 patients. The study had a mean follow-up time of 46.8 weeks per participant in the intervention arm and a mean follow-up time of 50.9 weeks per participant in the usual care arm. Given the predefined follow-up length (52 weeks), the primary outcome of FEV1 in liters was deemed sensitive enough to detect a decline of lung function. However, the discrepancy between follow-up times, with the intervention group having a 4.1-week shorter mean follow-up than the usual care group, could have influenced the interpretation of the results. Additionally, a large percentage of these patients were clinically stable at initial enrollment, with an average FEV1 % predicted of 79.5%. The stability of initial participants raises questions as to the efficacy of home monitoring in CF patients with moderate to severe lung disease. Most importantly, due to the nature of the intervention the study could not be blinded, which could have substantially increased anxiety and self-awareness of patients in reporting their symptoms in the intervention arm.
Currently, an established consensus definition of pulmonary exacerbations of CF is lacking. Previous studies have proposed several different criteria for acute pulmonary exacerbations. Most proposed definitions depend on symptom changes such as cough, sputum, chest pain, shortness of breath, fatigue, and weight loss, making the definition less specific or objective.
The number of acute visits in the intervention arm was significantly higher than that in the usual care arm (153 vs 64). Despite the higher number of visits in the intervention group, a significant number of these visits did not lead to a diagnosis of acute pulmonary exacerbation. Reportedly, 108 acute visits met the protocol definition of pulmonary exacerbation and 29 acute visits did not in the intervention arm, compared to 44 and 12, respectively, in the usual care arm of the study. Given that the groups had similar baseline demographics and were randomized appropriately, one would expect that the number of acute visits severe enough to meet protocol-defined criteria as a pulmonary exacerbation would be similar in both groups. However, the absolute number of protocol-defined pulmonary exacerbations was far greater in the intervention group. Therefore, one could question the clinical significance of what was defined as acute pulmonary exacerbation. Potentially, the elevation of the absolute number of protocol-defined pulmonary exacerbations in the intervention group was simply due to increased surveillance. If these additional exacerbations were clinically significant, one would expect that the lack of identification and treatment of a significant number of pulmonary exacerbations in the usual care group would have led to a larger decline in FEV1 after 52 weeks than in the intervention group. Given that the results of the study indicate no significant difference in change in FEV1 between study arms, perhaps the studied parameters in the intervention group were overly sensitive.
Of note, the usual care arm did have a statistically significantly higher rate of hospitalizations and IV antibiotic use, suggesting that earlier acute visits can identify patients earlier in the course of an acute pulmonary exacerbation and prevent a higher level of care, though at the expense of more acute event “false positives,” or over-diagnosis. This trade-off may not result in cost saving, though this was not a consideration of this study. Additionally, there were likely differences in treatment, as treatment was not standardized, with potential implications for the validity of results.
The early intervention protocol was not only shown to lead to increased visits with no benefit in lung function decline but, as one may expect, also proved to be remarkably burdensome to many patients compared to the usual care protocol. Entering data on a weekly basis (or perhaps even monthly) was found to be burdensome in many remote-monitoring trials [5]. This may be especially apparent in a younger age group: in this study the average age of the study population was between 18 and 30 years. It can be hypothesized that this age group may not have enough responsibility, time, or enthusiasm to participate in home monitoring. Home monitoring may be more effective in a disease condition where the average age is older, or in a pediatric population in which the parents oversee the care of the patient or have more time and receive subjective benefit from home monitoring services.
Less may be sufficient. The current study suggests that home monitoring in CF may increase medical expense and unnecessary antibiotic use with no improvement in lung function. It is difficult to assess from this study the impact that the burden of home monitoring would have on clinical outcomes; however, a previous meta-analysis of data from COPD populations using home monitoring systems, interestingly, also showed increased health service usage and even an increase in mortality in the intervention group compared with the usual care group [1,6].
Perhaps the negative result of the current study is due to the oftentimes variable definitions of and management algorithms for pulmonary exacerbations rather than the home monitoring system itself. Limited evidence exists for optimal threshold identification [7]. It has not been shown that the large amounts of aggregated data gathered by telemonitoring can be used effectively. Moreover, as mentioned, a clear definition and management guidelines for pulmonary exacerbation are lacking. As a next step, studies are ongoing to evaluate how to use the collected data without increasing harm or cost. This could involve machine learning or the development of a more specific model for defining and predicting pulmonary exacerbations, as well as standardized indications for antibiotic therapy and hospitalization.
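The visit counts quoted above can be checked with a short calculation. The sketch below uses only the figures reported in this commentary; the dictionary layout and the `summarize` helper are illustrative names, not part of the study. Note that in both arms the classified visits do not add up to the total number of acute visits, and the text does not explain the remainder, so it is simply reported as unaccounted for.

```python
# Visit counts as quoted in the commentary; key names are illustrative.
ARMS = {
    "early intervention": {"acute_visits": 153, "met": 108, "not_met": 29},
    "usual care": {"acute_visits": 64, "met": 44, "not_met": 12},
}


def summarize(counts):
    """Derive simple totals and proportions from one arm's visit counts."""
    classified = counts["met"] + counts["not_met"]
    return {
        "classified": classified,
        # Visits not described as meeting or failing the protocol definition.
        "unaccounted": counts["acute_visits"] - classified,
        # Share of classified visits that met the protocol definition.
        "share_met": counts["met"] / classified,
    }


if __name__ == "__main__":
    for arm, counts in ARMS.items():
        s = summarize(counts)
        print(f"{arm}: {s['classified']} classified, "
              f"{s['unaccounted']} unaccounted, "
              f"{s['share_met']:.1%} met the protocol definition")
```

On these figures, roughly 79% of classified acute visits met the protocol definition in each arm, while the intervention arm recorded about 2.5 times as many protocol-defined exacerbations (108 vs 44). That pattern is at least compatible with the increased-surveillance explanation raised above, although the calculation cannot by itself distinguish between the explanations.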
Applications for Clinical Practice
CF patients suffer from frequent pulmonary exacerbations, and close monitoring and appropriate treatment are necessary to prevent progressive decline of lung function. This study has shown no benefit of electronic home monitoring in CF patients based on symptoms and spirometry over usual care. However, this negative outcome may be due to the limitations of the current definition of pulmonary exacerbation and the lack of a consensus management algorithm. Optimizing the definition of pulmonary exacerbation and protocolizing management based on severity may improve future evaluations of electronic home monitoring. Electronic home monitoring may help identify patients requiring evaluation; however, clinicians should continue to manage CF patients with conventional tools including regular follow-up visits, thorough history taking, and appropriate use of antibiotics based on their clinical acumen.
—Minkyung Kwon, MD, Joel Roberson, MD, Drew Willey, MD, and Neal Patel, MD (Mayo Clinic Florida, Jacksonville, FL, except for Dr. Roberson, of Oakland University/ Beaumont Health, Royal Oak, MI)
1. Polisena J, Tran K, Cimon K, et al. Home telehealth for chronic obstructive pulmonary disease: a systematic review and meta-analysis. J Telemed Telecare 2010;16:120–7.
2. Sanders DB, Bittner RC, Rosenfeld M, et al. Failure to recover to baseline pulmonary function after cystic fibrosis pulmonary exacerbation. Am J Respir Crit Care Med 2010;182:627–32.
3. van Horck M, Winkens B, Wesseling G, et al. Early detection of pulmonary exacerbations in children with Cystic Fibrosis by electronic home monitoring of symptoms and lung function. Sci Rep 2017;7:12350.
4. Johnson C, Butler SM, Konstan MW, et al. Factors influencing outcomes in cystic fibrosis: a center-based analysis. Chest 2003;123:20–7.
5. Ding H, Karunanithi M, Kanagasingam Y, et al. A pilot study of a mobile-phone-based home monitoring system to assist in remote interventions in cases of acute exacerbation of COPD. J Telemed Telecare 2014;20:128–34.
6. Kargiannakis M, Fitzsimmons DA, Bentley CL, Mountain GA. Does telehealth monitoring identify exacerbations of chronic obstructive pulmonary disease and reduce hospitalisations? An analysis of system data. JMIR Med Inform 2017;5:e8.
7. Finkelstein J, Jeong IC. Machine learning approaches to personalize early prediction of asthma exacerbations. Ann N Y Acad Sci 2017;1387:153–65.
Study Overview
Objective. To determine if an intervention directed toward early detection of pulmonary exacerbations using electronic home monitoring of spirometry and symptoms would result in slower decline in lung function.
Design. Multicenter, randomized, nonblinded 2-arm clinical trial.
Setting and participants. The study was conducted at 14 cystic fibrosis centers in the United States between 2011 and 2015. Cystic fibrosis patients (stable at baseline, FEV1 > 25% predicted) at least 14 years old (adolescent and adults) were included and randomized 1:1 to either an early intervention arm or usual care arm.
Intervention. The intervention arm used home-based spirometers and patient-reported respiratory symptoms using the Cystic Fibrosis Respiratory Symptoms Diary (CFRSD), which was to be completed twice weekly and collected by the central AM2 system. This AM2 system alerted sites to contact patients for an acute pulmonary exacerbation evaluation when FEV1 values fell by greater than 10% from baseline or CFRSD worsened from baseline in two or more of eight respiratory symptoms. The usual care arm patients had quarterly CF visits and/or acute visits based on their need.
Main outcome measures. The primary outcome variable was the 52-week change in FEV1 volume in liters. Secondary outcome variables were changes in CFQ-R (Cystic Fibrosis Questionnaire, revised), CFRSD, FEV1 % predicted, FVC in liters, FEF25-75%, time to first acute pulmonary exacerbation, time from first pulmonary exacerbation to subsequent pulmonary exacerbation, number of hospitalization days, number of hospitalizations, percent change in prevalence of Pseudomonas or Staphylococcus aureus and global assessment of protocol burden score.
Main results. A total of 267 patients were randomized. The results were analyzed using intention-to-treat analysis. There was no significant difference between study arms in 52-week mean change in FEV1 slope (mean slope difference, 0.00 L, 95% confidence interval, –0.07 to 0.07; P = 0.99). The early intervention arm subjects detected exacerbations sooner and more frequently than usual care arm subjects (time to first exacerbation hazard ratio, 1.45; 94% confidence interval, 1.09 to 1.93; P = 0.01). Adverse events were not significantly different between treatment arms.
Conclusion. An intervention of electronic home monitoring of patients with CF was able to detect more exacerbations than usual care, but this did not result in slower decline in lung function.
Commentary
Establishing efficacy and safety of home monitoring is a popular research topic in the current era of information technology. Most data to date has come from chronic adult disease such as heart failure, diabetes, or COPD [1]. While relatively rare, CF is a chronic lung disease that could potentially benefit from home monitoring. This is supported by previous evidence suggesting that up to a quarter of pulmonary exacerbations in CF patients result in worsened baseline lung function [2]. Close monitoring of symptoms and FEV1 using home monitoring was hypothesized to improve management and long-term function in this population. Indeed, in children with CF, electronic home monitoring of symptoms and lung function was able to detect pulmonary exacerbations early [3]. Frequency of monitoring is widely variable between centers, and some suggest aggressive monitoring of CF provides better clinical outcomes [4]. Current CF guidelines do not make specific recommendations regarding frequency of monitoring.
In this study, Lechtzin et al attempted to determine if the early detection of acute pulmonary exacerbations in CF patients by home monitoring and treatment would prevent progressive decline in lung function. This multicenter randomized trial was conducted at large CF centers in the US with a total cohort of 267 patients. The study had a mean follow-up time of 46.8 weeks per participant in the intervention arm and a mean follow-up time of 50.9 weeks per participant in the usual care arm. Given the predefined follow-up length (52 weeks) the primary outcome of FEV1 in liters was deemed sensitive enough to detect a decline of lung function. However the discrepancy between follow-up times with the intervention group having a 4.1-week shorter mean follow-up than the usual care could have influenced the interpretation of the results. Additionally, a large percentage of these patients were clinically stable at initial enrollment, with an average FEV1 % predicted of 79.5%. The stability of initial participants raises questions as to the efficacy of home monitoring in CF patient with moderate to severe lung disease. Mostly importantly, due to the nature of intervention the study could not be blinded, which could have substantially increased anxiety and self-awareness of patients in reporting their symptoms in the intervention arm.
Currently, an established consensus definition of pulmonary exacerbations of CF is lacking. Previous studies have proposed several different criteria of acute pulmonary exacerbations. Most proposed definitions depend on symptom changes such as cough, sputum, chest pain, shortness of breath, fatigue and weight-loss, making the definition less specific or objective.
The number of acute visits in the intervention arm was significantly higher than that in the usual care arm (153 vs 64). Despite a higher number of visits with intervention group, a significant number of these visits did not lead to a diagnosis of acute pulmonary exacerbation. Reportedly, 108 acute visits met protocol-defined pulmonary exacerbation and 29 acute visits did not meet protocol-defined pulmonary exacerbation in the intervention arm compared to 44 and 12 respectively in the usual care arm of the study. Given that the groups had similar baseline demographics and were randomized appropriately, one would expect that the number of acute visits severe enough to meet protocol-defined criteria as a pulmonary exacerbation would be similar in both groups. However, the absolute number of protocol-defined pulmonary exacerbations was far greater in the intervention group. Therefore, one could question the clinical significance of what was defined as acute pulmonary exacerbation. Potentially, the elevation of the absolute number of protocol-defined pulmonary exacerbations in the intervention group was simply due to increased surveillance. If the former were correct, one would expect the lack of identification/treatment of a significant number of pulmonary exacerbations in the usual care group would have led to a larger decline in FEV1 after 52 weeks than was seen in the results when compared to the intervention group. Given that the results of the study indicate no significant difference in change in FEV1 between study arms, perhaps the studied parameters in the intervention group were overly sensitive.
Of note, the usual care arm did have a statistically significant higher rate of hospitalizations and IV antibiotic use, suggesting that early identification of acute visits can identify patients earlier in the course of an acute pulmonary exacerbation and prevent higher level of care, though at the expense of more acute event “false positives,” or over-diagnosis. This trade-off may not result in cost saving, though this was not a consideration of this study. Additionally, there was likely difference in treatment, as treatment was not standardized, with potential implications for the validity of results.
The early intervention protocol was not only shown to lead to increased visits with no benefit in lung function decline, but as one may expect, also proved to be remarkably burdensome to many patients compared to the usual care protocol. Entering data on a weekly basis (or perhaps even monthly) was found to be burdensome in many remote-monitoring trials [5]. This may be especially apparent in a younger age group: in this study the average age of the study population was between 18 and 30 years of age. It can be hypothesized that this age group may not have enough responsibility, time, or enthusiasm to participate in home monitoring. Home monitoring maybe more effective in a disease condition where the average age is older or in a pediatric population in whom the parents oversee the care of the patient or have more time and receive subjective benefit from home monitoring services.
Less may be sufficient. The current study suggests that the home monitoring in CF may increase medical expense and unnecessary antibiotic use with no improvement in lung function. It is difficult to assess from this study the impact that the burden of home monitoring would have on clinical outcomes, however, previous meta-analysis of data studying COPD populations using home monitoring system, interestingly, also had increased health service usage and even led to increase in mortality in the intervention group compared with usual care group [1,6].
Perhaps the negative result of current study is due to the oftentimes variable definitions of and management algorithms for pulmonary exacerbations rather than the home monitoring system itself. Limited evidence exists for optimal threshold identification [7]. Aggregated, large amounts of data gathered by telemonitoring have not been proven to be used effectively. Moreover, as mentioned, a clear definition and management guidelines for pulmonary exacerbation are lacking. As a next step, studies are ongoing to evaluate how to use the collected data without increasing harm or cost. This could utilize machine learning or developing a more specific model defining and predicting pulmonary exacerbations as well as standardized indications for antibiotic therapy and hospitalization.
Applications for Clinical Practice
CF patients suffer from frequent pulmonary exacerbations and close monitoring and appropriate treatment is necessary to prevent progressive decline of lung function. This study has shown no benefit of electronic home monitoring in CF patients based on symptoms and spirometry over usual care. However, this negative outcome may be due to the limitation of the current definition of pulmonary exacerbation and lack of a consensus management algorithm. Optimizing the definition of pulmonary exacerbation and protocoling management based on severity may improve future evaluations of electronic home monitoring. Electronic home monitoring may help identify patients requiring evaluation; however, clinicians should continue to manage CF patients with conventional tools including regular follow-up visits, thorough history taking, and appropriate use of antibiotics based on their clinical acumen.
—Minkyung Kwon, MD, Joel Roberson, MD, Drew Willey, MD, and Neal Patel, MD (Mayo Clinic Florida, Jacksonville, FL, except for Dr. Roberson, of Oakland University/ Beaumont Health, Royal Oak, MI)
Study Overview
Objective. To determine if an intervention directed toward early detection of pulmonary exacerbations using electronic home monitoring of spirometry and symptoms would result in slower decline in lung function.
Design. Multicenter, randomized, nonblinded 2-arm clinical trial.
Setting and participants. The study was conducted at 14 cystic fibrosis centers in the United States between 2011 and 2015. Cystic fibrosis patients (stable at baseline, FEV1 > 25% predicted) at least 14 years old (adolescent and adults) were included and randomized 1:1 to either an early intervention arm or usual care arm.
Intervention. The intervention arm used home-based spirometers and patient-reported respiratory symptoms using the Cystic Fibrosis Respiratory Symptoms Diary (CFRSD), which was to be completed twice weekly and collected by the central AM2 system. This AM2 system alerted sites to contact patients for an acute pulmonary exacerbation evaluation when FEV1 values fell by greater than 10% from baseline or CFRSD worsened from baseline in two or more of eight respiratory symptoms. The usual care arm patients had quarterly CF visits and/or acute visits based on their need.
Main outcome measures. The primary outcome variable was the 52-week change in FEV1 volume in liters. Secondary outcome variables were changes in CFQ-R (Cystic Fibrosis Questionnaire, revised), CFRSD, FEV1 % predicted, FVC in liters, FEF25-75%, time to first acute pulmonary exacerbation, time from first pulmonary exacerbation to subsequent pulmonary exacerbation, number of hospitalization days, number of hospitalizations, percent change in prevalence of Pseudomonas or Staphylococcus aureus and global assessment of protocol burden score.
Main results. A total of 267 patients were randomized. The results were analyzed using intention-to-treat analysis. There was no significant difference between study arms in 52-week mean change in FEV1 slope (mean slope difference, 0.00 L, 95% confidence interval, –0.07 to 0.07; P = 0.99). The early intervention arm subjects detected exacerbations sooner and more frequently than usual care arm subjects (time to first exacerbation hazard ratio, 1.45; 94% confidence interval, 1.09 to 1.93; P = 0.01). Adverse events were not significantly different between treatment arms.
Conclusion. An intervention of electronic home monitoring of patients with CF was able to detect more exacerbations than usual care, but this did not result in slower decline in lung function.
Commentary
Establishing efficacy and safety of home monitoring is a popular research topic in the current era of information technology. Most data to date has come from chronic adult disease such as heart failure, diabetes, or COPD [1]. While relatively rare, CF is a chronic lung disease that could potentially benefit from home monitoring. This is supported by previous evidence suggesting that up to a quarter of pulmonary exacerbations in CF patients result in worsened baseline lung function [2]. Close monitoring of symptoms and FEV1 using home monitoring was hypothesized to improve management and long-term function in this population. Indeed, in children with CF, electronic home monitoring of symptoms and lung function was able to detect pulmonary exacerbations early [3]. Frequency of monitoring is widely variable between centers, and some suggest aggressive monitoring of CF provides better clinical outcomes [4]. Current CF guidelines do not make specific recommendations regarding frequency of monitoring.
In this study, Lechtzin et al attempted to determine if the early detection of acute pulmonary exacerbations in CF patients by home monitoring and treatment would prevent progressive decline in lung function. This multicenter randomized trial was conducted at large CF centers in the US with a total cohort of 267 patients. The study had a mean follow-up time of 46.8 weeks per participant in the intervention arm and a mean follow-up time of 50.9 weeks per participant in the usual care arm. Given the predefined follow-up length (52 weeks) the primary outcome of FEV1 in liters was deemed sensitive enough to detect a decline of lung function. However the discrepancy between follow-up times with the intervention group having a 4.1-week shorter mean follow-up than the usual care could have influenced the interpretation of the results. Additionally, a large percentage of these patients were clinically stable at initial enrollment, with an average FEV1 % predicted of 79.5%. The stability of initial participants raises questions as to the efficacy of home monitoring in CF patient with moderate to severe lung disease. Mostly importantly, due to the nature of intervention the study could not be blinded, which could have substantially increased anxiety and self-awareness of patients in reporting their symptoms in the intervention arm.
Currently, an established consensus definition of pulmonary exacerbations of CF is lacking. Previous studies have proposed several different criteria of acute pulmonary exacerbations. Most proposed definitions depend on symptom changes such as cough, sputum, chest pain, shortness of breath, fatigue and weight-loss, making the definition less specific or objective.
The number of acute visits in the intervention arm was significantly higher than that in the usual care arm (153 vs 64). Despite the higher number of visits in the intervention group, a significant number of these visits did not lead to a diagnosis of acute pulmonary exacerbation. Reportedly, 108 acute visits met the protocol definition of pulmonary exacerbation and 29 did not in the intervention arm, compared with 44 and 12, respectively, in the usual care arm. Given that the groups had similar baseline demographics and were randomized appropriately, one would expect the number of acute visits severe enough to meet protocol-defined criteria for a pulmonary exacerbation to be similar in both groups. However, the absolute number of protocol-defined pulmonary exacerbations was far greater in the intervention group. Therefore, one could question the clinical significance of what was defined as an acute pulmonary exacerbation. Potentially, the higher absolute number of protocol-defined pulmonary exacerbations in the intervention group was simply due to increased surveillance. If these additional exacerbations were clinically significant, one would expect that the failure to identify and treat a significant number of pulmonary exacerbations in the usual care group would have led to a larger decline in FEV1 after 52 weeks than in the intervention group. Given that the results of the study indicate no significant difference in change in FEV1 between study arms, perhaps the monitored parameters in the intervention group were overly sensitive.
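As a rough check on the visit counts quoted above, the following Python sketch tabulates them by arm. It uses only the numbers given in this commentary; the variable and function names are illustrative, and labeling visits that fall into neither reported category as "unclassified" is an assumption, since the text does not say how they were handled.

```python
# Acute-visit counts as quoted in the commentary (not re-derived from the trial data).
visits = {
    "home monitoring": {"total": 153, "met": 108, "not_met": 29},
    "usual care": {"total": 64, "met": 44, "not_met": 12},
}


def summarize(arm):
    """Return classified/unclassified counts and the share of classified
    visits that met the protocol definition of an exacerbation."""
    classified = arm["met"] + arm["not_met"]
    return {
        "classified": classified,
        # Assumption: visits not in either category are simply unaccounted for here.
        "unclassified": arm["total"] - classified,
        "share_met": arm["met"] / classified,
    }


for name, arm in visits.items():
    s = summarize(arm)
    print(f"{name}: {s['classified']} classified, "
          f"{s['unclassified']} unclassified, "
          f"{s['share_met']:.1%} met the protocol definition")
```

Under these assumptions, the share of classified visits meeting the protocol definition is nearly the same in both arms, so the difference between arms lies in the number of acute visits rather than in how often a visit met the definition. This is consistent with the increased-surveillance explanation, though it does not establish it.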
Of note, the usual care arm did have a statistically significantly higher rate of hospitalizations and IV antibiotic use, suggesting that home monitoring can identify patients earlier in the course of an acute pulmonary exacerbation and prevent the need for a higher level of care, though at the expense of more acute event “false positives,” or over-diagnosis. This trade-off may not result in cost savings, though cost was not a consideration of this study. Additionally, there were likely differences in treatment, as treatment was not standardized, with potential implications for the validity of the results.
The early intervention protocol was not only shown to lead to increased visits with no benefit in lung function decline but, as one may expect, also proved to be remarkably burdensome to many patients compared with the usual care protocol. Entering data on a weekly basis (or perhaps even monthly) was found to be burdensome in many remote-monitoring trials [5]. This may be especially apparent in a younger age group: in this study the average age of the study population was between 18 and 30 years. It can be hypothesized that this age group may not have enough responsibility, time, or enthusiasm to participate in home monitoring. Home monitoring may be more effective in a disease condition where the average age is older, or in a pediatric population in whom the parents oversee the care of the patient or have more time and receive subjective benefit from home monitoring services.
Less may be sufficient. The current study suggests that home monitoring in CF may increase medical expense and unnecessary antibiotic use with no improvement in lung function. It is difficult to assess from this study the impact that the burden of home monitoring would have on clinical outcomes; however, a previous meta-analysis of studies in COPD populations using home monitoring systems, interestingly, also found increased health service usage and even an increase in mortality in the intervention group compared with the usual care group [1,6].
Perhaps the negative result of the current study is due to the often variable definitions of and management algorithms for pulmonary exacerbations rather than the home monitoring system itself. Limited evidence exists for optimal threshold identification [7]. The large amounts of aggregated data gathered by telemonitoring have not been shown to be used effectively. Moreover, as mentioned, a clear definition and management guidelines for pulmonary exacerbation are lacking. As a next step, studies are ongoing to evaluate how to use the collected data without increasing harm or cost. This could involve machine learning or the development of a more specific model for defining and predicting pulmonary exacerbations, as well as standardized indications for antibiotic therapy and hospitalization.
Applications for Clinical Practice
CF patients suffer from frequent pulmonary exacerbations, and close monitoring and appropriate treatment are necessary to prevent progressive decline of lung function. This study has shown no benefit of electronic home monitoring based on symptoms and spirometry over usual care in CF patients. However, this negative outcome may be due to the limitations of the current definition of pulmonary exacerbation and the lack of a consensus management algorithm. Optimizing the definition of pulmonary exacerbation and protocolizing management based on severity may improve future evaluations of electronic home monitoring. Electronic home monitoring may help identify patients requiring evaluation; however, clinicians should continue to manage CF patients with conventional tools including regular follow-up visits, thorough history taking, and appropriate use of antibiotics based on their clinical acumen.
—Minkyung Kwon, MD, Joel Roberson, MD, Drew Willey, MD, and Neal Patel, MD (Mayo Clinic Florida, Jacksonville, FL, except for Dr. Roberson, of Oakland University/ Beaumont Health, Royal Oak, MI)
1. Polisena J, Tran K, Cimon K, et al. Home telehealth for chronic obstructive pulmonary disease: a systematic review and meta-analysis. J Telemed Telecare 2010;16:120–7.
2. Sanders DB, Bittner RC, Rosenfeld M, et al. Failure to recover to baseline pulmonary function after cystic fibrosis pulmonary exacerbation. Am J Respir Crit Care Med 2010;182:627–32.
3. van Horck M, Winkens B, Wesseling G, et al. Early detection of pulmonary exacerbations in children with Cystic Fibrosis by electronic home monitoring of symptoms and lung function. Sci Rep 2017;7:12350.
4. Johnson C, Butler SM, Konstan MW, et al. Factors influencing outcomes in cystic fibrosis: a center-based analysis. Chest 2003;123:20–7.
5. Ding H, Karunanithi M, Kanagasingam Y, et al. A pilot study of a mobile-phone-based home monitoring system to assist in remote interventions in cases of acute exacerbation of COPD. J Telemed Telecare 2014;20:128–34.
6. Kargiannakis M, Fitzsimmons DA, Bentley CL, Mountain GA. Does telehealth monitoring identify exacerbations of chronic obstructive pulmonary disease and reduce hospitalisations? an analysis of system data. JMIR Med Inform 2017;5:e8.
7. Finkelstein J, Jeong IC. Machine learning approaches to personalize early prediction of asthma exacerbations. Ann N Y Acad Sci 2017;1387:153–65.
1. Polisena J, Tran K, Cimon K, et al. Home telehealth for chronic obstructive pulmonary disease: a systematic review and meta-analysis. J Telemed Telecare 2010;16 :120–7.
2. Sanders DB, Bittner RC, Rosenfeld M, et al. Failure to recover to baseline pulmonary function after cystic fibrosis pulmonary exacerbation. Am J Respir Crit Care Med 2010;182:627–32.
3. van Horck M, Winkens B, Wesseling G, et al. Early detection of pulmonary exacerbations in children with Cystic Fibrosis by electronic home monitoring of symptoms and lung function. Sci Rep 2017;7:12350.
4. Johnson C, Butler SM, Konstan MW, et al. Factors influencing outcomes in cystic fibrosis: a center-based analysis. Chest 2003;123:20–7.
5. Ding H, Karunanithi M, Kanagasingam Y, et al. A pilot study of a mobile-phone-based home monitoring system to assist in remote interventions in cases of acute exacerbation of COPD. J Telemed Telecare 2014;20:128–34.
6. Kargiannakis M, Fitzsimmons DA, Bentley CL, Mountain GA. Does telehealth monitoring identify exacerbations of chronic obstructive pulmonary disease and reduce hospitalisations? an analysis of system data. JMIR Med Inform 2017;5:e8.
7. Finkelstein J, Jeong IC. Machine learning approaches to personalize early prediction of asthma exacerbations. Ann N Y Acad Sci 2017;1387:153–65.
Addition of Durvalumab After Chemoradiotherapy Improves Progression-Free Survival in Unresectable Stage III Non-Small-Cell Lung Cancer
Study Overview
Objective. To evaluate the efficacy of the PD-L1 antibody durvalumab in the treatment of patients with unresectable stage III non-small-cell lung cancer (NSCLC) following completion of standard chemoradiotherapy.
Design. Interim analysis of the phase III PACIFIC study, a randomized, double-blind, international study.
Setting and participants. A total of 709 patients underwent randomization between May 2014 and April 2016. Eligible patients had histologically proven stage III, locally advanced and unresectable NSCLC with no evidence of disease progression following chemoradiotherapy. The enrolled patients had received at least 2 cycles of platinum-based chemotherapy concurrently with definitive radiation therapy (54 Gy to 66 Gy). Initially, patients were randomized within 2 weeks of completing radiation; however, the protocol was amended to allow randomization up to 42 days following completion of therapy. Patients were not eligible if they had previous exposure to anti-PD-1 or PD-L1 antibodies or active or prior autoimmune disease in the last 2 years. All patients were required to have a WHO performance status of 0 or 1. The patients were stratified at randomization by age (< 65 or ≥ 65 years), sex, and smoking status. Enrollment was not restricted by level of PD-L1 expression.
Intervention. Patients were randomized in a 2:1 ratio to receive consolidation durvalumab 10 mg/kg or placebo every 2 weeks for up to 12 months. The intervention was discontinued if there was evidence of confirmed disease progression, treatment with an alternative anticancer therapy, toxicity or patient preference. The response to treatment was assessed every 8 weeks for the first year and then every 12 weeks thereafter.
Main outcome measures. The primary endpoints of the study were progression-free survival (PFS) by blinded independent review and overall survival (OS). Secondary endpoints were the percentage of patients alive without disease progression at 12 and 18 months, objective response rate, duration of response, safety, and time to death or metastasis. Patients were given the option to provide archived tumor specimens for PD-L1 testing.
Results. The baseline characteristics were balanced. The median age at enrollment was 64 years and 91% of the patients were current or former smokers. The vast majority of patients (> 99% in both groups) received concurrent chemoradiotherapy. The response to initial concurrent therapy was similar in both groups, with complete response rates of 1.9% and 3% in the durvalumab and placebo groups, respectively, and partial response rates of 48.7% and 46.8%. Archived tumor samples showed ≥ 25% PD-L1 expression in 22.3% of patients (24% in the durvalumab group versus 18.6% in the placebo group) and < 25% in 41% of patients (39.3% in the durvalumab group versus 44.3% in the placebo group). PD-L1 status was unknown in 36.7% of the enrolled patients. Of note, 6% of patients enrolled had EGFR mutations.
After a median follow-up of 14.5 months, the median PFS was 16.8 months with durvalumab versus 5.6 months with placebo (P < 0.001; hazard ratio [HR] 0.52, 95% confidence interval [CI] 0.42–0.65). The 12-month PFS rate was 55.9% and 35.3% in the durvalumab and placebo groups, respectively. The 18-month PFS rate was 44.2% and 27% in the durvalumab and placebo groups, respectively. The PFS results were consistent across all subgroups. The PFS benefit was observed regardless of PD-L1 expression. The median time to death or metastasis was 23.2 months in the durvalumab group versus 14.6 months with placebo (HR 0.52; 95% CI 0.39–0.69). The objective response rate was significantly higher in the durvalumab group (28.4% vs. 16%, P < 0.001). The median duration of response was longer with durvalumab. Of the patients who responded to durvalumab, 73% had an ongoing response at 18 months compared with 47% in the placebo group. OS was not assessed at this interim analysis.
Adverse events (AE) of any grade occurred in approximately 95% of patients in both groups. Grade 3 or 4 AE occurred in 29.9% of the durvalumab group and 26.1% of the placebo group. The most common grade 3 or 4 AE was pneumonia, occurring in about 4% of patients in each group. More patients in the durvalumab group discontinued treatment (15.4% vs 9.8%). Death due to an AE occurred in 4.4% of the durvalumab group and 5.6% of the placebo group. The most frequent AE leading to discontinuation were pneumonitis or radiation pneumonitis and pneumonia. Pneumonitis or radiation pneumonitis occurred in 33.9% (3.4% grade 3 or 4) and 24.8% (2.6% grade 3 or 4) of the durvalumab and placebo groups, respectively. Immune-mediated AE of any grade were more common in the durvalumab group, occurring in 24% of patients (vs. 8% in the placebo group). Of these, 14% of patients in the durvalumab group required glucocorticoids compared with 4.3% in the placebo group. The most common AE of interest was diarrhea, which occurred in 18% of the patients in both groups.
Conclusion. The addition of consolidative durvalumab following completion of concurrent chemoradiotherapy in patients with stage III, locally advanced NSCLC significantly improved PFS without a significant increase in treatment-related adverse events.
Commentary
Pre-clinical evidence has suggested that chemotherapy and radiation therapy may lead to upregulation of PD-L1 expression by tumor cells, leading to increased PD-L1 mediated T cell apoptosis [1,2]. Given prior studies documenting PD-L1 expression as a predictive biomarker for response to durvalumab, the authors of the current trial hypothesized that the addition of durvalumab after chemoradiotherapy would provide clinical benefit, likely mediated by upregulation of PD-L1. The results from this pre-planned interim analysis show a significant improvement in progression-free survival with the addition of durvalumab, with a 48% decrease in the risk of progression. This benefit was noted across all patient subgroups. In addition, responses to durvalumab were durable, with 73% of the patients who responded having an ongoing response at 18 months. Interestingly, the response to durvalumab was independent of PD-L1 expression, which is in contrast to previous studies showing PD-L1 expression to be a good biomarker for durvalumab response [3].
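The 48% figure follows directly from the reported hazard ratio: an HR of 0.52 corresponds to a (1 − 0.52) × 100 = 48% relative reduction in the hazard of progression or death. The short Python sketch below applies the same conversion to the confidence limits reported in the Results; the function name is illustrative, and the output is a restatement of the published HR rather than a new analysis.

```python
def relative_hazard_reduction(hazard_ratio: float) -> float:
    """Convert a hazard ratio into a percent relative reduction in hazard."""
    return (1.0 - hazard_ratio) * 100.0


# PFS hazard ratio and 95% CI as reported in the Results section.
hr, ci_low, ci_high = 0.52, 0.42, 0.65

point = relative_hazard_reduction(hr)
# The lower HR bound gives the larger reduction, so the limits swap.
best = relative_hazard_reduction(ci_low)
worst = relative_hazard_reduction(ci_high)

print(f"Relative reduction: {point:.0f}% (95% CI {worst:.0f}% to {best:.0f}%)")
```

Note that this is a relative reduction in the instantaneous risk of an event over follow-up, not an absolute 48-percentage-point difference in the proportion of patients who progress.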
The results of the PACIFIC trial represent a clinically meaningful benefit and suggest an excellent option for patients with unresectable stage III NSCLC. One important point to highlight is that the addition of durvalumab was well tolerated and did not appear to significantly increase the rate of severe adverse events. Of particular interest are the similar rates of grade 3 or 4 pneumonitis, which were around 3% in each group. Overall survival data remain immature at the time of this analysis; however, given the acceptable toxicity profile and improved PFS, this combination should be considered for these patients in clinical practice. Trials are underway to evaluate the role of single-agent durvalumab in the front-line setting for NSCLC.
Applications for Clinical Practice
In patients with unresectable stage III NSCLC who have no evidence of disease progression following completion of chemoradiotherapy, the addition of durvalumab provided a significant and clinically meaningful improvement in progression-free survival without an increase in serious adverse events. While the overall survival data are immature, the 48% reduction in the risk of progression supports the incorporation of durvalumab into standard practice in this patient population.
—Daniel Isaac, DO, MS
1. Deng L, Liang H, Burnette B, et al. Irradiation and anti-PD-L1 treatment synergistically promote antitumor immunity in mice. J Clin Invest 2014;124:687–95.
2. Zhang P, Su DM, Liang M, Fu J. Chemopreventive agents induce programmed death-1-ligand 1 (PD-L1) surface expression in breast cancer cells and promote PD-L1 mediated T cell apoptosis. Mol Immun 2008;45:1470–6.
3. Antonia SJ, Brahmer JR, Khleif S, et al. Phase 1/2 study of the safety and clinical activity of durvalumab in patients with non-small cell lung cancer (NSCLC). Presented at the 41st European Society for Medical Oncology Annual Meeting, Copenhagen, October 7–11, 2016.
Study Overview
Objective. To evaluate the efficacy of the PD-L1 antibody durvalumab in the treatment of patients with unresectable stage III non-small-cell lung cancer (NSCLC) following completion of standard chemoradiotherapy.
Design. Interim analysis of the phase III PACIFIC study, a randomized, double-blind, international study.
Setting and participants. A total of 709 patients underwent randomization between May 2014 and April 2016. Eligible patients had histologically proven stage III, locally advanced and unresectable NSCLC with no evidence of disease progression following chemoradiotherapy. The enrolled patients had received at least 2 cycles of platinum-based chemotherapy concurrently with definitive radiation therapy (54 Gy to 66 Gy). Initially, patients were randomized within 2 weeks of completing radiation; however, the protocol was amended to allow randomization up to 42 days following completion of therapy. Patients were not eligible if they had previous exposure to anti-PD-1 or PD-L1 antibodies or active or prior autoimmune disease in the last 2 years. All patients were required to have an WHO performance status of 0 or 1. The patients were stratified at randomization by age (< 65 or > 65 years), sex and smoking status. Enrollment was not restricted to level of PD-L1 expression.
Intervention. Patients were randomized in a 2:1 ratio to receive consolidation durvalumab 10 mg/kg or placebo every 2 weeks for up to 12 months. The intervention was discontinued if there was evidence of confirmed disease progression, treatment with an alternative anticancer therapy, toxicity or patient preference. The response to treatment was assessed every 8 weeks for the first year and then every 12 weeks thereafter.
Main outcome measures. The primary endpoints of the study were progression-free survival (PFS) by blinded independent review and overall survival (OS). Secondary endpoints were the percentage of patients alive without disease progression at 12 and 18 months, objective response rate, duration of response, safety, and time to death or metastasis. Patients were given the option to provide archived tumor specimens for PD-L1 testing.
Results. The baseline characteristics were balanced. The median age at enrollment was 64 years and 91% of the patients were current or former smokers. The vast majority of patients (> 99% in both groups) received concurrent chemoradiotherapy. The response to initial concurrent therapy was similar in both groups with complete response rates of 1.9% and 3% in the durvalumab and placebo groups, respectively, and partial response rates of 48.7% and 46.8%. Archived tumor samples showed ≥ 25% PD-L1 expression in 22.3% of patients (24% in durvalumab group versus 18.6% in placebo group) and < 25% in 41% of patients (39.3%% in durvalumab group versus 44.3% in placebo group). PD-L1 status was unknown in 36.7% of the enrolled patients. Of note, 6% of patients enrolled had EGFR mutations.
After a median follow-up of 14.5 months, the median PFS was 16.8 months with durvalumab versus 5.6 months with placebo (P < 0.001; hazard ratio [HR] 0.52, 95% confidence interval [CI] 0.42–0.65). The 12-month PFS rate was 55.9% and 35.3% in the durvalumab and placebo group, respectively. The 18-month PFS rate was 44.2% and 27% in the durvalumab and placebo group, respectively. The PFS results were consistent across all subgroups. The PFS benefit was observed regardless of PD-L1 expression. The median time to death or metastasis was 23.2 months in the durvalumab group versus 14.6 months with placebo (HR 0.52; 95% CI 0.39–0.69). The objective response rate was significantly higher in the durvalumab group (28.4% vs. 16%, P < 0.001). The median duration of response was longer with durvalumab. Of the patients who responded to durvalumab, 73% had ongoing response at 18 months compared with 47% in the placebo group. OS was not assessed at this interm analysis.
Adverse events (AE) of any grade occurred in over approximately 95% in both groups. Grade 3 or 4 AE occurred in 29.9% in the durvalumab group and 26.1% in the placebo group. The most common grade 3 or 4 AE was pneumonia, occurring in about 4% of patients in each group. More patients in the durvalumab group discontinued treatment (15.4% vs 9.8%). Death due to an AE occurred in 4.4% of the durvalumab group and 5.6% of the placebo group. The most frequent AE leading to discontinuation was pneumonitis or radiation pneumonitis and pneumonia. Pneumonitis or radiation pneumonitis occurred in 33.9% (3.4% grade 3 or 4) and 24.8% (2.6% grade 3 or 4) of the durvalumab and placebo groups, respectively. Immune-mediated AE of any grade were more common in the duvalumab group occurring in 24% of patients (vs. 8% in placebo). Of these, 14% of patients in the durvalumab group required glucocorticoids compared with 4.3% in the placebo group. The most AE of interest was diarrhea, which occurred in 18% of the patients in both groups.
Conclusion. The addition of consolidative durvalumab following completion of concurrent chemoradiotherapy in patients with stage III, locally advanced NSCLC significantly improved PFS without a significant increase in treatment-related adverse events.
Commentary
Pre-clinical evidence has suggested that chemotherapy and radiation therapy may lead to upregulation of PD-L1 expression by tumor cells leading to increased PD-L1 mediated T cell apoptosis [1,2]. Given prior studies documenting PD-L1 expression as a predictive biomarker for response to durvalumab, the authors of the current trial hypothesized that the addition of durvalumab after chemoradiotherapy would provide clinical benefit likely mediated by upregulation of PD-L1. The results from this pre-planned interim analysis show a significant improvement in progression-free survival with the addition of durvalumab with a 48% decrease in the risk of progression. This benefit was noted across all patient subgroups. In addition, responses to durvalumab were durable, with 72% of the patients who responded having an ongoing response at 18 months. Interestingly, the response to durvalumab was independent of PD-L1 expression, which is in contrast to previous studies showing PD-L1 expression to be a good biomarker for durvalumab response [3].
The results of the PACIFIC trial represent a clinically meaningful benefit and suggests an excellent option for patients with unresectable stage III NSCLC. One important point to highlight is that the addition of durvalumab was well tolerated and did not appear to significantly increase the rate of severe adverse events. Of particular interest is the similar rates of grade 3 or 4 pneumonitis, which appeared to be around 3% for each group. Overall survival data remain immature at the time of this analysis; however, given the acceptable toxicity profile and improved PFS this combination should be considered for these patients in clinical practice. Ongoing trials are underway to evaluate the role of single-agent durvalumab in the front-line setting for NSCLC.
Applications for Clinical Practice
In patients with unresectable stage III NSCLC who have no evidence of disease progression following completion of chemoradiotherapy, the addition of durvalumab provided a significant and clinically meaningful improvement in progression-free survival without an increase in serious adverse events. While the overall survival data is immature, the 48% improvement in progression-free survival supports the incorporation of durvalumab into standard practice in this patient population.
—Daniel Isaac, DO, MS
Study Overview
Objective. To evaluate the efficacy of the PD-L1 antibody durvalumab in the treatment of patients with unresectable stage III non-small-cell lung cancer (NSCLC) following completion of standard chemoradiotherapy.
Design. Interim analysis of the phase III PACIFIC study, a randomized, double-blind, international study.
Setting and participants. A total of 709 patients underwent randomization between May 2014 and April 2016. Eligible patients had histologically proven stage III, locally advanced and unresectable NSCLC with no evidence of disease progression following chemoradiotherapy. The enrolled patients had received at least 2 cycles of platinum-based chemotherapy concurrently with definitive radiation therapy (54 Gy to 66 Gy). Initially, patients were randomized within 2 weeks of completing radiation; however, the protocol was amended to allow randomization up to 42 days following completion of therapy. Patients were not eligible if they had previous exposure to anti-PD-1 or PD-L1 antibodies or active or prior autoimmune disease in the last 2 years. All patients were required to have an WHO performance status of 0 or 1. The patients were stratified at randomization by age (< 65 or > 65 years), sex and smoking status. Enrollment was not restricted to level of PD-L1 expression.
Intervention. Patients were randomized in a 2:1 ratio to receive consolidation durvalumab 10 mg/kg or placebo every 2 weeks for up to 12 months. The intervention was discontinued if there was evidence of confirmed disease progression, treatment with an alternative anticancer therapy, toxicity or patient preference. The response to treatment was assessed every 8 weeks for the first year and then every 12 weeks thereafter.
Main outcome measures. The primary endpoints of the study were progression-free survival (PFS) by blinded independent review and overall survival (OS). Secondary endpoints were the percentage of patients alive without disease progression at 12 and 18 months, objective response rate, duration of response, safety, and time to death or metastasis. Patients were given the option to provide archived tumor specimens for PD-L1 testing.
Results. The baseline characteristics were balanced. The median age at enrollment was 64 years and 91% of the patients were current or former smokers. The vast majority of patients (> 99% in both groups) received concurrent chemoradiotherapy. The response to initial concurrent therapy was similar in both groups with complete response rates of 1.9% and 3% in the durvalumab and placebo groups, respectively, and partial response rates of 48.7% and 46.8%. Archived tumor samples showed ≥ 25% PD-L1 expression in 22.3% of patients (24% in durvalumab group versus 18.6% in placebo group) and < 25% in 41% of patients (39.3%% in durvalumab group versus 44.3% in placebo group). PD-L1 status was unknown in 36.7% of the enrolled patients. Of note, 6% of patients enrolled had EGFR mutations.
After a median follow-up of 14.5 months, the median PFS was 16.8 months with durvalumab versus 5.6 months with placebo (P < 0.001; hazard ratio [HR] 0.52, 95% confidence interval [CI] 0.42–0.65). The 12-month PFS rate was 55.9% and 35.3% in the durvalumab and placebo group, respectively. The 18-month PFS rate was 44.2% and 27% in the durvalumab and placebo group, respectively. The PFS results were consistent across all subgroups. The PFS benefit was observed regardless of PD-L1 expression. The median time to death or metastasis was 23.2 months in the durvalumab group versus 14.6 months with placebo (HR 0.52; 95% CI 0.39–0.69). The objective response rate was significantly higher in the durvalumab group (28.4% vs. 16%, P < 0.001). The median duration of response was longer with durvalumab. Of the patients who responded to durvalumab, 73% had ongoing response at 18 months compared with 47% in the placebo group. OS was not assessed at this interm analysis.
Adverse events (AE) of any grade occurred in over approximately 95% in both groups. Grade 3 or 4 AE occurred in 29.9% in the durvalumab group and 26.1% in the placebo group. The most common grade 3 or 4 AE was pneumonia, occurring in about 4% of patients in each group. More patients in the durvalumab group discontinued treatment (15.4% vs 9.8%). Death due to an AE occurred in 4.4% of the durvalumab group and 5.6% of the placebo group. The most frequent AE leading to discontinuation was pneumonitis or radiation pneumonitis and pneumonia. Pneumonitis or radiation pneumonitis occurred in 33.9% (3.4% grade 3 or 4) and 24.8% (2.6% grade 3 or 4) of the durvalumab and placebo groups, respectively. Immune-mediated AE of any grade were more common in the duvalumab group occurring in 24% of patients (vs. 8% in placebo). Of these, 14% of patients in the durvalumab group required glucocorticoids compared with 4.3% in the placebo group. The most AE of interest was diarrhea, which occurred in 18% of the patients in both groups.
Conclusion. The addition of consolidative durvalumab following completion of concurrent chemoradiotherapy in patients with stage III, locally advanced NSCLC significantly improved PFS without a significant increase in treatment-related adverse events.
Commentary
Pre-clinical evidence has suggested that chemotherapy and radiation therapy may lead to upregulation of PD-L1 expression by tumor cells leading to increased PD-L1 mediated T cell apoptosis [1,2]. Given prior studies documenting PD-L1 expression as a predictive biomarker for response to durvalumab, the authors of the current trial hypothesized that the addition of durvalumab after chemoradiotherapy would provide clinical benefit likely mediated by upregulation of PD-L1. The results from this pre-planned interim analysis show a significant improvement in progression-free survival with the addition of durvalumab with a 48% decrease in the risk of progression. This benefit was noted across all patient subgroups. In addition, responses to durvalumab were durable, with 72% of the patients who responded having an ongoing response at 18 months. Interestingly, the response to durvalumab was independent of PD-L1 expression, which is in contrast to previous studies showing PD-L1 expression to be a good biomarker for durvalumab response [3].
The results of the PACIFIC trial represent a clinically meaningful benefit and suggests an excellent option for patients with unresectable stage III NSCLC. One important point to highlight is that the addition of durvalumab was well tolerated and did not appear to significantly increase the rate of severe adverse events. Of particular interest is the similar rates of grade 3 or 4 pneumonitis, which appeared to be around 3% for each group. Overall survival data remain immature at the time of this analysis; however, given the acceptable toxicity profile and improved PFS this combination should be considered for these patients in clinical practice. Ongoing trials are underway to evaluate the role of single-agent durvalumab in the front-line setting for NSCLC.
Applications for Clinical Practice
In patients with unresectable stage III NSCLC who have no evidence of disease progression following completion of chemoradiotherapy, the addition of durvalumab provided a significant and clinically meaningful improvement in progression-free survival without an increase in serious adverse events. While the overall survival data is immature, the 48% improvement in progression-free survival supports the incorporation of durvalumab into standard practice in this patient population.
—Daniel Isaac, DO, MS
1. Deng L, Liang H, Burnette B, et al. Irradiation and anti-PD-L1 treatment synergistically promote antitumor immunity in mice. J Clin Invest 2014;124:687–95.
2. Zhang P, Su DM, Liang M, Fu J. Chemopreventive agents induce programmed death-1-ligand 1 (PD-L1) surface expression in breast cancer cells and promote PD-L1 mediated T cell apoptosis. Mol Immunol 2008;45:1470–6.
3. Antonia SJ, Brahmer JR, Khleif S, et al. Phase 1/2 study of the safety and clinical activity of durvalumab in patients with non-small cell lung cancer (NSCLC). Presented at the 41st European Society for Medical Oncology Annual Meeting, Copenhagen, October 7–11, 2016.
Mepolizumab for Eosinophilic Chronic Obstructive Pulmonary Disease
Study Overview
Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.
Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).
Setting and participants. Participants were recruited from over 15 countries at over 100 investigative sites. Inclusion criteria were adults (40 years or older) with a diagnosis of COPD for at least 1 year with: airflow limitation (FEV1/FVC < 0.7 and post-bronchodilator FEV1 > 20% and ≤ 80% of predicted values); current COPD therapy for at least 3 months prior to enrollment (a high-dose inhaled corticosteroid [ICS] with at least 2 other classes of medications, to obtain “triple therapy”); and a high risk of exacerbations (at least 1 severe [requiring hospitalization] or 2 moderate [treatment with systemic corticosteroids and/or antibiotics] exacerbations in the past year).
Notable exclusion criteria were patients with diagnoses of asthma in never-smokers, alpha-1 antitrypsin deficiency, recent exacerbations (in past month), lung volume reduction surgery (in past year), eosinophilic or parasitic diseases, or those with recent monoclonal antibody treatment. Patients with the asthma-COPD overlap syndrome were included only if they had a history of smoking and met the COPD inclusion criteria listed above.
Intervention. The treatment period lasted for a total of 52 weeks, with an additional 8 weeks of follow-up. Patients were randomized 1:1 to placebo or low-dose medication (100 mg) using permuted-block randomization in the METREX study regardless of eosinophil count (but they were stratified for a modified intention-to-treat analysis at screening into either low eosinophilic count [< 150 cells/uL] or high [≥ 150 cells/uL]). In the METREO study, patients were randomized 1:1:1 to placebo, low-dose (100 mg), or high-dose (300 mg) medication only if blood eosinophilia was present (≥ 150 cells/uL at screening or ≥ 300 cells/uL in past 12 months). Investigators and patients were blinded to presence of drug or placebo. Sample size calculations indicated that in order to provide a 90% power to detect a 30% decrease in the rate of exacerbations in METREX and 35% decrease in METREO, a total of 800 patients and 660 patients would need to be enrolled in METREX and METREO respectively. Both studies met their enrollment quota.
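The permuted-block allocation described above can be illustrated with a minimal sketch. The block sizes (4 and 6), the seed, and the function name `permuted_block_schedule` are assumptions chosen for illustration; the block sizes actually used in METREX and METREO are not stated here, and the sketch ignores stratification by site or eosinophil count.

```python
import random


def permuted_block_schedule(n_patients, arms, block_size, seed=None):
    """Build an allocation list from randomly permuted, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n_patients:
        # Each block contains every arm the same number of times...
        block = [arm for arm in arms for _ in range(per_arm)]
        # ...and only the order within the block is random.
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_patients]


# 1:1 allocation, as in METREX (block size 4 is hypothetical)
metrex = permuted_block_schedule(
    836, ["placebo", "mepolizumab 100 mg"], block_size=4, seed=1
)

# 1:1:1 allocation, as in METREO (block size 6 is hypothetical)
metreo = permuted_block_schedule(
    674,
    ["placebo", "mepolizumab 100 mg", "mepolizumab 300 mg"],
    block_size=6,
    seed=1,
)
```

Because every complete block is balanced, the arms can never drift apart by more than the per-arm count within one block, which is the main reason this scheme is preferred over simple coin-flip randomization in trials of this size.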
Main outcome measures. The primary outcome was the annual rate of exacerbations that were either moderate (requiring systemic corticosteroids and/or antibiotics) or severe (requiring hospitalization). Secondary outcomes included the time to first moderate/severe exacerbation, change from baseline in the COPD Assessment Test (CAT) and St. George’s Respiratory Questionnaire (SGRQ), and change from baseline in blood eosinophil count, FEV1, and FVC. Safety and adverse events endpoints were also assessed.
A modified intention-to-treat analysis was performed overall and, in the METREX study, stratified by eosinophil count at screening; all patients who underwent randomization and received at least 1 dose of medication or placebo were included in their respective group. Multiple comparisons were accounted for using the Benjamini-Hochberg procedure, exacerbations were assumed to follow a negative binomial distribution, and a Cox proportional-hazards model was used to model the relationship between covariates of interest and the primary outcome.
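As a rough illustration of how a Benjamini-Hochberg adjustment works, the sketch below computes adjusted P values for a small set of comparisons. The raw P values are hypothetical and the function name is an assumption; the trials' exact testing hierarchy is not described here, so this shows only the general procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical unadjusted P values
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw P = {p:.2f}, adjusted P = {q:.3f}")
```

Each raw P value is scaled by the number of comparisons divided by its rank, then capped so that a smaller raw P value never receives a larger adjusted value than a bigger one. This is why adjusted P values in a report can sit well above the unadjusted ones.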
Main results. In the METREX study, 1161 patients were enrolled and 836 underwent randomization and received at least 1 dose of medication or placebo. In METREO, 1071 patients were enrolled and 674 underwent randomization and received at least one dose of medication or placebo. In both studies the patients in the medication and placebo groups were well balanced at baseline across demographics (age, gender, smoking history, duration of COPD) and pulmonary function (FEV1, FVC, FEV1/FVC, CAT, SGRQ). In METREX, a total of 462 (55%) patients had an eosinophilic phenotype and 374 (45%) did not.
There was no difference between groups in the primary endpoint of annual exacerbation rate in METREO (1.49/yr in placebo vs. 1.19/yr in low-dose and 1.27/yr in high-dose mepolizumab, rate ratio of high-dose to placebo 0.86, 95% confidence interval [CI] 0.7–1.05, P = 0.14). There was no difference in the primary outcome in the overall intention-to-treat analysis in the METREX study (1.49/yr in mepolizumab vs. 1.52/yr in placebo, P > 0.99). Only when analyzing the high eosinophilic phenotype in the stratified intention-to-treat METREX group was there a significant difference in the primary outcome (1.41/yr in mepolizumab vs. 1.71/yr in placebo, P = 0.04, rate ratio 0.82, 95% CI 0.68–0.98).
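The reported annual rates can be turned into crude rate ratios with simple division, as sketched below. This is only arithmetic on the figures quoted above; the published rate ratios come from a negative binomial model and may differ slightly from these crude values (for example, 0.86 is reported for high-dose METREO). The labels and function names are illustrative.

```python
def rate_ratio(treated_rate, placebo_rate):
    """Crude ratio of annual exacerbation rates (treated / placebo)."""
    return treated_rate / placebo_rate


def relative_reduction(treated_rate, placebo_rate):
    """Crude relative reduction in the annual exacerbation rate."""
    return 1.0 - rate_ratio(treated_rate, placebo_rate)


# (mepolizumab rate, placebo rate) per year, as quoted in the text
reported = {
    "METREX, eosinophilic stratum (100 mg)": (1.41, 1.71),
    "METREX, overall (100 mg)": (1.49, 1.52),
    "METREO (100 mg)": (1.19, 1.49),
    "METREO (300 mg)": (1.27, 1.49),
}

for label, (treated, placebo) in reported.items():
    rr = rate_ratio(treated, placebo)
    red = relative_reduction(treated, placebo)
    print(f"{label}: crude rate ratio {rr:.2f}, relative reduction {red:.0%}")
```

Read this way, the one significant comparison corresponds to a crude reduction of roughly 18% in the eosinophilic stratum of METREX, which is smaller than the 30% decrease the sample size calculation for that trial was designed to detect.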
There were no significant differences in any secondary endpoint in the METREO study. In the METREX study, mepolizumab treatment resulted in a significantly longer time to first exacerbation (192 days vs. 141 days, hazard ratio 0.75, 95% CI 0.60–0.94, P = 0.04) but no difference in the change in SGRQ (–2.8 vs. –3.0, P > 0.99) or CAT score (–0.8 vs. 0, P > 0.99). There was no significant difference in any measures of pulmonary function between the treatment and placebo groups (FEV1, FVC, FEV1/FVC). As expected, there was a significant decrease in peripheral blood eosinophil count in both studies in the medication arm. The incidence of adverse events and safety endpoints were similar between the trial groups in METREX and METREO.
Conclusions. In this pair of placebo-controlled double-blind randomized parallel studies, there was a significant decline in annual exacerbation rate in patients with an eosinophilic phenotype treated with mepolizumab in a stratified intention-to-treat analysis of one of the two studies (METREX). However, there was no significant difference in the primary outcome of the other study (METREO), which included only patients with an eosinophilic phenotype. Additionally, there was no significant difference in any secondary endpoint in either study aside from a longer time to first exacerbation in METREX. The medication was generally safe and well tolerated.
Commentary
Mepolizumab is a humanized monoclonal antibody that targets and blocks interleukin-5, a key mediator of eosinophilic activity. Due to its ability to decrease eosinophil number and function, it is currently approved as a therapy for severe asthma with an eosinophilic phenotype [1]. While asthma and COPD have historically been thought of as separate entities with distinct pathophysiologic mechanisms, recent evidence has suggested that a subset of COPD patients experience significant eosinophilic inflammation. This group may behave more like asthmatic patients, and may have a different response to medications such as inhaled corticosteroids, but the role of eosinophils to guide prognostication and treatment in this group is still unclear [2,3].
In this study, Pavord and colleagues investigated the use of the anti-IL5 drug mepolizumab in COPD patients at risk of exacerbations who demonstrated an eosinophilic phenotype. The physiologic rationale for the study was that eosinophilic inflammation is thought to be a driver of exacerbations in COPD patients with an eosinophilic phenotype, and therefore a decrease in eosinophil number and function should result in a decrease in exacerbations. The authors conducted a well-designed placebo-controlled double-blind study with a clearly defined endpoint, met their enrollment goals as determined by their power calculations, and used COPD patients at high risk of exacerbations to enrich their study.
There was no difference in the primary outcome in the METREO trial, which included only patients with baseline eosinophilia (≥ 150 cells/uL at screening or ≥ 300 cells/uL in the past 12 months), or in the overall intention-to-treat analysis in METREX (which did not screen patients on baseline eosinophil count). Only when stratified on baseline eosinophil count in the METREX study was a significant treatment effect found, where patients with a high eosinophil count at baseline (≥ 150 cells/uL) had a decreased risk of exacerbations when treated with mepolizumab. Notably, there was no difference in any secondary outcome in METREO or in METREX aside from a longer time to first exacerbation in METREX in the mepolizumab group. The authors use these data to conclude that mepolizumab treatment results in a lower rate of exacerbations and a longer time to the first exacerbation in COPD patients with an eosinophilic phenotype, and that the extent of the treatment effect is related to blood eosinophil counts.
The authors conducted a well-designed and rigorous study, and used robust and appropriate statistical analysis; however, significant questions remain regarding their conclusions. The primary concern is that the role of mepolizumab in decreasing exacerbations in COPD patients may be overstated. When including only those with baseline eosinophilia in the METREO trial, there was no significant difference between placebo and the low or high dose of mepolizumab; however, there was an appropriate and expected decrease in blood eosinophils, indicating the medication worked as intended. In the overall intention-to-treat analysis in the METREX trial, there was no difference between mepolizumab and placebo, and only in the analysis of METREX stratified by eosinophil count was there a significant difference (with the upper bound of the 95% CI for the rate ratio [0.98] approaching unity).
Additionally, there was no significant difference between the 2 groups across a number of clinically important secondary endpoints, including pulmonary function measurements and symptom scores. Only the time to first exacerbation was significantly longer in the mepolizumab group in METREX.
Taken together, this calls into question the conclusion that a decrease in eosinophil counts due to mepolizumab has resulted in a lower rate of exacerbations, particularly as a higher dose of mepolizumab did not result in a stronger effect. The lack of difference between groups in secondary endpoints is also concerning, as those would be expected to improve with a decrease in exacerbations [4,5]. As the authors point out, their evidence suggests that eosinophils may be an important biomarker in COPD and may aid in the therapeutic decision-making process. However, given the inconsistencies in the data as noted above, it would be difficult to rely on the evidence from this study alone to support their conclusion regarding the clinical utility of mepolizumab in COPD.
The authors discuss a number of limitations that may account for the lack of consistent effect seen in this study. Aside from the standard limitations applicable to any clinical trial, they note the potential confounding effect of previous oral glucocorticoid therapy in reducing eosinophil counts. This may have masked the eosinophilic phenotype in some study patients, leading to the attenuated effect of mepolizumab seen in this study.
The authors also note that information that might be valuable for identifying treatment responders, such as a history of allergies and atopy, was not available. Including patients with such a history may help enrich a trial with potential treatment responders, and future studies may benefit from focusing on COPD patients with a more atopic phenotype who more closely resemble those with the asthma-COPD overlap syndrome.
A final limitation to discuss is the focus on blood eosinophil counts. Due to the difficulty of measuring sputum eosinophils, and the reasonable degree of correlation between blood and sputum in asthmatic patients, blood eosinophils have largely supplanted sputum eosinophils as markers of TH2 CD4 T-cell activity in the pulmonary system [6]. This substitution is also used in the COPD population; however, due to the differences in pathophysiology, it is unclear if eosinophils in asthmatic patients behave similarly to those in COPD patients [7]. Additionally, the cutoff of 150 cells/uL has been obtained primarily from subgroup analyses of previous studies on COPD patients, but it is unclear if this cutoff truly reflects elevated sputum eosinophilia. While there is likely some degree of correlation between blood and sputum eosinophilia in COPD patients, the lack of a consistent significant effect in this study may be due to an incorrect cutoff for elevated eosinophilia and a reliance on blood eosinophils over sputum counts. Further studies utilizing sputum eosinophils may be of value in addressing this limitation.
Applications for Clinical Practice
In this study, Pavord and colleagues found a potential benefit of mepolizumab treatment for reducing exacerbations in COPD patients with an eosinophilic phenotype. The conflicting results regarding the underlying physiology and the weak treatment effect suggest this medication may not be ready for use in clinical practice without additional supporting evidence. From a practical standpoint, the high cost of medication (~$2500 per month) and marginal benefit of treatment imply that treatment with mepolizumab in COPD patients may not be cost-effective, and even treatment in individual patients on a trial basis should be discouraged until additional supporting data become available. Of primary concern are the optimal selection of COPD patients who will benefit from mepolizumab treatment and the optimal dose of medication to achieve that benefit. The results presented here do not satisfactorily answer these questions, and additional studies are required.
—Arun Jose, MD, The George Washington University, Washington, DC
1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.
2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.
3. Roche N, Chapman KR, Vogelmeier CF, et al. Blood eosinophils and response to maintenance chronic obstructive pulmonary disease treatment. Data from the FLAME trial. Am J Respir Crit Care Med 2017;195:1189–97.
4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.
5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT) in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.
6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.
7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.
Study Overview
Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.
Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).
Setting and participants. Participants were recruited from over 15 countries in over 100 investigative sites. Inclusion criteria were adults (40 years or older) with a diagnosis of COPD for at least 1 year with: airflow limitation (FEV1/FVC < 0.7); some bronchodilator reversibility (post-bronchodilator FEV1 > 20% and ≤ 80% of predicted values); current COPD therapy for at least 3 months prior to enrollment (a high-dose inhaled corticosteroid, ICS, with at least 2 other classes of medications, to obtain “triple therapy”); and a high risk of exacerbations (at least 1 severe [requiring hospitalization] or 2 moderate [treatment with systemic corticosteroids and/or antibiotics] exacerbations in past year).
Notable exclusion criteria were patients with diagnoses of asthma in never-smokers, alpha-1 antitrypsin deficiency, recent exacerbations (in past month), lung volume reduction surgery (in past year), eosinophilic or parasitic diseases, or those with recent monoclonal antibody treatment. Patients with the asthma-COPD overlap syndrome were included only if they had a history of smoking and met the COPD inclusion criteria listed above.
Intervention. The treatment period lasted for a total of 52 weeks, with an additional 8 weeks of follow-up. Patients were randomized 1:1 to placebo or low-dose medication (100 mg) using permuted-block randomization in the METREX study regardless of eosinophil count (but they were stratified for a modified intention-to-treat analysis at screening into either low eosinophilic count [< 150 cells/uL] or high [≥ 150 cells/uL]). In the METREO study, patients were randomized 1:1:1 to placebo, low-dose (100 mg), or high-dose (300 mg) medication only if blood eosinophilia was present (≥ 150 cells/uL at screening or ≥ 300 cells/uL in past 12 months). Investigators and patients were blinded to presence of drug or placebo. Sample size calculations indicated that in order to provide a 90% power to detect a 30% decrease in the rate of exacerbations in METREX and 35% decrease in METREO, a total of 800 patients and 660 patients would need to be enrolled in METREX and METREO respectively. Both studies met their enrollment quota.
Main outcome measures. The primary outcome was the annual rate of exacerbations that were either moderate (requiring systemic corticosteroids and/or antibiotics) or severe (requiring hospitalization). Secondary outcomes included the time to first moderate/severe exacerbation, change from baseline in the COPD Assessment Test (CAT) and St. George’s Respiratory Questionnaire (SGRQ), and change from baseline in blood eosinophil count, FEV1, and FVC. Safety and adverse events endpoints were also assessed.
A modified intention-to-treat analysis was performed overall and in the METREX study stratified on eosinophilic count at screening; all patients who underwent randomization and received at least one dose of medication or placebo were included in that respective group. Multiple comparisons were accounted for using the Benjamini-Hochberg Test, exacerbations were assumed to follow a negative binomial distribution, and Cox proportional-hazards was used to model the relationship between covariates of interest and the primary outcome.
Main results. In the METREX study, 1161 patients were enrolled and 836 underwent randomization and received at least 1 dose of medication or placebo. In METREO, 1071 patients were enrolled and 674 underwent randomization and received at least one dose of medication or placebo. In both studies the patients in the medication and placebo groups were well balanced at baseline across demographics (age, gender, smoking history, duration of COPD) and pulmonary function (FEV1, FVC, FEV1/FVC, CAT, SGRQ). In METREX, a total of 462 (55%) patients had an eosinophilic phenotype and 374 (45%) did not.
There was no difference between groups in the primary endpoint of annual exacerbation rate in METREO (1.49/yr in placebo vs. 1.19/yr in low-dose and 1.27/yr in high-dose mepolizumab, rate ratio of high-dose to placebo 0.86, 95% confidence interval [CI] 0.7–1.05, P = 0.14). There was no difference in the primary outcome in the overall intention-to-treat analysis in the METREX study (1.49/yr in mepolizumab vs. 1.52/yr in placebo, P > 0.99). Only when analyzing the high eosinophilic phenotype in the stratified intention-to-treat METREX group was there a significant difference in the primary outcome (1.41/yr in mepolizumab vs. 1.71/yr in placebo, P = 0.04, rate ratio 0.82, 95% CI 0.68–0.98).
There were no significant differences in any secondary endpoint in the METREO study. In the METREX study, mepolizumab treatment resulted in a significantly longer time to first exacerbation (192 days vs. 141 days, hazard ratio 0.75, 95% CI 0.60–0.94, P = 0.04) but no difference in the change in SGRQ (–2.8 vs. –3.0, P > 0.99) or CAT score (–0.8 vs. 0, P > 0.99). There was no significant difference in any measures of pulmonary function between the treatment and placebo groups (FEV1, FVC, FEV1/FVC). As expected, there was a significant decrease in peripheral blood eosinophil count in both studies in the medication arm. The incidence of adverse events and safety endpoints were similar between the trial groups in METREX and METREO.
Conclusions. In this pair of placebo-controlled double-blind randomized parallel studies, there was a significant decline in annual exacerbation rate in patients with an eosinophilic phenotype treated with mepolizumab in a stratified intention-to-treat analysis of one of two parallel studies (METREX). However, there was no significant difference in the primary outcome of the other parallel study (METREO), which included only those patients with an eosinophilic phenotype. Additionally, there was no significant difference in any secondary endpoints in either study. The medication was generally safe and well tolerated.
Commentary
Mepolizumab is a humanized monoclonal antibody that targets and blocks interleukin-5, a key mediator of eosinophilic activity. Due to its ability to decrease eosinophil number and function, it is currently approved as a therapy for severe asthma with an eosinophilic phenotype [1]. While asthma and COPD have historically been thought of as separate entities with distinct pathophysiologic mechanisms, recent evidence has suggested that a subset of COPD patients experience significant eosinophilic inflammation. This group may behave more like asthmatic patients, and may have a different response to medications such as inhaled corticosteroids, but the role of eosinophils to guide prognostication and treatment in this group is still unclear [2,3].
In this study, Pavord and colleagues investigated the use of the anti-IL5 drug mepolizumab in COPD patients at risk of exacerbations who demonstrated an eosinophilic phenotype. The physiologic rationale for the study was that eosinophilic inflammation is thought to be a driver of exacerbations in COPD patients with an eosinophilic phenotype, and therefore a decrease in eosinophilic number and function should result in a decrease in exacerbations. The authors conducted a well-designed placebo-controlled double-blind study with a clearly defined endpoint, met their enrollment goals as determined by their power calculations, and used COPD patients at high risk of exacerbations to enrich their study.
There was no difference in the primary outcome in the METREO arm of the study, which included patients with baseline eosinophilia (> 150 cells/uL) or in the overall intention-to-treat analysis in METREX (which did not screen patients on baseline eosinophil count). Only when stratified on baseline eosinophil count in the METREX study was a significant treatment effect found, where patients with high eosinophil count at baseline (> 150 cells/uL) had a decreased risk of exacerbations when treated with mepolizumab. Notably there was no difference in any secondary outcome in METREO or in METREX aside from a longer time to first exacerbation in METREX in the mepolizumab group. The authors use this data to conclude that mepolizumab treatment results in a lower rate of exacerbations and a longer time to the first exacerbation in COPD patients with an eosinophilic phenotype, and the extent of the treatment effect is related to blood eosinophil counts.
The authors conducted a well-designed and rigorous study, and used robust and appropriate statistical analysis; however, significant questions remain regarding their conclusions. The primary concern is the role of mepolizumab in the treatment of COPD patients to decrease exacerbations may be overstated. When including only those with baseline eosinophilia in the METREO arm, there was no significant difference between placebo and low or high dose of mepolizumab; however, there was an appropriate and expected decrease in blood eosinophils, indicating the medication worked as intended. In the overall intention-to-treat analysis in the METREX arm, there was no difference between mepolizumab and placebo, and only in the analysis of METREX stratified to eosinophil count was there a significant difference (with an upper confidence interval rate ratio [0.98] approaching unity).
Additionally there was no significant difference between the 2 groups across a number of clinically important secondary endpoints, including pulmonary function measurements and symptomatic scores. Only the time to exacerbation was significantly longer in the mepolizumab group in METREX.
Taken together, this calls into question the conclusion that a decrease in eosinophil counts due to mepolizumab has resulted in a lower rate of exacerbations, particularly as a higher dose of mepolizumab did not result in a stronger effect. The lack of difference between groups in secondary endpoints is also concerning, as those would be expected to improve with a decrease in exacerbations [4,5]. As the authors point out, their evidence suggests that eosinophils may be an important biomarker in COPD and may aid in the therapeutic decision-making process. However, given the inconsistencies in the data as noted above, it would be difficult to rely on the evidence from this study alone to support their conclusion regarding the clinical utility of mepolizumab in COPD.
The authors discuss a number of limitations that may account for the lack of consistent effect seen in this study. Aside from the standard limitations applicable to any clinical trial, they note the potential confounding effect of previous oral glucocorticoid therapy in reducing eosinophil counts. This may have masked the eosinophilic phenotype in some study patients, leading to the attenuated effect of mepolizumab seen in this study.
The authors also note that information that might be potentially valuable for identifying treatment responders, such as a history of allergies and atopy, were not available. Inclusion of those patients may be helpful in enriching the trial with potential treatment-responders, and future studies may benefit from focusing on COPD patients with a more atopic phenotype who more closely resemble those with the asthma-COPD overlap syndrome.
A final limitation to discuss is the focus on blood eosinophilic counts. Due to the difficulty of measuring sputum eosinophils, and the reasonable degree of correlation between blood and sputum in asthmatic patients, blood eosinophils have largely supplanted sputum eosinophils as markers of TH2 CD4 T-cell activity in the pulmonary system [6]. This substitution is also used in the COPD population, however, due to the differences in pathophysiology it is unclear if eosinophils in asthmatic patients behave similarly to those in COPD patients [7]. Additionally, the cutoff of 150 cells/uL has been obtained primarily from sub-group analysis of previous studies on COPD patients, but it is unclear if this cutoff truly reflects elevated sputum eosinophilia. While there is likely some degree of correlation between blood and sputum eosinophilia in COPD patients, a lack of significant effect seen in this study may be due to an incorrect cutoff for elevated eosinophilia and a reliance on blood eosinophils over sputum counts. Further studies utilizing sputum eosinophils may be of value in addressing this limitation.
Applications for Clinical Practice
In this study, Pavord and colleagues found a potential benefit of mepolizumab treatment for reducing exacerbations in COPD patients with an eosinophilic phenotype. The conflicting results regarding the underlying physiology and the weak treatment effect suggest this medication may not be ready for use in clinical practice without additional supporting evidence. From a practical standpoint, the high cost of medication (~$2500 per month) and marginal benefit of treatment imply that treatment with mepolizumab in COPD patients may not be cost-effective, and even treatment in individual patients on a trial basis should be discouraged until additional supporting data becomes available. Of primary concern are the optimal selection of COPD patients that will achieve benefit with mepolizumab treatment, and the optimal dose of medication to achieve that benefit. The results presented here do not satisfactorily answer these questions, and additional studies are required.
—Arun Jose, MD, The George Washington University, Washington, DC
Study Overview
Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.
Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).
Setting and participants. Participants were recruited from over 15 countries in over 100 investigative sites. Inclusion criteria were adults (40 years or older) with a diagnosis of COPD for at least 1 year with: airflow limitation (FEV1/FVC < 0.7); some bronchodilator reversibility (post-bronchodilator FEV1 > 20% and ≤ 80% of predicted values); current COPD therapy for at least 3 months prior to enrollment (a high-dose inhaled corticosteroid, ICS, with at least 2 other classes of medications, to obtain “triple therapy”); and a high risk of exacerbations (at least 1 severe [requiring hospitalization] or 2 moderate [treatment with systemic corticosteroids and/or antibiotics] exacerbations in past year).
Notable exclusion criteria were patients with diagnoses of asthma in never-smokers, alpha-1 antitrypsin deficiency, recent exacerbations (in past month), lung volume reduction surgery (in past year), eosinophilic or parasitic diseases, or those with recent monoclonal antibody treatment. Patients with the asthma-COPD overlap syndrome were included only if they had a history of smoking and met the COPD inclusion criteria listed above.
Intervention. The treatment period lasted for a total of 52 weeks, with an additional 8 weeks of follow-up. Patients were randomized 1:1 to placebo or low-dose medication (100 mg) using permuted-block randomization in the METREX study regardless of eosinophil count (but they were stratified for a modified intention-to-treat analysis at screening into either low eosinophilic count [< 150 cells/uL] or high [≥ 150 cells/uL]). In the METREO study, patients were randomized 1:1:1 to placebo, low-dose (100 mg), or high-dose (300 mg) medication only if blood eosinophilia was present (≥ 150 cells/uL at screening or ≥ 300 cells/uL in past 12 months). Investigators and patients were blinded to presence of drug or placebo. Sample size calculations indicated that in order to provide a 90% power to detect a 30% decrease in the rate of exacerbations in METREX and 35% decrease in METREO, a total of 800 patients and 660 patients would need to be enrolled in METREX and METREO respectively. Both studies met their enrollment quota.
Main outcome measures. The primary outcome was the annual rate of exacerbations that were either moderate (requiring systemic corticosteroids and/or antibiotics) or severe (requiring hospitalization). Secondary outcomes included the time to first moderate/severe exacerbation, change from baseline in the COPD Assessment Test (CAT) and St. George’s Respiratory Questionnaire (SGRQ), and change from baseline in blood eosinophil count, FEV1, and FVC. Safety and adverse events endpoints were also assessed.
A modified intention-to-treat analysis was performed overall and, in the METREX study, stratified by eosinophil count at screening; all patients who underwent randomization and received at least 1 dose of medication or placebo were included in their respective group. Multiple comparisons were accounted for using the Benjamini-Hochberg procedure, exacerbations were assumed to follow a negative binomial distribution, and a Cox proportional-hazards model was used to model the relationship between covariates of interest and the primary outcome.
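To illustrate the multiple-comparison adjustment named above, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The function name, the 0.05 false discovery rate, and the example P values are assumptions made for illustration; they are not taken from the trial, and the trial's actual analysis code is not described in this summary.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per P value: True where the hypothesis is rejected.

    Step-up rule: sort the m P values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest P values.
    """
    m = len(p_values)
    # Indices of the P values in ascending order of P value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose P value falls under its rank-specific threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical P values for five endpoints (not trial data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
print(benjamini_hochberg(example_p))
```

Because the rule is step-up, a P value that misses its own threshold can still be rejected when a larger P value meets a later threshold; this is what distinguishes the procedure from a simple per-test cutoff.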
Main results. In the METREX study, 1161 patients were enrolled and 836 underwent randomization and received at least 1 dose of medication or placebo. In METREO, 1071 patients were enrolled and 674 underwent randomization and received at least 1 dose of medication or placebo. In both studies the patients in the medication and placebo groups were well balanced at baseline across demographics (age, gender, smoking history, duration of COPD), pulmonary function (FEV1, FVC, FEV1/FVC), and symptom scores (CAT, SGRQ). In METREX, a total of 462 (55%) patients had an eosinophilic phenotype and 374 (45%) did not.
There was no difference between groups in the primary endpoint of annual exacerbation rate in METREO (1.49/yr in placebo vs. 1.19/yr in low-dose and 1.27/yr in high-dose mepolizumab, rate ratio of high-dose to placebo 0.86, 95% confidence interval [CI] 0.7–1.05, P = 0.14). There was no difference in the primary outcome in the overall intention-to-treat analysis in the METREX study (1.49/yr in mepolizumab vs. 1.52/yr in placebo, P > 0.99). Only when analyzing the high eosinophilic phenotype in the stratified intention-to-treat METREX group was there a significant difference in the primary outcome (1.41/yr in mepolizumab vs. 1.71/yr in placebo, P = 0.04, rate ratio 0.82, 95% CI 0.68–0.98).
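As a quick arithmetic check on the figures above, the crude ratio of the reported annual rates can be compared with the published rate ratios. This is only a sketch: the dictionary and function names are illustrative, and a crude ratio of rounded rates need not equal a rate ratio estimated from the negative binomial model, so small discrepancies are expected.

```python
# Annual exacerbation rates as reported in the summary above (events per year).
reported_rates = {
    "METREX, high eosinophil stratum": {"mepolizumab": 1.41, "placebo": 1.71},
    "METREX, overall": {"mepolizumab": 1.49, "placebo": 1.52},
    "METREO, high dose": {"mepolizumab": 1.27, "placebo": 1.49},
}


def crude_rate_ratio(treated_rate: float, control_rate: float) -> float:
    """Unadjusted ratio of two event rates; values below 1 favor treatment."""
    return treated_rate / control_rate


crude_ratios = {
    name: round(crude_rate_ratio(r["mepolizumab"], r["placebo"]), 2)
    for name, r in reported_rates.items()
}

for name, ratio in crude_ratios.items():
    print(f"{name}: crude rate ratio {ratio:.2f}")
```

The crude ratio for the METREX high eosinophil stratum matches the reported 0.82, while the crude METREO high-dose ratio differs slightly from the reported 0.86, as would be expected if the published estimate came from an adjusted model.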
There were no significant differences in any secondary endpoint in the METREO study. In the METREX study, mepolizumab treatment resulted in a significantly longer time to first exacerbation (192 days vs. 141 days, hazard ratio 0.75, 95% CI 0.60–0.94, P = 0.04) but no difference in the change in SGRQ (–2.8 vs. –3.0, P > 0.99) or CAT score (–0.8 vs. 0, P > 0.99). There was no significant difference in any measures of pulmonary function between the treatment and placebo groups (FEV1, FVC, FEV1/FVC). As expected, there was a significant decrease in peripheral blood eosinophil count in both studies in the medication arm. The incidence of adverse events and safety endpoints were similar between the trial groups in METREX and METREO.
Conclusions. In this pair of placebo-controlled, double-blind, randomized parallel studies, there was a significant decline in annual exacerbation rate in patients with an eosinophilic phenotype treated with mepolizumab in a stratified intention-to-treat analysis of one of the two studies (METREX). However, there was no significant difference in the primary outcome of the other study (METREO), which included only patients with an eosinophilic phenotype. Additionally, apart from a longer time to first exacerbation in METREX, there was no significant difference in any secondary endpoint in either study. The medication was generally safe and well tolerated.
Commentary
Mepolizumab is a humanized monoclonal antibody that targets and blocks interleukin-5, a key mediator of eosinophilic activity. Due to its ability to decrease eosinophil number and function, it is currently approved as a therapy for severe asthma with an eosinophilic phenotype [1]. While asthma and COPD have historically been thought of as separate entities with distinct pathophysiologic mechanisms, recent evidence has suggested that a subset of COPD patients experience significant eosinophilic inflammation. This group may behave more like asthmatic patients, and may have a different response to medications such as inhaled corticosteroids, but the role of eosinophils to guide prognostication and treatment in this group is still unclear [2,3].
In this study, Pavord and colleagues investigated the use of the anti-IL5 drug mepolizumab in COPD patients at risk of exacerbations who demonstrated an eosinophilic phenotype. The physiologic rationale for the study was that eosinophilic inflammation is thought to be a driver of exacerbations in COPD patients with an eosinophilic phenotype, and therefore a decrease in eosinophilic number and function should result in a decrease in exacerbations. The authors conducted a well-designed placebo-controlled double-blind study with a clearly defined endpoint, met their enrollment goals as determined by their power calculations, and used COPD patients at high risk of exacerbations to enrich their study.
There was no difference in the primary outcome in the METREO arm of the study, which included patients with baseline eosinophilia (≥ 150 cells/uL), or in the overall intention-to-treat analysis in METREX (which did not screen patients on baseline eosinophil count). Only when stratified by baseline eosinophil count in the METREX study was a significant treatment effect found: patients with a high eosinophil count at baseline (≥ 150 cells/uL) had a lower rate of exacerbations when treated with mepolizumab. Notably, there was no difference in any secondary outcome in METREO or in METREX aside from a longer time to first exacerbation in METREX in the mepolizumab group. The authors use these data to conclude that mepolizumab treatment results in a lower rate of exacerbations and a longer time to the first exacerbation in COPD patients with an eosinophilic phenotype, and that the extent of the treatment effect is related to blood eosinophil counts.
The authors conducted a well-designed and rigorous study, and used robust and appropriate statistical analysis; however, significant questions remain regarding their conclusions. The primary concern is that the role of mepolizumab in decreasing exacerbations in COPD patients may be overstated. When only those with baseline eosinophilia were included, as in the METREO arm, there was no significant difference between placebo and low- or high-dose mepolizumab; however, there was an appropriate and expected decrease in blood eosinophils, indicating the medication worked as intended. In the overall intention-to-treat analysis in the METREX arm, there was no difference between mepolizumab and placebo, and only in the analysis of METREX stratified by eosinophil count was there a significant difference (with the upper bound of the rate ratio confidence interval [0.98] approaching unity).
Additionally, there was no significant difference between the 2 groups across a number of clinically important secondary endpoints, including pulmonary function measurements and symptom scores. Only the time to first exacerbation was significantly longer in the mepolizumab group in METREX.
Taken together, this calls into question the conclusion that a decrease in eosinophil counts due to mepolizumab has resulted in a lower rate of exacerbations, particularly as a higher dose of mepolizumab did not result in a stronger effect. The lack of difference between groups in secondary endpoints is also concerning, as those would be expected to improve with a decrease in exacerbations [4,5]. As the authors point out, their evidence suggests that eosinophils may be an important biomarker in COPD and may aid in the therapeutic decision-making process. However, given the inconsistencies in the data as noted above, it would be difficult to rely on the evidence from this study alone to support their conclusion regarding the clinical utility of mepolizumab in COPD.
The authors discuss a number of limitations that may account for the lack of consistent effect seen in this study. Aside from the standard limitations applicable to any clinical trial, they note the potential confounding effect of previous oral glucocorticoid therapy in reducing eosinophil counts. This may have masked the eosinophilic phenotype in some study patients, leading to the attenuated effect of mepolizumab seen in this study.
The authors also note that information that might be valuable for identifying treatment responders, such as a history of allergies and atopy, was not available. Selecting patients on such characteristics may help enrich a trial with potential treatment responders, and future studies may benefit from focusing on COPD patients with a more atopic phenotype who more closely resemble those with the asthma-COPD overlap syndrome.
A final limitation to discuss is the focus on blood eosinophil counts. Due to the difficulty of measuring sputum eosinophils, and the reasonable degree of correlation between blood and sputum in asthmatic patients, blood eosinophils have largely supplanted sputum eosinophils as markers of TH2 CD4 T-cell activity in the pulmonary system [6]. This substitution is also used in the COPD population; however, due to the differences in pathophysiology, it is unclear if eosinophils in asthmatic patients behave similarly to those in COPD patients [7]. Additionally, the cutoff of 150 cells/uL has been obtained primarily from subgroup analysis of previous studies on COPD patients, but it is unclear if this cutoff truly reflects elevated sputum eosinophilia. While there is likely some degree of correlation between blood and sputum eosinophilia in COPD patients, the lack of significant effect seen in this study may be due to an incorrect cutoff for elevated eosinophilia and a reliance on blood eosinophils over sputum counts. Further studies utilizing sputum eosinophils may be of value in addressing this limitation.
Applications for Clinical Practice
In this study, Pavord and colleagues found a potential benefit of mepolizumab treatment for reducing exacerbations in COPD patients with an eosinophilic phenotype. The conflicting results regarding the underlying physiology and the weak treatment effect suggest this medication may not be ready for use in clinical practice without additional supporting evidence. From a practical standpoint, the high cost of medication (~$2500 per month) and marginal benefit of treatment imply that treatment with mepolizumab in COPD patients may not be cost-effective, and even treatment in individual patients on a trial basis should be discouraged until additional supporting data becomes available. Of primary concern are the optimal selection of COPD patients that will achieve benefit with mepolizumab treatment, and the optimal dose of medication to achieve that benefit. The results presented here do not satisfactorily answer these questions, and additional studies are required.
—Arun Jose, MD, The George Washington University, Washington, DC
1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.
2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.
3. Roche N, Chapman KR, Vogelmeier CF, et al. Blood eosinophils and response to maintenance chronic obstructive pulmonary disease treatment. Data from the FLAME trial. Am J Respir Crit Care Med 2017;195:1189–97.
4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.
5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT) in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.
6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.
7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.
1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.
2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.
3. Roche N, Chapman KR, Vogelmeier CF, et al. Blood eosinophils and response to maintenance chronic obstructive pulmonary disease treatment. Data from the FLAME trial. Am J Respir Crit Care Med 2017;195:1189–97.
4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.
5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.
6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.
7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.
Teens with PID underscreened for HIV, syphilis
CHICAGO – Adolescents with pelvic inflammatory disease (PID) were unlikely to be screened for HIV or syphilis, and many didn’t receive an appropriate antibiotic regimen, according to a recent study reported at the annual meeting of the American Academy of Pediatrics.
Patients who were sent home rather than admitted were especially likely to miss screening, as were Hispanic patients and those with private insurance.
The Centers for Disease Control and Prevention strongly recommends that all women diagnosed with PID be tested for HIV, and that high-risk individuals also be tested for syphilis, wrote Amanda Jichlinski, MD, and her coauthors at Children’s National Health System, Washington.
The study, presented during a poster session, used data from the national Pediatric Health Information System database from 2010 to 2015. A total of 10,698 records with a diagnostic code for PID were included; patients were females aged 12-21 years seen in a pediatric emergency department.
In addition to the primary outcome of syphilis and HIV testing, the authors also looked at whether antibiotic administration for PID was in line with CDC recommendations – and it wasn’t. “Fewer than half of patients in the ED received antibiotic regimens adherent to CDC guidelines,” wrote Dr. Jichlinski and her coauthors.
Forty-six percent of patients received ceftriaxone and doxycycline, 21% received ceftriaxone and azithromycin, and 6% received ceftriaxone and metronidazole. Ceftriaxone monotherapy was given to 15% of patients. One in 10 patients with a PID diagnosis received no antibiotic at all; 2% of patients received some other regimen.
The researchers used multivariable analysis to examine separately which patient and hospital characteristics were associated with an increased likelihood of testing for both HIV and syphilis. With white, non-Hispanic adolescents used as the referent, Hispanic females with PID were less likely to receive screening for either HIV or syphilis (adjusted odds ratio, 0.8 for both; 95% confidence interval, 0.7-1.0 for both).
In contrast, black non-Hispanic females were screened more often; the aOR for HIV screening was 1.4 (95% CI, 1.2-1.6), and the aOR for syphilis screening was 1.8 (95% CI, 1.6-2.0) for this group of adolescents.
Patients were dichotomized into older (17-21 years of age; n = 4,737, 44%) and younger (12-16 years of age; n = 5,961, 56%) age groups; younger patients were slightly more likely to receive HIV (aOR, 1.2) and syphilis (aOR, 1.1) screening.
Just under a third of patients in the study were seen in a hospital with fewer than 300 beds, and these facilities were more likely to screen for HIV (aOR, 1.4) and syphilis (aOR, 1.1) than the larger hospitals.
By far the largest predictor of whether HIV and syphilis screening was done, though, was a hospital admission. Patients who were admitted (n = 4,043, 38%) were 7 times more likely to be screened for HIV and 4.6 times more likely to be screened for syphilis than those who were sent home from the emergency department.
Although the large, nationally representative study had many strengths, Dr. Jichlinski and her coauthors acknowledged that their data could not account for medication that was prescribed rather than administered in the emergency department. Also, the results may not be generalizable to adolescents treated in nonpediatric emergency departments or other facilities, such as urgent care centers.
“Adolescents with PID are underscreened for HIV and syphilis,” wrote Dr. Jichlinski and her coauthors. They called for pediatricians to receive more education about management of PID in adolescents. From a practical perspective, the investigators also suggested incorporating order sets for sexually transmitted infection testing and antibiotic administration into electronic medical records; in this way, a PID diagnosis code would trigger simplified testing and treatment choices.
Dr. Jichlinski reported no conflicts of interest. Monika Goyal, MD, senior author on the study, reported funding support from the National Institute of Child Health and Human Development. Dr. Goyal also holds an appointment at the George Washington University, Washington.
SOURCE: Jichlinski A et al. AAP 2017 Abstract 5, AAP Section on Emergency Medicine.
REPORTING FROM AAP 2017
Key clinical point: Adolescents with PID are underscreened for HIV and syphilis.
Major finding: Hispanic females were least likely to be screened (adjusted OR, 0.8), compared with non-Hispanic white females.
Study details: Retrospective study of 10,698 adolescent patients with PID from a national database.
Disclosures: The study was funded in part by the National Institute of Child Health and Human Development. The authors had no relevant financial disclosures.
Source: Jichlinski A et al. AAP 2017 Abstract 5, AAP Section on Emergency Medicine
Simple Patient Care Instructions Translate Best: Safety Guidelines for Physician Use of Google Translate
From the University of Arizona College of Medicine – Tucson, Tucson, AZ.
Abstract
- Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
- Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each example PCI was up to 3 sentences of typical patient instruction that might be included in an after visit summary. Google Translate was used to first translate the English to Spanish, then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was assessed as safe or unsafe.
- Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
- Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.
Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.
A core measure of the meaningful use of electronic health records incentive program is the generation and provision of the after visit summary (AVS), a mechanism for physicians to provide patients with a written summary of the patient encounter [1,2]. Although not a required element for meaningful use, free text patient care instructions (PCIs) provide the physician an opportunity to improve patient engagement either at the time of service or through the patient portal [3] by providing a short written summary of the key points of the office visit based upon the visit’s clinical discussion. For patients who do not speak English, a verbal translation service is required [4], but seldom are specific patient instructions provided in writing in the patient’s preferred language. A mechanism to improve communication might be through translation of the PCI into the patient’s preferred language. Spanish is the most common language, other than English, spoken at home in the United States [5,6]. For this reason, we chose to investigate if it is feasible to use machine translation (Google Translate) to safely and reliably translate a variety of PCIs from English to Spanish, and to assess the types of translation errors and ambiguities that might result in unsafe communication. We further investigate if machine back translation might allow the author of patient care instructions to evaluate the quality of the Spanish machine translation.
There is evidence to suggest that patient communication and satisfaction will improve if portions of the AVS are communicated in Spanish to primarily Spanish-speaking patients. Pavlik et al conducted a randomized controlled trial on the association of patient recall, satisfaction, and adherence to the information communicated in an AVS, in a largely Hispanic (61%) primary care clinic setting [7]. The AVS was provided in English. They noted that Spanish speakers wished to receive information in Spanish, although most had access to translation by a family member. They also noted that a lack of ability to provide an AVS in Spanish was a concern among providers. There was no difference in recall or satisfaction between English and Spanish speakers with respect to medications and allergies, suggesting that not all portions of the AVS might need to be translated.
Machine translation refers to the automated process of translating one language to another. The most recent methods of machine translation, as exemplified by Google Translate (Google Inc., Mountain View, CA), do not use rules of grammar and dictionaries to perform translations but instead use artificial neural networks to learn from “millions of examples” of translation [8]. However, unsupervised machine translation can result in serious errors [9]. Patil gives as an example of a serious error of translation from English (“Your child is fitting”) to Swahili (“Your child is dead”). In British parlance, “fitting” is a term for “having a seizure” and represents an example of a term that is context sensitive. However, others note that there is reason to be optimistic about the state of machine translation for biomedical text [10].
One method of assessing translation quality is through back translation, where one translator takes the author’s work into the desired target language, and then a different translator takes the target language back to the language of the author. Like the children’s game Chinese Whispers (Telephone in the United States) [11], where a “secret message” is whispered from one child to the next and spoken aloud at the end of the line of children, back translation can test to see if a message “gets through.” In this analogy, when information is machine translated from English to Spanish, and then machine translated from Spanish to English (Figure), we can compare the initial message to the final translation to see if the message “gets through.” We further investigate if machine back translation might allow a non-Spanish speaking author of PCIs to evaluate the quality of the Spanish translation.
Our intention was to determine if machine back translation [12] could be used by an English-only author to assess the quality of an intermediate Spanish translation. If poorly worded Spanish translated back into poorly worded English, the author might choose to either refine their original message until an acceptable machine back translation was achieved or to not release the Spanish translation to the patient. We were also concerned that there might be instances where the intermediate Spanish was unacceptable, but when translated back into English by machine translation, relatively acceptable English might result. If this were the case, then back translation would fail to detect a relatively poor intermediate Spanish translation.
Methods
Patient Care Instructions
Original English PCIs
Example original English PCIs were solicited from the clinical faculty and resident staff of the University of Arizona College of Medicine by an email-based survey tool (Qualtrics, Inc, Provo UT). The solicitation stated the following:
We are conducting a study to assess how well Google Translate might perform in translating patient instructions from English to Spanish. Would you please take the time to type three sentences that might comprise a typical “nugget” of patient instruction using language that you would typically include in an After Visit Summary for a patient? An example might be: “Take two Tylenol 325 mg tablets every four hours while awake for the next two days. If you have a sudden increase in pain or fever, or begin vomiting, call our office. Drink plenty of fluids.”
A total of 100 PCIs were collected. The breadth of the clinical practice and writing styles of a College of Medicine faculty are represented: not all PCIs were completely clear or consisted of well-formed sentences, but they did represent the typical language that busy clinicians would provide in an AVS PCI.
Machine Translation into Spanish
The 100 original English (OE) PCIs were submitted to the Google Translate web interface (https://translate.google.com/) by cutting and pasting and selecting “Spanish,” resulting in machine Spanish. The translations were performed in January 2016. No specific version number is provided by Google on their web page, and the service is described to be constantly evolving (https://translate.google.com/about/intl/en_ALL/contribute.html).
Machine Back Translation into English (MBTE)
Google Translate was then used to translate the machine Spanish back into English. MBTE represents the content that a monolingual English speaker might use to evaluate the machine Spanish.
Ratings of Translation Quality and Safety
Two panels of 6 raters evaluated machine Spanish and MBTE quality and safety. A bilingual English/Spanish speaking panel simultaneously evaluated the machine Spanish and MBTE compared to OE, with the goal of inferring where in the process an undesirable back translation error occurred. Bilingual raters were experienced bilingual clinicians or certified translators. A monolingual English speaking panel also evaluated the MBTE (compared to OE). They could only infer the quality and safety of the machine Spanish indirectly through inspection of MBTE, and their assessment was free of the potential bias of knowledge of the intermediate Spanish translation.
The raters used Likert scales to rate grammar similarity and content similarity (scale from 1 to 5: 1 = very dissimilar, 5 = identical). For each PCI, grammar and content scores for each rater were summed and then divided by 10 to yield a within-rater quality score ranging from 0 to 1. A panel-level (bilingual or monolingual) quality score was calculated by averaging the quality scores across raters.
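The scoring rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code; the function names `rater_quality` and `panel_quality` are hypothetical. Note that with both Likert items on a 1-to-5 scale, the lowest score attainable under this rule is 0.2.

```python
from statistics import mean


def rater_quality(grammar: int, content: int) -> float:
    """Within-rater quality score: (grammar + content) / 10.

    Both inputs are Likert ratings on a 1-5 scale, so the result
    lies between 0.2 and 1.0.
    """
    for score in (grammar, content):
        if not 1 <= score <= 5:
            raise ValueError("Likert scores must be between 1 and 5")
    return (grammar + content) / 10


def panel_quality(ratings: list[tuple[int, int]]) -> float:
    """Panel-level score for one PCI: mean of the within-rater scores.

    `ratings` holds one (grammar, content) pair per rater.
    """
    return mean(rater_quality(g, c) for g, c in ratings)


# Hypothetical ratings from a 6-member panel for a single PCI.
example_panel = [(5, 5), (4, 5), (5, 4), (4, 4), (5, 5), (3, 4)]
print(round(panel_quality(example_panel), 3))
```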
Safety of translation was rated as 0 or Safe (“While the translation may be awkward, it is not dangerous”) or 1 or Unsafe (“A dangerous translation error is present that might cause harm to the patient if instructions were followed”). If any panel member considered an item to be unsafe, the item as a whole was scored as unsafe.
Data Analysis
Descriptive Summary of PCI Contributions
The 100 PCIs were summarized in terms of volume (word count), complexity (Flesch-Kincaid Grade Level index [13]), and content (medication names, references, formatting) (Table 1). Word count and grade level were calculated using Microsoft Word (Microsoft Corp, Redmond WA).
Safety Analysis
Concordance analysis. A safety translation concern as defined in this study (“might cause harm”) is very subjective. To reduce some of the variation in assessment of safety, we identified 4 members of the bilingual panel whose safety assessments of MBTE were most similar to the most concordant 4 monolingual raters’ assessment of MBTE safety. The goal was to select the bilingual panel of 4 that was most “typical” of the behavior of a “typical” monolingual individual with respect to assessing the safety of an individual MBTE translation. We then used this bilingual panel to identify 2 sets of “unsafe” machine Spanish and MBTE PCI translations: PCIs where ANY of the 4 bilingual raters identified a safety concern in machine Spanish or MBTE, and PCIs where MOST (at least 3) of the 4 bilingual raters agree that PCI translation was “unsafe”.
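The two item-level rules described here (ANY rater versus MOST raters flagging a translation) differ only in the threshold applied to the count of unsafe ratings. A minimal sketch, assuming each rating is coded 0 (safe) or 1 (unsafe) as in the Methods; the function name `item_unsafe` and its `min_flags` parameter are hypothetical:

```python
def item_unsafe(ratings: list[int], min_flags: int = 1) -> bool:
    """Return True when at least `min_flags` raters scored the item unsafe.

    ratings   -- one 0/1 safety rating per rater (1 = unsafe)
    min_flags -- 1 reproduces the ANY rule; 3 reproduces the MOST rule
                 for a 4-member panel
    """
    if any(r not in (0, 1) for r in ratings):
        raise ValueError("ratings must be coded 0 (safe) or 1 (unsafe)")
    return sum(ratings) >= min_flags


# Hypothetical ratings from the 4 selected bilingual raters.
one_flag = [0, 0, 1, 0]
print(item_unsafe(one_flag))               # ANY rule
print(item_unsafe(one_flag, min_flags=3))  # MOST rule
```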
An expansion of Cohen’s kappa was used to identify the most concordant pairing of 4 bilingual panel members and 4 monolingual panel members [14]. All pairwise comparisons of monolingual and bilingual panel members were coded as follows: +1 was scored when 2 raters were concordant (both scored safe or both scored unsafe) and –1 was scored for discordant pairs. For each of the 225 possible pairings of 4-member panels (15 combinations of 4 of 6 bilingual raters × 15 combinations of 4 of 6 monolingual raters), the score for each of the 100 PCI items ranged from +16 (absolute agreement of the 2 panels of 4) to –16 (absolute discordance). For each pairing, we summed the scores for the 100 PCIs to determine the most concordant 4 monolingual and 4 bilingual raters (highest summed score), which were then used for all subsequent analyses of safety and quality.
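The exhaustive search over panel pairings can be illustrated as follows. This is a sketch of the scoring scheme as described in the text, not the authors' code; the data layout (`panel[rater][item]` holding 0/1 safety ratings) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def pairing_score(bilingual, monolingual, bi_subset, mono_subset):
    """Sum +1 (concordant) / -1 (discordant) over every cross-panel
    rater pair and every item, for one choice of sub-panels."""
    n_items = len(bilingual[0])
    total = 0
    for item in range(n_items):
        for b in bi_subset:
            for m in mono_subset:
                same = bilingual[b][item] == monolingual[m][item]
                total += 1 if same else -1
    return total


def most_concordant(bilingual, monolingual, k=4):
    """Try every pairing of k bilingual with k monolingual raters and
    return (best_score, bilingual_indices, monolingual_indices)."""
    best = None
    for bi_subset in combinations(range(len(bilingual)), k):
        for mono_subset in combinations(range(len(monolingual)), k):
            score = pairing_score(bilingual, monolingual, bi_subset, mono_subset)
            if best is None or score > best[0]:
                best = (score, bi_subset, mono_subset)
    return best


# Toy data: 6 raters per panel, 3 items (the study used 100 items).
bilingual = [[0, 1, 0]] * 4 + [[1, 0, 1]] * 2
monolingual = [[1, 0, 1]] * 2 + [[0, 1, 0]] * 4
print(most_concordant(bilingual, monolingual))
```

With 6 raters per panel and sub-panels of 4, the two nested loops visit 15 × 15 = 225 pairings, and each item contributes between –16 and +16 to a pairing's score, matching the ranges given above.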
Original English characteristics of unsafe translation.
A logistic regression was performed with safety as the dependent variable (safe/unsafe defined by bilingual raters) with explanatory variables of word count, grade level, and reference to medication in OE.
Quality Assessment
Bilingual and monolingual raters’ assessments of translation quality. We assessed the correlation between the bilingual quality ratings of machine Spanish vs. MBTE and conducted paired t tests comparing mean bilingual machine Spanish and MBTE ratings. High correlation and absence of a significant difference in means would support the notion that MBTE could be used to reliably assess machine Spanish quality.
We also assessed the correlation between bilingual quality assessments of machine Spanish vs. monolingual raters’ assessments of MBTE, and conducted paired comparison t tests comparing bilingual machine Spanish and monolingual MBTE quality ratings. These analyses assess the ability of an English-only reader of MBTE to predict the quality of machine Spanish, as determined by a bilingual rater. High correlation and absence of a significant difference in means would support the notion that MBTE could be used by an English-only speaker to reliably assess machine Spanish quality.
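For readers unfamiliar with the two statistics used here, the following sketch computes a paired t statistic and a Pearson correlation coefficient for two sets of per-PCI quality scores using only the Python standard library. It is illustrative only: the function names and the sample scores are hypothetical, and the P value (which requires the t distribution with n – 1 degrees of freedom) is not computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> float:
    """Paired t statistic: mean of the per-item differences divided by
    the standard error of those differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical panel-level quality scores for 4 PCIs.
machine_spanish = [0.9, 0.8, 1.0, 0.7]
mbte = [0.8, 0.8, 0.9, 0.7]
print(paired_t(machine_spanish, mbte), pearson_r(machine_spanish, mbte) ** 2)
```

A high squared correlation together with a small t statistic is the pattern that would support using one set of ratings as a stand-in for the other.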
Associations between original English content and translation quality. Objective measures of original English were correlated via stepwise linear regression with bilingual assessment of machine Spanish quality.
Results
PCI Contributions
Example PCIs were contributed by 88 individuals and are summarized in Table 1. The 100 original English PCIs and the machine Spanish and MBTE translations obtained via Google Translate are available from the authors upon request.
Safety
Concordance Analysis
The 6 monolingual and bilingual raters agreed on the safety of 73 MBTE PCIs. The most concordant pairings of 4 agreed on 81 items. The least and most concordant pairings had concordance values of 0.68 and 0.84, respectively. Subsequent analyses include data from only the 4 most concordant monolingual and bilingual raters.
Bilingual and Monolingual Safety Ratings
Both bilingual and monolingual raters assessed MBTE. On average, bilingual ratings of MBTE safety were higher (0.987) than monolingual ratings (0.925) (t = –3.897, P = 0.0002).
Identification of Unsafe Translations in Machine Spanish and MBTE
The bilingual panel identified 11 translations (either machine Spanish or MBTE) as unsafe: the machine Spanish translation was unsafe for 9 items and the MBTE was unsafe for 5 items, with some items identified as unsafe in both machine Spanish and MBTE. The original English, machine Spanish, and MBTE for these PCIs are listed in Table 2. One item (#93) revealed a machine Spanish drug dosing ambiguity that was not present in the MBTE, with safety concern expressed by 3 of 4 bilingual raters.
Original English Characteristics of Unsafe Translation
A stepwise logistic regression was performed to evaluate whether characteristics of the original English text predicted the PCI being judged as having a safe or unsafe machine Spanish translation. The explanatory variables (listed in Table 1) evaluated were word count, reading grade level, inclusion of reference to a specific medication, inclusion of numbers (as in "take 2 tablets"), and inclusion of numbered statements (as in "1. Call if your cough worsens"). The stepwise selection procedure dropped number references and numbered sentences, although post hoc analysis showed that number references and medication references occurred so commonly together that they were essentially interchangeable. The final regression model included word count, reading grade level, and medication reference. The significant factors of reading grade level and medication reference had odds ratio (95% confidence interval) of 1.12 (1.01 to 1.41) and 4.91 (1.07 to 22.7) respectively (P = 0.042 each). As reading grade level includes word count per sentence and syllable count per word as linear predictors, the inclusion of word count in the model is likely to increase the discrimination of complex words of many syllables in predicting the occurrence of unsafe machine Spanish.
Quality
Bilingual and Monolingual Raters’ Assessments of Quality
The bilingual evaluators found similar mean quality for machine Spanish (mean 0.855, SD 0.0859) and MBTE (0.857, SD 0.0755) (P = 0.811). However, the correlation of R² = 0.355 (P < 0.001) suggests that despite similarity in mean ratings, a good forward translation from original English to machine Spanish did not assure a good back translation from machine Spanish to MBTE. No difference in mean MBTE quality was identified between bilingual (0.857, SD 0.0754) and monolingual (0.852, SD 0.126) raters (P = 0.598), with correlation R² = 0.565 (P < 0.001).
Discussion
In this article, we have collected a corpus of example PCIs across a large number of authors, and investigated how well Google Translate was able to translate the example instructions first to Spanish, and then back again to English. We learned that one cannot always spot a problem in the intermediate Spanish by inspection of the back-translated English. We also learned that simple sentences were least likely to be associated with troublesome translations, and that specific instructions about medication usage should probably be approached with great care.
We learned that some authors readily use simple language (eg, “Have your blood work drawn in the lab in the next two weeks,” reading level 1.2) while others gravitate to very complex language (“If you develop headache, chest pain, abdominal pain or back pain, or if you have any spontaneous bleeding please go to the emergency department, advise them that you were recently treated for rattlesnake envenomation and have them call the poison center,” reading level 20.2).
The development in confidence in machine translation can be compared to development of self-driving cars. At early stages of development, the self-driving cars had drivers with a foot near the brake and hands near the steering wheel, ready to take over at any instant. Now, after much data has been collected, there is evidence that the machine may operate more predictably and safely than some human drivers [15,16]. Should the self-driving cars always have an operator behind the wheel, supervising the function of the software, and ready to take over at any instant, or is the purpose of the self-driving car to allow non-drivers to be transported in an automobile that they either cannot operate or choose not to operate at that time?
The benefit of using professional interpreters in communicating clinically significant data is unquestioned, especially when compared to ad-hoc interpreters who lack professional understanding of context [4]. Like a good human driver (as compared to a self-driving car that is operated by a program that is still learning), a qualified human translator will outperform machine translation in complex tasks. Similarly, for relatively simple translations that are meant to be generated by human speakers to be understood by individuals with a grammar school education and vocabulary, is the state of machine translation such that less human translation is now required?
Our use of 2 teams of evaluators allowed us to use the game of Telephone analogy to provide insight into how well the machine translation proceeded, first to Spanish, then back to English. Mostly (90 times in 100), an acceptable Spanish translation resulted in an acceptable English back translation. In 2 instances (Samples 7 and 32), the first translation into Spanish was unacceptable, and a subsequent translation back to English was also unacceptable, as might be expected. In 2 instances (Samples 60 and 92), the Spanish translation was acceptable, but the translation back to English was unacceptable. The rules of Telephone worked 94 times in 100.
Still, 6 times in 100, the unexpected occurred, where a relatively poor Spanish translation returned a relatively acceptable English back translation. The rules of Telephone were not followed. The Spanish in the middle was garbled, but became acceptable when translated back to English. A fluent Spanish speaker found the intermediate Spanish to be of concern, and the back translation did not identify the concern. This argues against widespread adoption of machine back translation for quality assessment, at least until the limitations of machine back translation are better understood. Looking at examples where back translation “worked” is useful. In the 6 instances where the intermediate Spanish was judged to be unacceptable, but the English back translation acceptable, complex sentence structures were found, along with medication instructions.
Not tested was whether the raters found the original English instructions to be unclear or unsafe as a starting point. Here is where we find the potential benefit of the present study, as it provides insight into the type of content that seems to translate well in this set of data. Overall, ratings of translation quality by bilingual and monolingual raters were high, suggesting that there may be some utility in machine translation with safeguards other than, or in addition to, inspection of machine back translation of machine Spanish. We found an astonishing range in reading difficulty across the contributed samples. While the average estimated grade level for comprehension of the original English contributions was the 8th grade, the maximum was 22, indicating extreme complexity of both words used and sentence length.
In gathering the example PCIs, we did not give the authors any additional instructions to limit complexity; we only asked for their “typical” language. If the examples received are indeed typical, the instructions we provide are often quite complex. Wu [17] explored the readability of medical information intended for the public and found that on average, 18 years of education would be required to read and understand the clinical trial descriptions available at ClinicalTrials.gov. It seems apparent that the first step to improving the safety of machine translation is to simplify the task of the translator, by making the language that is used for translation as unambiguous and straightforward as possible. The article by Patil and Davies on the use of Google Translate in the clinic [9] generated a considerable number of rapid responses (similar to letters to the editor) [18]. The responses emphasized the need to keep the language used simple, the sentences short, and the communication direct.
A simple and straightforward suggestion to improve all patient care instructions (not just those anticipated to be translated) would be to display the Flesch-Kincaid reading level in real time as the content is generated. The computer resources required to perform reading level analysis are nearly identical to those required for real-time spell checking: a dictionary that breaks words into syllables. Showing authors the reading level in real time would provide a tool to improve all instructions, not just those intended for translation. Limiting the dictionary to specifically exclude potentially dangerous, complex, or confusing words as well as forbidden abbreviations would further identify troublesome language to the author, and would improve communication overall. Implementing such real-time feedback to authors of patient instructions is a logical next step in adding utility to the electronic health record.
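As a rough illustration of how lightweight such feedback could be, the sketch below estimates the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) – 15.59, for a block of text. It is not the algorithm used by Microsoft Word or by any electronic health record: the syllable counter is a crude vowel-group heuristic standing in for the syllable dictionary mentioned above, sentence splitting is naive, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting runs of vowels, discounting a
    trailing silent 'e'. A real tool would use a syllable dictionary."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level estimate for `text`."""
    # Naive splitting: abbreviations such as "e.g." are miscounted.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


for draft in ("Drink plenty of fluids.",
              "Discontinue anticoagulation medication immediately."):
    print(f"{fk_grade(draft):5.1f}  {draft}")
```

An editor could recompute this estimate as the author types and highlight the draft when it exceeds a chosen threshold, much as a spell checker underlines unknown words.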
It is important that culture and contextual understanding are taken into consideration when organizations use interpretation services. In the United States, federal law requires that language interpreters employed by health care organizations receiving federal funds are not only bilingual but also bicultural [16]. We did not find examples of dangerous synonyms being misapplied in translation, but we cannot rule out the possibility that such errors can occur. This is beyond the scope of typical machine translation software.
Our data suggest that use of medication names and dosing frequencies should not be repeated in the PCI where confusion can arise from imprecise language translation. Translation ambiguities that generate safety concerns in PCI might be mitigated by moving such content into structured areas of the AVS.
Conclusion
This study suggests that 9 times out of 10, machine translation using Google Translate is acceptable in terms of quality and safety. Currently, machine back translation may fail to reveal a relatively poor translation from English to Spanish. This study showed that increasing sentence complexity, as measured by the reading level index, was associated with a significant (P < 0.05) increase in unsafe machine translation. Similarly, including medication instructions in machine translations was associated with an increased risk (P < 0.05) of machine translation safety error in this study.
A simple way to improve communication now would be to display the reading level to authors of patient communication content in real time, and limit the dictionary of acceptable words to forbid the use of known ambiguous terms or forbidden abbreviations. This would teach authors to use simple language, and increase the chance that translation (either human or machine) would be effective. This preliminary study suggests that keeping medication dosing instructions in a structured format is advisable, as is keeping sentences simple. As with spoken language [4], starting with clear, simple to understand English instructions provides the best machine translations into Spanish.
The Clinical Machine Translation Study Group: Todd W. Altenbernd, Steven Bedrick, Mark D. Berg, Nerida Berrios, Mark A. Brown, Colleen K. Cagno, Charles B. Cairns, Elizabeth Calhoun, Raymond Carmody, Tara F. Carr, Clara Choo, Melissa L. Cox, Janiel Cragun, Rachel E.M. Cramton, Paola Davis, Archita Desai, Sarah M. Desoky, Sean Elliot, Mindi J. Fain, Albert Fiorello, Hillary Franke, Kimberly Gerhart, Victor Jose Gonzalez, Aaron John Goshinska, Lynn M. Gries, Erin M. Harvey, Karen Herbst, Elizabeth Juneman, Lauren Marie Imbornoni, Anita Koshy, Lisa Laughlin, Christina M. Laukaitis, Kwan Lee, Hong Lei, Joseph M. Miller, Prashanthinie Mohan, Wayne J. Morgan, Jarrod Mosier, Leigh A. Neumayer, Valentine Nfonsam, Vivienne Ng, Terence O'Keeffe, Merri Pendergrass, Jessie M. Pettit, John Leander Po, Claudia Marie Prospero Ponce, Sydney Rice, Marie Anoushka Ricker, Arielle E. Rubin, Robert J. Segal, Aurora A.G. Selpides, Whitney A. Smith, Jordana M. Smith, William Stevenson, Amy N. Sussman, Ole J. Thienhaus, Patrick Tsai, J. Daniel Twelker, Richard Wahl, Jillian Wang, Mingwu Wang, Samuel C. Werner, Mark D. Wheeler, Jason Wild, Sun Kun Yi, Karl Andrew Yousef, Le Yu.
Corresponding author: Joseph M. Miller, MD, MPH, Department of Ophthalmology and Vision Science, University of Arizona, 655 North Alvernon Way, Suite 108, Tucson AZ 85711, [email protected].
Financial disclosures: None.
1. Hummel J, Evans P. Providing clinical summaries to patients after each office visit: a technical guide. Qualis Health 2012. Accessed 14 Mar 2016 at http://hit.qualishealth.org/sites/default/files/hit.qualishealth.org/Providing-Clinical-Summaries-0712.pdf.
2. Neuberger M, Dontje K, Holzman G, et al. Examination of office visit patient preferences for the after-visit summary (AVS). Persp Health Infor Manage 2014;11:1d.
3. Kruse CS, Bolton K, Freriks G. The effect of patient portals on quality outcomes and its implications to meaningful use: a systematic review. J Med Internet Res 2015;17:e44.
4. Schoonover K. Using a medical interpreter with persons of limited English proficiency. J Clin Outcomes Manage 2016;23:567–75.
5. Shin HB, Bruno R. Language use and English-speaking ability: 2000. Census 2000 Brief. Accessed 9 Nov 2017 at https://census.gov/content/dam/Census/library/publications/2013/acs/acs-22.pdf.
6. Lewis MP, Simons GF, Fennig CD, editors. Ethnologue: languages of the Americas and the Pacific. 19th ed. Dallas: Sil International; 2016.
7. Pavlik V, Brown AE, Nash S, et al. Association of patient recall, satisfaction, and adherence to content of an electronic health record (EHR)-generated after visit summary: a randomized clinical trial. J Am Board Fam Med 2014;27:209–18.
8. Johnson M, Schuster M, Le QV, et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Accessed 9 Nov 2017 at https://arxiv.org/pdf/1611.04558.pdf.
9. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.
10. Kaliyadan F, Gopinathan Pillai S. The use of Google language tools as an interpretation aid in cross-cultural doctor-patient interaction: a pilot study. Inform Prim Care 2010;18:141–3.
11. Zhang Y, Zhou S, Zhang Z, et al. Rumor evolution in social networks. Physical Review E 2013;87.
12. Shingenobu T. Evaluation and usability of back translation for intercultural communication. In: N. Aykin, editor. Usability and internationalization. Global and local user interfaces. UI-HCII 2007, Lecture Notes in Computer Science, vol 4560. Springer, Berlin, Heidelberg.
13. Kincaid JP, Fishburne Jr RP, Rogers RL, et al. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch. 1975. Accessed 7 May 2016 at http://www.dtic.mil/dtic/tr/fulltext/u2/a006655.pdf.
14. Kwiecien R, Kopp-Schneider A, Blettner M. Concordance analysis—part 16 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011;108:515–21.
15. Goodall N. Ethical decision making during automated vehicle crashes. Transportation Research Record: Journal of the Transportation Research Board 2014;2424:58–65.
16. Kalra N, Groves D. The enemy of good: estimating the cost of waiting for nearly perfect automated vehicles. Santa Monica, CA: RAND Corporation, 2017.
17. Wu DT, Hanauer DA, Mei Q, et al. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2016;23:269–75.
18. Responses to: Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392. Accessed 10 Dec 2017 at www.bmj.com/content/349/bmj.g7392/rapid-responses.
19. Nápoles AM, Santoyo-Olsson J, Karliner LS, et al. Inaccurate language interpretation and its clinical significance in the medical encounters of Spanish-speaking Latinos. Med Care 2015;53:940–7.
From the University of Arizona College of Medicine – Tucson, Tucson, AZ.
Abstract
- Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
- Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each example PCI was up to 3 sentences of typical patient instruction that might be included in an after visit summary. Google Translate was used to first translate the English to Spanish, then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was assessed as safe or unsafe.
- Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
- Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.
Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.
A core measure of the meaningful use of electronic health records incentive program is the generation and provision of the after visit summary (AVS), a mechanism for physicians to provide patients with a written summary of the patient encounter [1,2]. Although not a required element for meaningful use, free text patient care instructions (PCIs) provide the physician an opportunity to improve patient engagement either at the time of service or through the patient portal [3] by providing a short written summary of the key points of the office visit based upon the visit’s clinical discussion. For patients who do not speak English, a verbal translation service is required [4], but seldom are specific patient instructions provided in writing in the patient’s preferred language. A mechanism to improve communication might be through translation of the PCI into the patient’s preferred language. Spanish is the most common language, other than English, spoken at home in the United States [5,6]. For this reason, we chose to investigate if it is feasible to use machine translation (Google Translate) to safely and reliably translate a variety of PCIs from English to Spanish, and to assess the types of translation errors and ambiguities that might result in unsafe communication. We further investigate if machine back translation might allow the author of patient care instructions to evaluate the quality of the Spanish machine translation.
There is evidence to suggest that patient communication and satisfaction will improve if portions of the AVS are communicated in Spanish to primarily Spanish-speaking patients. Pavlik et al conducted a randomized controlled trial on the association of patient recall, satisfaction, and adherence to the information communicated in an AVS, in a largely Hispanic (61%) primary care clinic setting [7]. The AVS was provided in English. They noted that Spanish speakers wished to receive information in Spanish, although most had access to translation by a family member. They also noted that a lack of ability to provide an AVS in Spanish was a concern among providers. There was no difference in recall or satisfaction between English and Spanish speakers with respect to medications and allergies, suggesting that not all portions of the AVS might need to be translated.
Machine translation refers to the automated process of translating one language to another. The most recent methods of machine translation, as exemplified by Google Translate (Google Inc., Mountain View, CA), do not use rules of grammar and dictionaries to perform translations but instead use artificial neural networks to learn from “millions of examples” of translation [8]. However, unsupervised machine translation can result in serious errors [9]. Patil gives as an example of a serious error of translation from English (“Your child is fitting”) to Swahili (“Your child is dead”). In British parlance, “fitting” is a term for “having a seizure” and represents an example of a term that is context sensitive. However, others note that there is reason to be optimistic about the state of machine translation for biomedical text [10].
One method of assessing translation quality is through back translation, where one translator takes the author’s work into the desired target language, and then a different translator takes the target language back to the language of the author. Like the children’s game Chinese Whispers (Telephone in the United States) [11], where a “secret message” is whispered from one child to the next and spoken aloud at the end of the line of children, back translation can test to see if a message “gets through.” In this analogy, when information is machine translated from English to Spanish, and then machine translated from Spanish to English (Figure), we can compare the initial message to the final translation to see if the message “gets through.” We further investigate if machine back translation might allow a non-Spanish speaking author of PCIs to evaluate the quality of the Spanish translation.
Our intention was to determine if machine back translation [12] could be used by an English-only author to assess the quality of an intermediate Spanish translation. If poorly worded Spanish translated back into poorly worded English, the author might choose to either refine their original message until an acceptable machine back translation was achieved or to not release the Spanish translation to the patient. We were also concerned that there might be instances where the intermediate Spanish was unacceptable, but when translated back into English by machine translation, relatively acceptable English might result. If this were the case, then back translation would fail to detect a relatively poor intermediate Spanish translation.
Methods
Patient Care Instructions
Original English PCIs
Example original English PCIs were solicited from the clinical faculty and resident staff of the University of Arizona College of Medicine by an email-based survey tool (Qualtrics, Inc, Provo UT). The solicitation stated the following:
We are conducting a study to assess how well Google Translate might perform in translating patient instructions from English to Spanish. Would you please take the time to type three sentences that might comprise a typical “nugget” of patient instruction using language that you would typically include in an After Visit Summary for a patient? An example might be: “Take two Tylenol 325 mg tablets every four hours while awake for the next two days. If you have a sudden increase in pain or fever, or begin vomiting, call our office. Drink plenty of fluids.”
A total of 100 PCIs were collected. The breadth of the clinical practice and writing styles of a College of Medicine faculty are represented: not all were completely clear or were well-formed sentences, but did represent examples provided by busy clinicians of typical language that they would provide in an AVS PCI.
Machine Translation into Spanish
The 100 original English (OE) PCIs were submitted to the Google Translate web interface (https://translate.google.com/) by cutting and pasting and selecting “Spanish,” resulting in machine Spanish. The translations were performed in January 2016. No specific version number is provided by Google on their web page, and the service is described to be constantly evolving (https://translate.google.com/about/intl/en_ALL/contribute.html).
Machine Back Translation into English (MBTE)
Google Translate was then used to translate the machine Spanish back into English. MBTE represents the content that a monolingual English speaker might use to evaluate the machine Spanish.
Ratings of Translation Quality and Safety
Two panels of 6 raters evaluated machine Spanish and MBTE quality and safety. A bilingual English/Spanish speaking panel simultaneously evaluated the machine Spanish and MBTE compared to OE, with the goal of inferring where in the process an undesirable back translation error occurred. Bilingual raters were experienced bilingual clinicians or certified translators. A monolingual English speaking panel also evaluated the MBTE (compared to OE). They could only infer the quality and safety of the machine Spanish indirectly through inspection of MBTE, and their assessment was free of the potential bias of knowledge of the intermediate Spanish translation.
The raters used Likert scales to rate grammar similarity and content similarity (scale from 1 to 5: 1 = very dissimilar, 5 = identical). For each PCI, grammar and content scores for each rater were summed and then divided by 10 to yield a within-rater quality score on a 0 to 1 scale (with ratings of 1 to 5, the attainable range is 0.2 to 1). A panel-level (bilingual or monolingual) quality score was calculated by averaging the quality scores across raters.
Safety of translation was rated as 0 or Safe (“While the translation may be awkward, it is not dangerous”) or 1 or Unsafe (“A dangerous translation error is present that might cause harm to the patient if instructions were followed”). If any panel member considered an item to be unsafe, the item as a whole was scored as unsafe.
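The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code; the function names and the example ratings are hypothetical.

```python
from statistics import mean

def rater_quality(grammar: int, content: int) -> float:
    """Within-rater quality: (grammar + content) / 10 for two 1-5 Likert ratings."""
    for score in (grammar, content):
        if not 1 <= score <= 5:
            raise ValueError("Likert ratings must be between 1 and 5")
    return (grammar + content) / 10

def panel_quality(ratings: list[tuple[int, int]]) -> float:
    """Panel-level quality: average of the within-rater quality scores."""
    return mean(rater_quality(g, c) for g, c in ratings)

def item_unsafe(flags: list[int]) -> int:
    """An item is unsafe (1) if any panel member flagged it as unsafe."""
    return int(any(flags))

# Hypothetical (grammar, content) ratings and safety flags from 6 raters for one PCI.
example_ratings = [(5, 4), (4, 4), (5, 5), (3, 4), (4, 5), (5, 5)]
example_flags = [0, 0, 0, 1, 0, 0]

print(round(panel_quality(example_ratings), 3))
print(item_unsafe(example_flags))
```

Because a single flag marks the whole item as unsafe, the panel-level safety rule is deliberately conservative, while the quality score is a plain average.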
Data Analysis
Descriptive Summary of PCI Contributions
The 100 PCIs were summarized in terms of volume (word count), complexity (Flesch-Kincaid Grade Level index [13]), and content (medication names, references, formatting) (Table 1). Word count and grade level were calculated using Microsoft Word (Microsoft Corp, Redmond WA).
Safety Analysis
Concordance analysis. A safety translation concern as defined in this study (“might cause harm”) is very subjective. To reduce some of the variation in assessment of safety, we identified 4 members of the bilingual panel whose safety assessments of MBTE were most similar to the most concordant 4 monolingual raters’ assessment of MBTE safety. The goal was to select the bilingual panel of 4 that was most “typical” of the behavior of a “typical” monolingual individual with respect to assessing the safety of an individual MBTE translation. We then used this bilingual panel to identify 2 sets of “unsafe” machine Spanish and MBTE PCI translations: PCIs where ANY of the 4 bilingual raters identified a safety concern in machine Spanish or MBTE, and PCIs where MOST (at least 3) of the 4 bilingual raters agree that PCI translation was “unsafe”.
An expansion of Cohen’s kappa was used to identify the most concordant pairing of 4 bilingual panel members and 4 monolingual panel members [14]. All pairwise comparisons of monolingual and bilingual panel members were coded as follows: +1 was scored when 2 raters were concordant (both scored safe or unsafe) and –1 was scored for discordant pairs. For each of the 225 possible pairings of 4-member panels (15 combinations of 4 of 6 bilingual raters × 15 combinations of 4 of 6 monolingual raters), the score for each of the 100 PCI items ranged from +16 (absolute agreement of the 2 panels of 4) to –16 (absolute discordance). For each pairing, we summed the scores for the 100 PCIs to determine the most concordant 4 monolingual and 4 bilingual raters (highest summed scores), which were then used for all subsequent analyses of safety and quality.
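The exhaustive search over panel pairings can be illustrated as follows. This sketch implements only the +1/–1 scoring and summation described above, not the cited kappa extension itself; the function names and the toy data (3 items instead of 100) are assumptions made for illustration.

```python
from itertools import combinations, product

def pair_score(bi_flags, mono_flags):
    """Score one PCI: +1 per concordant rater pair, -1 per discordant pair."""
    return sum(1 if b == m else -1 for b, m in product(bi_flags, mono_flags))

def most_concordant(bilingual, monolingual, k=4):
    """Return (summed score, bilingual rater indices, monolingual rater indices).

    bilingual, monolingual: one list per rater of 0/1 safety flags, one flag per PCI.
    """
    n_items = len(bilingual[0])
    best = None
    for bi in combinations(range(len(bilingual)), k):
        for mo in combinations(range(len(monolingual)), k):
            total = sum(
                pair_score([bilingual[r][i] for r in bi],
                           [monolingual[r][i] for r in mo])
                for i in range(n_items)
            )
            if best is None or total > best[0]:
                best = (total, bi, mo)
    return best

# Toy data: 6 raters per panel, 3 PCIs; the last 2 raters of each panel disagree.
bilingual = [[0, 0, 1]] * 4 + [[1, 1, 0]] * 2
monolingual = [[0, 0, 1]] * 4 + [[1, 0, 0]] * 2

n_pairings = len(list(combinations(range(6), 4))) ** 2  # 15 x 15 pairings
best = most_concordant(bilingual, monolingual)
print(n_pairings, best)
```

With 4 raters on each side there are 16 rater pairs per item, which is why a single item's score is bounded by +16 and –16.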
Original English characteristics of unsafe translation.
A logistic regression was performed with safety as the dependent variable (safe/unsafe defined by bilingual raters) with explanatory variables of word count, grade level, and reference to medication in OE.
Quality Assessment
Bilingual and monolingual raters’ assessments of translation quality. We assessed the correlation between the bilingual quality ratings of machine Spanish vs. MBTE and conducted paired t tests comparing mean bilingual machine Spanish and MBTE ratings. High correlation and absence of a significant difference in means would support the notion that MBTE could be used to reliably assess machine Spanish quality.
We also assessed the correlation between bilingual quality assessments of machine Spanish vs. monolingual raters’ assessments of MBTE, and conducted paired comparison t tests comparing bilingual machine Spanish and monolingual MBTE quality ratings. These analyses assess the ability of an English-only reader of MBTE to predict the quality of machine Spanish, as determined by a bilingual rater. High correlation and absence of a significant difference in means would support the notion that MBTE could be used by an English-only speaker to reliably assess machine Spanish quality.
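For readers unfamiliar with the two statistics, the sketch below computes a Pearson correlation and a paired t statistic for per-item quality scores using only the Python standard library. The function names and data are hypothetical, and the study does not state which software was used for these tests. Converting t to a P value requires the t distribution with n – 1 degrees of freedom, which is not included here.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def paired_t(x, y):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical per-item quality scores for machine Spanish and MBTE.
machine_spanish = [0.9, 0.8, 1.0, 0.7, 0.9]
mbte = [0.8, 0.9, 0.9, 0.7, 1.0]

r = pearson_r(machine_spanish, mbte)
print(round(r ** 2, 3), round(paired_t(machine_spanish, mbte), 3))
```

The two statistics answer different questions: the t test compares the panel means, while the correlation asks whether items rated highly in one translation step are also rated highly in the other.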
Associations between original English content and translation quality. Objective measures of original English were correlated via stepwise linear regression with bilingual assessment of machine Spanish quality.
Results
PCI Contributions
Example PCIs were contributed by 88 individuals and are summarized in Table 1. The 100 original English PCIs and the machine Spanish and MBTE translations obtained via Google Translate are available from the authors upon request.
Safety
Concordance Analysis
The 6 monolingual and bilingual raters agreed on the safety of 73 MBTE PCIs. The most concordant pairings of 4 agreed on 81 items. The least and most concordant pairings had concordance values of 0.68 and 0.84, respectively. Subsequent analyses include data from only the 4 most concordant monolingual and bilingual raters.
Bilingual and Monolingual Safety Ratings
Both bilingual and monolingual raters assessed MBTE. On average, bilingual ratings of MBTE safety were higher (0.987) than monolingual ratings (0.925) (t = –3.897, P = 0.0002).
Identification of Unsafe Translations in Machine Spanish and MBTE
The bilingual panel identified 11 translations (either machine Spanish or MBTE) as unsafe: the machine Spanish translation was unsafe for 9 items and the MBTE was unsafe for 5 items, with some items identified as unsafe in terms of both machine Spanish and MBTE. The original English, machine Spanish, and MBTE for these PCIs are listed in Table 2. One item (#93) revealed a machine Spanish drug dosing ambiguity that was not present in the MBTE, with safety concern expressed by 3 of 4 bilingual raters.
Original English Characteristics of Unsafe Translation
A stepwise logistic regression was performed to evaluate whether characteristics of the original English text predicted the PCI being judged as having a safe or unsafe machine Spanish translation. The explanatory variables (listed in Table 1) evaluated were word count, reading grade level, inclusion of reference to a specific medication, inclusion of numbers (as in "take 2 tablets"), and inclusion of numbered statements (as in "1. Call if your cough worsens"). The stepwise selection procedure dropped number references and numbered sentences, although post hoc analysis showed that number references and medication references occurred so commonly together that they were essentially interchangeable. The final regression model included word count, reading grade level, and medication reference. The significant factors of reading grade level and medication reference had odds ratio (95% confidence interval) of 1.12 (1.01 to 1.41) and 4.91 (1.07 to 22.7) respectively (P = 0.042 each). As reading grade level includes word count per sentence and syllable count per word as linear predictors, the inclusion of word count in the model is likely to increase the discrimination of complex words of many syllables in predicting the occurrence of unsafe machine Spanish.
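As a reading aid for the odds ratios, the sketch below shows how they compound under the usual logistic-regression interpretation. It assumes the other model terms (such as word count) are held fixed and that the effect is log-linear in grade level; the function name is hypothetical, and the confidence intervals are wide, so the point estimates should be read loosely.

```python
OR_GRADE = 1.12       # reported odds ratio per 1-unit increase in reading grade level
OR_MEDICATION = 4.91  # reported odds ratio for including a medication reference

def odds_multiplier(grade_delta: float, adds_medication: bool = False) -> float:
    """Multiplicative change in the odds of an unsafe machine Spanish translation."""
    multiplier = OR_GRADE ** grade_delta
    if adds_medication:
        multiplier *= OR_MEDICATION
    return multiplier

# Moving from grade level 8 to grade level 12 (a 4-unit increase):
print(round(odds_multiplier(4), 2))
# The same increase combined with adding a medication reference:
print(round(odds_multiplier(4, adds_medication=True), 2))
```

These multipliers apply to odds, not probabilities, so they do not translate directly into a percentage increase in unsafe translations.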
Quality
Bilingual and Monolingual Raters’ Assessments of Quality
The bilingual evaluators found similar mean quality for machine Spanish (mean 0.855, SD 0.0859) and MBTE (0.857, SD 0.0755) (P = 0.811). However, the correlation of R² = 0.355 (P < 0.001) suggests that, despite the similarity in mean ratings, a good forward translation from original English to machine Spanish did not assure a good back translation from machine Spanish to MBTE. No difference in mean MBTE quality was identified between bilingual (0.857, SD 0.0754) and monolingual (0.852, SD 0.126) raters (P = 0.598), with correlation R² = 0.565 (P < 0.001).
Discussion
In this article, we have collected a corpus of example PCIs across a large number of authors, and investigated how well Google Translate was able to translate the example instructions first to Spanish, and then back again to English. We learned that one cannot always spot a problem in the intermediate Spanish by inspection of the back-translated English. We also learned that simple sentences were least likely to be associated with troublesome translations, and that specific instructions about medication usage should probably be approached with great care.
We learned that some authors readily use simple language (eg: “Have your blood work drawn in the lab in the next two weeks,” reading level 1.2) while others gravitate to very complex language (“If you develop headache, chest pain, abdominal pain or back pain, or if you have any spontaneous bleeding please go to the emergency department, advise them that you were recently treated for rattlesnake envenomation and have them call the poison center,” reading level 20.2).
The development in confidence in machine translation can be compared to development of self-driving cars. At early stages of development, the self-driving cars had drivers with a foot near the brake and hands near the steering wheel, ready to take over at any instant. Now, after much data has been collected, there is evidence that the machine may operate more predictably and safely than some human drivers [15,16]. Should the self-driving cars always have an operator behind the wheel, supervising the function of the software, and ready to take over at any instant, or is the purpose of the self-driving car to allow non-drivers to be transported in an automobile that they either cannot operate or choose not to operate at that time?
The benefit of using professional interpreters in communicating clinically significant data is unquestioned, especially when compared to ad-hoc interpreters who lack professional understanding of context [4]. Like a good human driver (as compared to a self-driving car that is operated by a program that is still learning), a qualified human translator will outperform machine translation in complex tasks. Similarly, for relatively simple translations that are meant to be generated by human speakers to be understood by individuals with a grammar school education and vocabulary, is the state of machine translation such that less human translation is now required?
Our use of 2 teams of evaluators allowed us to use the game of Telephone analogy to provide insight into how well the machine translation proceeded, first to Spanish, then back to English. Mostly (90 times in 100), an acceptable Spanish translation resulted in an acceptable English back translation. In 2 instances (Samples 7 and 32), the first translation into Spanish was unacceptable, and a subsequent translation back to English was also unacceptable, as might be expected. In 2 instances (Samples 60 and 92), the Spanish translation was acceptable, but the translation back to English was unacceptable. The rules of Telephone worked 94 times in 100.
Still, 6 times in 100, the unexpected occurred, where a relatively poor Spanish translation returned a relatively acceptable English back translation. The rules of Telephone were not followed. The Spanish in the middle was garbled, but became acceptable when translated back to English. A fluent Spanish speaker found the intermediate Spanish to be of concern, and the back translation did not identify the concern. This argues against widespread adoption of machine back translation for quality assessment, at least until the limitations of machine back translation are better understood. Looking at examples where back translation “worked” is useful. In the 6 instances where the intermediate Spanish was judged to be unacceptable, but the English back translation acceptable, complex sentence structures were found, along with medication instructions.
Not tested was whether the raters found the original English instructions to be unclear or unsafe as a starting point. Here is where we find the potential benefit of the present study, as it provides insight into the type of content that seems to translate well in this set of data. where the machine Spanish error was not present in MBTE. Overall, ratings of translation quality by bilingual and monolingual raters were high, suggesting that there may be some utility in the machine translation with safeguards other than, or in addition to, inspection of machine back translation of machine Spanish. We found there was an astonishing range in reading difficulty across the contributed samples. While the average estimated grade level for comprehension of the original English contributions was the 8th grade, the maximum was 22, indicating extreme complexity of both words used and sentence length.
In gathering the example PCIs, we did not give any additional instructions to the authors to limit complexity; we asked only for their “typical” language. If the examples received are indeed typical, the instructions we provide are often quite complex. Wu [17] explored the readability of medical information intended for the public and found that on average, 18 years of education would be required to read and understand the clinical trial descriptions available at ClinicalTrials.gov. It seems apparent that the first step to improving the safety of machine translation is to simplify the task of the translator, by making the language that is used for translation as unambiguous and straightforward as possible. The article by Patil and Davies on the use of Google Translate in the clinic [9] generated a considerable number of rapid responses (similar to letters to the editor) [18]. The responses emphasized the need to keep the language used simple, the sentences short, and the communication direct.
A simple and straightforward suggestion to improve all patient care instructions (not just those anticipated to be translated) would be to display the Flesch-Kincaid reading level in real time as the content is generated. The computer resources required to perform reading level analysis are nearly identical to those required for real-time spell checking: a dictionary that breaks words into syllables. Showing authors the reading level in real time would provide a tool to improve all instructions, not just those intended for translation. Limiting the dictionary to specifically exclude potentially dangerous, complex, or confusing words as well as forbidden abbreviations would further identify troublesome language to the author, and would improve communication overall. Implementing such real-time feedback to authors of patient instructions is a logical next step in adding utility to the electronic health record.
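To give a sense of how little computation such feedback needs, the sketch below applies the standard Flesch-Kincaid grade level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) – 15.59. The syllable counter here is a rough vowel-group heuristic rather than the syllable dictionary mentioned above, so results will differ somewhat from those of Microsoft Word; the function names are hypothetical.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping one for a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; numerals such as '325' are not counted as words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

simple = "Drink plenty of fluids."
complex_text = "If you develop spontaneous bleeding please go to the emergency department."

print(round(flesch_kincaid_grade(simple), 2))
print(round(flesch_kincaid_grade(complex_text), 2))
```

An editor could recompute this value on each keystroke or sentence boundary and warn the author when it exceeds a chosen threshold.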
It is important that culture and contextual understanding are taken into consideration when organizations use interpretation services. In the United States, federal law requires that language interpreters employed by health care organizations receiving federal funds are not only bilingual but also bicultural [16]. We did not find examples of dangerous synonyms being misapplied in translation, but we cannot rule out the possibility that such errors can occur. This is beyond the scope of typical machine translation software.
Our data suggest that use of medication names and dosing frequencies should not be repeated in the PCI where confusion can arise from imprecise language translation. Translation ambiguities that generate safety concerns in PCI might be mitigated by moving such content into structured areas of the AVS.
Conclusion
This study suggests that 9 times out of 10, machine translation using Google Translate is acceptable in terms of quality and safety. Currently, machine back translation may fail to reveal a relatively poor translation from English to Spanish. This study showed that increasing sentence complexity, as measured by the reading level index, was associated with a significant (P < 0.05) increase in unsafe machine translation. Similarly, including medication instructions in machine translations was associated with increased risk (P < 0.05) of machine translation safety error in this study.
A simple way to improve communication now would be to display the reading level to authors of patient communication content in real time, and limit the dictionary of acceptable words to forbid the use of known ambiguous terms or forbidden abbreviations. This would teach authors to use simple language, and increase the chance that translation (either human or machine) would be effective. This preliminary study suggests that keeping medication dosing instructions in a structured format is advisable, as is keeping sentences simple. As with spoken language [4], starting with clear, simple to understand English instructions provides the best machine translations into Spanish.
The Clinical Machine Translation Study Group: Todd W. Altenbernd, Steven Bedrick, Mark D. Berg, Nerida Berrios, Mark A. Brown, Colleen K. Cagno, Charles B. Cairns, Elizabeth Calhoun, Raymond Carmody, Tara F. Carr, Clara Choo, Melissa L. Cox, Janiel Cragun, Rachel E.M. Cramton, Paola Davis, Archita Desai, Sarah M. Desoky, Sean Elliot, Mindi J. Fain, Albert Fiorello, Hillary Franke, Kimberly Gerhart, Victor Jose Gonzalez, Aaron John Goshinska, Lynn M. Gries, Erin M. Harvey, Karen Herbst, Elizabeth Juneman, Lauren Marie Imbornoni, Anita Koshy, Lisa Laughlin, Christina M. Laukaitis, Kwan Lee, Hong Lei, Joseph M. Miller, Prashanthinie Mohan, Wayne J. Morgan, Jarrod Mosier, Leigh A. Neumayer, Valentine Nfonsam, Vivienne Ng, Terence O'Keeffe, Merri Pendergrass, Jessie M. Pettit, John Leander Po, Claudia Marie Prospero Ponce, Sydney Rice, Marie Anoushka Ricker, Arielle E. Rubin, Robert J. Segal, Aurora A.G. Selpides, Whitney A. Smith, Jordana M. Smith, William Stevenson, Amy N. Sussman, Ole J. Thienhaus, Patrick Tsai, J. Daniel Twelker, Richard Wahl, Jillian Wang, Mingwu Wang, Samuel C. Werner, Mark D. Wheeler, Jason Wild, Sun Kun Yi, Karl Andrew Yousef, Le Yu.
Corresponding author: Joseph M. Miller, MD, MPH, Department of Ophthalmology and Vision Science, University of Arizona, 655 North Alvernon Way, Suite 108, Tucson AZ 85711, [email protected].
Financial disclosures: None.
From the University of Arizona College of Medicine – Tucson, Tucson, AZ.
Abstract
- Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
- Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each example PCI was up to 3 sentences of typical patient instruction that might be included in an after visit summary. Google Translate was used to first translate the English to Spanish, then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was assessed as safe or unsafe.
- Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
- Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.
Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.
A core measure of the meaningful use of electronic health records incentive program is the generation and provision of the after visit summary (AVS), a mechanism for physicians to provide patients with a written summary of the patient encounter [1,2]. Although not a required element for meaningful use, free text patient care instructions (PCIs) provide the physician an opportunity to improve patient engagement either at the time of service or through the patient portal [3] by providing a short written summary of the key points of the office visit based upon the visit’s clinical discussion. For patients who do not speak English, a verbal translation service is required [4], but seldom are specific patient instructions provided in writing in the patient’s preferred language. A mechanism to improve communication might be through translation of the PCI into the patient’s preferred language. Spanish is the most common language, other than English, spoken at home in the United States [5,6]. For this reason, we chose to investigate if it is feasible to use machine translation (Google Translate) to safely and reliably translate a variety of PCIs from English to Spanish, and to assess the types of translation errors and ambiguities that might result in unsafe communication. We further investigate if machine back translation might allow the author of patient care instructions to evaluate the quality of the Spanish machine translation.
There is evidence to suggest that patient communication and satisfaction will improve if portions of the AVS are communicated in Spanish to primarily Spanish-speaking patients. Pavlik et al conducted a randomized controlled trial on the association of patient recall, satisfaction, and adherence to the information communicated in an AVS, in a largely Hispanic (61%) primary care clinic setting [7]. The AVS was provided in English. They noted that Spanish speakers wished to receive information in Spanish, although most had access to translation by a family member. They also noted that a lack of ability to provide an AVS in Spanish was a concern among providers. There was no difference in recall or satisfaction between English and Spanish speakers with respect to medications and allergies, suggesting that not all portions of the AVS might need to be translated.
Machine translation refers to the automated process of translating one language to another. The most recent methods of machine translation, as exemplified by Google Translate (Google Inc., Mountain View, CA), do not use rules of grammar and dictionaries to perform translations but instead use artificial neural networks to learn from “millions of examples” of translation [8]. However, unsupervised machine translation can result in serious errors [9]. Patil gives as an example a serious translation error from English (“Your child is fitting”) to Swahili (“Your child is dead”). In British parlance, “fitting” is a term for “having a seizure” and represents an example of a term that is context sensitive. However, others note that there is reason to be optimistic about the state of machine translation for biomedical text [10].
One method of assessing translation quality is through back translation, where one translator takes the author’s work into the desired target language, and then a different translator takes the target language back to the language of the author. Like the children’s game Chinese Whispers (Telephone in the United States) [11], where a “secret message” is whispered from one child to the next and spoken aloud at the end of the line of children, back translation can test to see if a message “gets through.” In this analogy, when information is machine translated from English to Spanish, and then machine translated from Spanish to English (Figure), we can compare the initial message to the final translation to see if the message “gets through.” We further investigate if machine back translation might allow a non-Spanish speaking author of PCIs to evaluate the quality of the Spanish translation.
Our intention was to determine if machine back translation [12] could be used by an English-only author to assess the quality of an intermediate Spanish translation. If poorly worded Spanish translated back into poorly worded English, the author might choose to either refine their original message until an acceptable machine back translation was achieved or to not release the Spanish translation to the patient. We were also concerned that there might be instances where the intermediate Spanish was unacceptable, but when translated back into English by machine translation, relatively acceptable English might result. If this were the case, then back translation would fail to detect a relatively poor intermediate Spanish translation.
Methods
Patient Care Instructions
Original English PCIs
Example original English PCIs were solicited from the clinical faculty and resident staff of the University of Arizona College of Medicine by an email-based survey tool (Qualtrics, Inc, Provo UT). The solicitation stated the following:
We are conducting a study to assess how well Google Translate might perform in translating patient instructions from English to Spanish. Would you please take the time to type three sentences that might comprise a typical “nugget” of patient instruction using language that you would typically include in an After Visit Summary for a patient? An example might be: “Take two Tylenol 325 mg tablets every four hours while awake for the next two days. If you have a sudden increase in pain or fever, or begin vomiting, call our office. Drink plenty of fluids.”
A total of 100 PCIs were collected. The breadth of the clinical practice and writing styles of a College of Medicine faculty are represented: not all were completely clear or were well-formed sentences, but did represent examples provided by busy clinicians of typical language that they would provide in an AVS PCI.
Machine Translation into Spanish
The 100 original English (OE) PCIs were submitted to the Google Translate web interface (https://translate.google.com/) by cutting and pasting and selecting “Spanish,” resulting in machine Spanish. The translations were performed in January 2016. No specific version number is provided by Google on their web page, and the service is described to be constantly evolving (https://translate.google.com/about/intl/en_ALL/contribute.html).
Machine Back Translation into English (MBTE)
Google Translate was then used to translate the machine Spanish back into into English. MBTE represents the content that a monolingual English speaker might use to evaluate the machine Spanish.
Ratings of Translation Quality and Safety
Two panels of 6 raters evaluated machine Spanish and MBTE quality and safety. A bilingual English/Spanish speaking panel simultaneously evaluated the machine Spanish and MBTE compared to OE, with the goal of inferring where in the process an undesirable back translation error occurred. Bilingual raters were experienced bilingual clinicians or certified translators. A monolingual English speaking panel also evaluated the MBTE (compared to OE). They could only infer the quality and safety of the machine Spanish indirectly through inspection of MBTE, and their assessment was free of the potential bias of knowledge of the intermediate Spanish translation.
The raters used Likert scales to rate grammar similarity and content similarity (scale from 1 to 5: 1 = very dissimilar, 5 = identical). For each PCI, grammar and content scores for each rater were summed and then divided by 10 to yield a within-rater quality score ranging from 0 to 1. A panel-level (bilingual or monolingual) quality score was calculated by averaging the quality scores across raters.
Safety of translation was rated as 0 or Safe (“While the translation may be awkward, it is not dangerous” or 1 or Unsafe (“A dangerous translation error is present that might cause harm to the patient if instructions were followed”). If any panel member considered an item to be unsafe, the item as a whole was scored as unsafe.
Data Analysis
Descriptive Summary of PCI Contributions
The 100 PCIs were summarized in terms of volume (word count), complexity (Flesch-Kincaid Grade Level index [13]), and content (medication names, references, formatting) (Table 1). Word count and grade level were calculated using Microsoft Word (Microsoft Corp, Redmond WA).
Safety Analysis
Concordance analysis. A safety translation concern as defined in this study (“might cause harm”) is very subjective. To reduce some of the variation in assessment of safety, we identified 4 members of the bilingual panel whose safety assessments of MBTE were most similar to the most concordant 4 monolingual raters’ assessment of MBTE safety. The goal was to select the bilingual panel of 4 that was most “typical” of the behavior of a “typical” monolingual individual with respect to assessing the safety of an individual MBTE translation. We then used this bilingual panel to identify 2 sets of “unsafe” machine Spanish and MBTE PCI translations: PCIs where ANY of the 4 bilingual raters identified a safety concern in machine Spanish or MBTE, and PCIs where MOST (at least 3) of the 4 bilingual raters agree that PCI translation was “unsafe”.
An expansion of Cohen’s kappa was used to identify the most concordant pairing of 4 bilingual panel members and 4 monolingual panel members [14]. All pairwise comparisons of monolingual and bilingual panel members were coded as follows: +1 was scored when 2 raters were concordant (both scored safe or unsafe) and –1 was scored for discordant pairs. For the 225 possible pairings of 4 panel members (15 combinations of 4 of 6 bilingual, 15 combinations of 4 of 6 monolingual raters), the 100 PCI items scores ranged from +16 (absolute agreement of the 2 panels of 4) to –16 (absolute discordance). For each pairing, we summed the scores for the 100 PCIs to determine the most concordant 4 monolingual and 4 bilingual raters (highest summed scores), which were then used for all subsequent analyses of safety and quality.
Original English characteristics of unsafe translation.
A logistic regression was performed with safety as the dependent variable (safe/unsafe defined by bilingual raters) with explanatory variables of word count, grade level, and reference to medication in OE.
Quality Assessment
Bilingual and monolingual raters assessments of translation quality. We assessed the correlation between the bilingual quality ratings of machine Spanish vs. MBTE and conducted paired t tests comparing mean bilingual machine Spanish and MBTE ratings. High correlation and absence of a significant difference in means would support the notion that MBTE could be used to reliably assess machine Spanish quality.
We also assessed the correlation between bilingual quality assessments of MS vs. monolingual raters’ assessments of MBTE, and conducted paired comparison t tests comparing bilingual machine Spanish and monolingual MBTE quality ratings. These analyses assess the ability of an English-only reader of MBTE to predict the quality of machine Spanish, as determined by a bilingual rater. High correlation and absence of a significant difference in means would support the notion that MBTE could be used by an English-only speaker to reliably assess machine Spanish quality.
Associations between original English content and translation quality. Objective measures of original English were correlated via stepwise linear regression with bilingual assessment of machine Spanish quality.
Results
PCI Contributions
Example PCIs were contributed by 88 individuals and are summarized in Table 1. The 100 original English PCIs and the machine Spanish and MBTE translations obtained via Google Translate are available from the authors upon request.
Safety
Concordance Analysis
The 6 monolingual and bilingual raters agreed on the safety of 73 MBTE PCIs. The most concordant pairings of 4 agreed on 81 items. The least and most concordant pairings had concordance values of 0.68 and 0.84, respectively. Subsequent analyses include data from only the 4 most concordant monolingual and bilingual raters.
Bilingual and Monolingual Safety Ratings
Both bilingual and monolingual raters assessed MBTE. On average, bilingual safety ratings of MBTE were higher (0.987) than monolingual ratings (0.925) (t = –3.897, P = 0.0002).
Identification of Unsafe Translations in Machine Spanish and MBTE
The bilingual panel identified 11 translations (either machine Spanish or MBTE) as unsafe: the machine Spanish translation was unsafe for 9 items and the MBTE was unsafe for 5 items, with some items identified as unsafe in both machine Spanish and MBTE. The original English, machine Spanish, and MBTE for these PCIs are listed in Table 2. One item (#93) revealed a machine Spanish drug dosing ambiguity that was not present in the MBTE, with safety concern expressed by 3 of 4 bilingual raters.
Original English Characteristics of Unsafe Translation
A stepwise logistic regression was performed to evaluate whether characteristics of the original English text predicted the PCI being judged as having a safe or unsafe machine Spanish translation. The explanatory variables (listed in Table 1) evaluated were word count, reading grade level, inclusion of reference to a specific medication, inclusion of numbers (as in "take 2 tablets"), and inclusion of numbered statements (as in "1. Call if your cough worsens"). The stepwise selection procedure dropped number references and numbered sentences, although post hoc analysis showed that number references and medication references occurred together so commonly that they were essentially interchangeable. The final regression model included word count, reading grade level, and medication reference. The significant factors of reading grade level and medication reference had odds ratios (95% confidence intervals) of 1.12 (1.01 to 1.41) and 4.91 (1.07 to 22.7), respectively (P = 0.042 each). As reading grade level includes word count per sentence and syllable count per word as linear predictors, the inclusion of word count in the model is likely to increase the discrimination of complex words of many syllables in predicting the occurrence of unsafe machine Spanish.
Quality
Bilingual and Monolingual Raters' Assessments of Quality
The bilingual evaluators found similar mean quality for machine Spanish (mean 0.855, SD 0.0859) and MBTE (0.857, SD 0.0755) (P = 0.811). However, the correlation of R² = 0.355 (P < 0.001) suggests that, despite the similarity in mean ratings, a good forward translation from original English to machine Spanish did not assure a good back translation from machine Spanish to MBTE. No difference in mean MBTE quality was identified between bilingual (0.857, SD 0.0754) and monolingual (0.852, SD 0.126) raters (P = 0.598), with correlation R² = 0.565 (P < 0.001).
Discussion
In this article, we collected a corpus of example PCIs from a large number of authors and investigated how well Google Translate was able to translate the example instructions first to Spanish and then back again to English. We learned that one cannot always spot a problem in the intermediate Spanish by inspection of the back-translated English. We also learned that simple sentences were least likely to be associated with troublesome translations, and that specific instructions about medication usage should probably be approached with great care.
We learned that some authors readily use simple language (eg, “Have your blood work drawn in the lab in the next two weeks,” reading level 1.2) while others gravitate to very complex language (eg, “If you develop headache, chest pain, abdominal pain or back pain, or if you have any spontaneous bleeding please go to the emergency department, advise them that you were recently treated for rattlesnake envenomation and have them call the poison center,” reading level 20.2).
The development of confidence in machine translation can be compared to the development of self-driving cars. At early stages of development, self-driving cars had drivers with a foot near the brake and hands near the steering wheel, ready to take over at any instant. Now, after much data has been collected, there is evidence that the machine may operate more predictably and safely than some human drivers [15,16]. Should self-driving cars always have an operator behind the wheel, supervising the function of the software and ready to take over at any instant, or is the purpose of the self-driving car to allow non-drivers to be transported in an automobile that they either cannot operate or choose not to operate at that time?
The benefit of using professional interpreters in communicating clinically significant data is unquestioned, especially when compared to ad-hoc interpreters who lack professional understanding of context [4]. Like a good human driver (as compared to a self-driving car that is operated by a program that is still learning), a qualified human translator will outperform machine translation in complex tasks. Similarly, for relatively simple translations that are meant to be generated by human speakers to be understood by individuals with a grammar school education and vocabulary, is the state of machine translation such that less human translation is now required?
Our use of 2 teams of evaluators allowed us to use the game of Telephone analogy to provide insight into how well the machine translation proceeded, first to Spanish, then back to English. Mostly (90 times in 100), an acceptable Spanish translation resulted in an acceptable English back translation. In 2 instances (Samples 7 and 32), the first translation into Spanish was unacceptable, and a subsequent translation back to English was also unacceptable, as might be expected. In 2 instances (Samples 60 and 92), the Spanish translation was acceptable, but the translation back to English was unacceptable. The rules of Telephone worked 94 times in 100.
Still, 6 times in 100, the unexpected occurred, where a relatively poor Spanish translation returned a relatively acceptable English back translation. The rules of Telephone were not followed. The Spanish in the middle was garbled, but became acceptable when translated back to English. A fluent Spanish speaker found the intermediate Spanish to be of concern, and the back translation did not identify the concern. This argues against widespread adoption of machine back translation for quality assessment, at least until the limitations of machine back translation are better understood. Looking at examples where back translation “worked” is useful. In the 6 instances where the intermediate Spanish was judged to be unacceptable, but the English back translation acceptable, complex sentence structures were found, along with medication instructions.
We did not test whether the raters found the original English instructions to be unclear or unsafe as a starting point. Here is where we find the potential benefit of the present study, as it provides insight into the type of content that seems to translate well in this set of data. where the machine Spanish error was not present in MBTE. Overall, ratings of translation quality by bilingual and monolingual raters were high, suggesting that there may be some utility in machine translation with safeguards other than, or in addition to, inspection of machine back translation of machine Spanish. We found an astonishing range in reading difficulty across the contributed samples. While the average estimated grade level for comprehension of the original English contributions was the 8th grade, the maximum was 22, indicating extreme complexity of both words used and sentence length.
In gathering the example PCIs, we did not give the authors any additional instructions to limit complexity; we asked only for their “typical” language. If the examples received are indeed typical, the instructions we provide are often quite complex. Wu [17] explored the readability of medical information intended for the public and found that, on average, 18 years of education would be required to read and understand the clinical trial descriptions available at ClinicalTrials.gov. It seems apparent that the first step to improving the safety of machine translation is to simplify the task of the translator by making the language that is used for translation as unambiguous and straightforward as possible. The article by Patil and Davies on the use of Google Translate in the clinic [9] generated a considerable number of rapid responses (similar to letters to the editor) [18]. The responses emphasized the need to keep the language used simple, the sentences short, and the communication direct.
A simple and straightforward suggestion to improve all patient care instructions (not just those anticipated to be translated) would be to display the Flesch-Kincaid reading level in real time as the content is generated. The computer resources required to perform reading level analysis are nearly identical to those required for real-time spell checking: a dictionary that breaks words into syllables. Showing authors the reading level in real time would provide a tool to improve all instructions, not just those intended for translation. Limiting the dictionary to specifically exclude potentially dangerous, complex, or confusing words as well as forbidden abbreviations would further identify troublesome language to the author, and would improve communication overall. Implementing such real-time feedback to authors of patient instructions is a logical next step in adding utility to the electronic health record.
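As a rough illustration of how little machinery such feedback needs, the sketch below computes the Flesch-Kincaid grade level of a block of text. The grade-level formula is the standard one (0.39 x words per sentence + 11.8 x syllables per word - 15.59); the syllable counter is a crude vowel-group heuristic assumed here in place of the syllable dictionary described above, so its output will not exactly match the reading levels reported in this article or those from commercial tools. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting vowel groups (heuristic, not a dictionary)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final "e" as silent ("worse"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Return the Flesch-Kincaid grade level of text, or 0.0 for empty input."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

An editor could call `flesch_kincaid_grade` on the instruction text after each keystroke or pause and display the result next to the text box, much as a spell checker underlines words while the author types.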
It is important that culture and contextual understanding are taken into consideration when organizations use interpretation services. In the United States, federal law requires that language interpreters employed by health care organizations receiving federal funds are not only bilingual but also bicultural [16]. We did not find examples of dangerous synonyms being misapplied in translation, but we cannot rule out the possibility that such errors can occur. This is beyond the scope of typical machine translation software.
Our data suggest that medication names and dosing frequencies should not be repeated in the PCI, where confusion can arise from imprecise language translation. Translation ambiguities that generate safety concerns in PCI might be mitigated by moving such content into structured areas of the AVS.
Conclusion
This study suggests that 9 times out of 10, machine translation using Google Translate is acceptable in terms of quality and safety. Currently, machine back translation may fail to reveal a relatively poor translation from English to Spanish. This study showed that increasing sentence complexity, as measured by the reading level index, was associated with a significant (P < 0.05) increase in unsafe machine translation. Similarly, including medication instructions in machine translations was associated with increased risk (P < 0.05) of machine translation safety error in this study.
A simple way to improve communication now would be to display the reading level to authors of patient communication content in real time, and to limit the dictionary of acceptable words to exclude known ambiguous terms and forbidden abbreviations. This would teach authors to use simple language and increase the chance that translation (either human or machine) would be effective. This preliminary study suggests that keeping medication dosing instructions in a structured format is advisable, as is keeping sentences simple. As with spoken language [4], starting with clear, simple-to-understand English instructions provides the best machine translations into Spanish.
The Clinical Machine Translation Study Group: Todd W. Altenbernd, Steven Bedrick, Mark D. Berg, Nerida Berrios, Mark A. Brown, Colleen K. Cagno, Charles B. Cairns, Elizabeth Calhoun, Raymond Carmody, Tara F. Carr, Clara Choo, Melissa L. Cox, Janiel Cragun, Rachel E.M. Cramton, Paola Davis, Archita Desai, Sarah M. Desoky, Sean Elliot, Mindi J. Fain, Albert Fiorello, Hillary Franke, Kimberly Gerhart, Victor Jose Gonzalez, Aaron John Goshinska, Lynn M. Gries, Erin M. Harvey, Karen Herbst, Elizabeth Juneman, Lauren Marie Imbornoni, Anita Koshy, Lisa Laughlin, Christina M. Laukaitis, Kwan Lee, Hong Lei, Joseph M. Miller, Prashanthinie Mohan, Wayne J. Morgan, Jarrod Mosier, Leigh A. Neumayer, Valentine Nfonsam, Vivienne Ng, Terence O'Keeffe, Merri Pendergrass, Jessie M. Pettit, John Leander Po, Claudia Marie Prospero Ponce, Sydney Rice, Marie Anoushka Ricker, Arielle E. Rubin, Robert J. Segal, Aurora A.G. Selpides, Whitney A. Smith, Jordana M. Smith, William Stevenson, Amy N. Sussman, Ole J. Thienhaus, Patrick Tsai, J. Daniel Twelker, Richard Wahl, Jillian Wang, Mingwu Wang, Samuel C. Werner, Mark D. Wheeler, Jason Wild, Sun Kun Yi, Karl Andrew Yousef, Le Yu.
Corresponding author: Joseph M. Miller, MD, MPH, Department of Ophthalmology and Vision Science, University of Arizona, 655 North Alvernon Way, Suite 108, Tucson AZ 85711, [email protected].
Financial disclosures: None.
1. Hummel J, Evans P. Providing clinical summaries to patients after each office visit: a technical guide. Qualis Health 2012. Accessed 14 Mar 2016 at http://hit.qualishealth.org/sites/default/files/hit.qualishealth.org/Providing-Clinical-Summaries-0712.pdf.
2. Neuberger M, Dontje K, Holzman G, et al. Examination of office visit patient preferences for the after-visit summary (AVS). Persp Health Infor Manage 2014;11:1d.
3. Kruse CS, Bolton K, Freriks G. The effect of patient portals on quality outcomes and its implications to meaningful use: a systematic review. J Med Internet Res 2015;17:e44.
4. Schoonover K. Using a medical interpreter with persons of limited English proficiency. J Clin Outcomes Manage 2016;23:567–75.
5. Shin HB, Bruno R. Language use and English-speaking ability: 2000. Census 2000 Brief. Accessed 9 Nov 2017 at https://census.gov/content/dam/Census/library/publications/2013/acs/acs-22.pdf.
6. Lewis MP, Simons GF, Fennig CD, editors. Ethnologue: languages of the Americas and the Pacific. 19th ed. Dallas: Sil International; 2016.
7. Pavlik V, Brown AE, Nash S, et al. Association of patient recall, satisfaction, and adherence to content of an electronic health record (EHR)-generated after visit summary: a randomized clinical trial. J Am Board Fam Med 2014;27:209–18.
8. Johnson M, Schuster M, Le QV, et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Accessed 9 Nov 2017 at https://arxiv.org/pdf/1611.04558.pdf.
9. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.
10. Kaliyadan F, Gopinathan Pillai S. The use of Google language tools as an interpretation aid in cross-cultural doctor-patient interaction: a pilot study. Inform Prim Care 2010;18:141–3.
11. Zhang Y, Zhou S, Zhang Z, et al. Rumor evolution in social networks. Physical Review E 2013;87.
12. Shingenobu T. Evaluation and usability of back translation for intercultural communication. In: Aykin N, editor. Usability and internationalization. Global and local user interfaces. UI-HCII 2007, Lecture Notes in Computer Science, vol 4560. Springer, Berlin, Heidelberg.
13. Kincaid JP, Fishburne Jr RP, Rogers RL, et al. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch. 1975. Accessed 7 May 2016 at http://www.dtic.mil/dtic/tr/fulltext/u2/a006655.pdf.
14. Kwiecien R, Kopp-Schneider A, Blettner M. Concordance analysis—part 16 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011;108:515–21.
15. Goodall N. Ethical decision making during automated vehicle crashes. Transportation Research Record: Journal of the Transportation Research Board 2014;2424:58–65.
16. Kalra N, Groves D. The enemy of good: estimating the cost of waiting for nearly perfect automated vehicles. Santa Monica, CA: RAND Corporation, 2017.
17. Wu DT, Hanauer DA, Mei Q, et al. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2016;23:269–75.
18. Responses to: Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392. Accessed 10 Dec 2017 at www.bmj.com/content/349/bmj.g7392/rapid-responses.
19. Nápoles AM, Santoyo-Olsson J, Karliner LS, et al. Inaccurate language interpretation and its clinical significance in the medical encounters of Spanish-speaking Latinos. Med Care 2015;53:940–7.
1. Hummel J, Evans P. Providing clinical summaries to patients after each office visit: a technical guide. Qualis Health 2012. Accessed 14 Mar 2016 at http://hit.qualishealth.org/sites/default/files/hit.qualishealth.org/Providing-Clinical-Summaries-0712.pdf.
2. Neuberger M, Dontje K, Holzman G, et al. Examination of office visit patient preferences for the after-visit summary (AVS). Persp Health Infor Manage 2014;11:1d.
3. Kruse CS, Bolton K, Freriks G. The effect of patient portals on quality outcomes and its implications to meaningful use: a systematic review. J Med Internet Res 2015;17:e44.
4. Schoonover, K. Using a medical interpreter with persons of limited English proficiency. J Clin Outcomes Manage 2016;23:567–75.
5. Shin HB, Bruno R. Language use and English-speaking ability: 2000. Census 2000 Brief. Accessed 9 Nov 2017 at https://census.gov/content/dam/Census/library/publications/2013/acs/acs-22.pdf.
6. Lewis MP, Simons GF, Fennig CD, editors. Ethnologue: languages of the Americas and the Pacific. 19th ed. Dallas: Sil International; 2016.
7. Pavlik V, Brown AE, Nash S, et al. Association of patient recall, satisfaction, and adherence to content of an electronic health record (EHR)-generated after visit summary: a randomized clinical trial. J Am Board Fam Med 2014;27:209–18.
8. Johnson M, Schuster M, Le QV, et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Accessed 9 Nov 2017 at https://arxiv.org/pdf/1611.04558.pdf.
9. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.
10. Kaliyadan F, Gopinathan Pillai S. The use of Google language tools as an interpretation aid in cross-cultural doctor-patient interaction: a pilot study. Inform Prim Care 2010;18:141–3.
11. Zhang Y, Zhou S, Zhang Z, et al. Rumor evolution in social networks. Physical Review E 2013;87.
12. Shingenobu T. Evaluation and usability of back translation for intercultural communication. In: N. Aykin, editor. Usability and internationalization. Global and local user interfaces. UI-HCII 2007, Lecture Notes in Computer Science, vol 4560. Springer, Berlin, Heidelberg.
13. Kincaid JP, Fishburne Jr RP, Rogers RL, et al. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch. 1975. Accessed 7 May 2016 at http://www.dtic.mil/dtic/tr/fulltext/u2/a006655.pdf.
14. Kwiecien R, Kopp-Schneider A, Blettner M. Concordance analysis—part 16 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011;108:515–21.
15. Goodall N. Ethical decision making during automated vehicle crashes. Transportation Research Record: Journal of the Transportation Research Board 2014;2424:58–65.
16. Kalra N, Groves D. The enemy of good: estimating the cost of waiting for nearly perfect automated vehicles. Santa Monica, CA: RAND Corporation, 2017.
17. Wu DT, Hanauer DA., Mei Q, et al. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2016;23:269–75.
18. Responses to: Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392 Accessed 10 Dec 2017 at www.bmj.com/content/349/bmj.g7392/rapid-responses.
19. Nápoles AM, Santoyo-Olsson J, Karliner LS, et al. Inaccurate language interpretation and its clinical significance in the medical encounters of Spanish-speaking Latinos. Med Care 2015;53:940–7.
Nondrug Treatments May Benefit Patients With Epilepsy
WASHINGTON, DC—Many patients with pharmacoresistant epilepsy may benefit from nondrug treatments, including vagus nerve stimulation (VNS), the ketogenic diet, and corpus callosotomy, according to a study presented at the 71st Annual Meeting of the American Epilepsy Society. The treatments may reduce generalized and focal seizures, and most parents whose children underwent these procedures would opt for the same treatment under similar circumstances, the researchers said.
About 20% to 30% of patients have pharmacoresistant epilepsy. The ketogenic diet, corpus callosotomy, and VNS have been studied as alternatives to antiepileptic drugs (AEDs) for these patients, but few studies have compared the modalities.
Dave F. Clarke, MD, MBBS, Professor of Pediatric Neurology at the Baylor College of Medicine and Clinical Director of Epilepsy at Texas Children’s Hospital in Houston, and colleagues compared seizure control, cognitive and behavioral factors, quality of life, and parent satisfaction among patients who received VNS, underwent corpus callosotomy, or initiated the ketogenic diet. They identified 336 patients who had received one of these treatments at Dell Children’s Medical Center of Central Texas in Austin between January 2010 and November 2015. Parents of 210 of the patients completed a nine-item telephone survey.
Of the 210 patients whose parents completed the survey, 98 (33.6%) had initiated the ketogenic diet, 150 (51.4%) had received VNS, and 44 (15.1%) had undergone corpus callosotomy. Patients were between the ages of 8 months and 20 years. Patients who had initiated the ketogenic diet had a mean age of about 7, and patients who received VNS or underwent corpus callosotomy had a mean age of about 10. Patients had failed more than three AEDs on average (range, two to 13).
Parents reported a 50% or greater reduction in generalized seizures in 63% of patients who went on the ketogenic diet, 54% of patients who underwent corpus callosotomy, and 52% of patients who received VNS. Parents reported a 50% or greater reduction in focal seizures in 56% of children who went on the ketogenic diet, 56% of patients who had corpus callosotomy, and 53% of patients who received VNS.
In addition, parents reported improved quality of life in 48% of patients on the ketogenic diet, 63% of patients who had corpus callosotomy, and 44% of patients who received VNS. Overall, 80% of parents whose children were on the ketogenic diet or received VNS and 75% of parents whose children underwent corpus callosotomy reported that they were satisfied with the treatment that their child had received.
“Higher health-related quality of life after intervention was predicted by improved behavior, increased engagement, diminished frequency of atonic or generalized tonic-clonic seizures, and reduction in epilepsy-related injuries,” the researchers concluded. Parents were more likely to say that they would repeat the procedure if, after the treatment, “their child was more engaged, had diminished frequency of atonic or generalized tonic-clonic seizures, and had a reduction in epilepsy-related injuries.”
“Unfortunately, many doctors keep trying medications without considering alternatives,” said Dr. Clarke. “Based on the parents’ feedback, I would suggest doctors introduce the concept of alternatives after two AEDs fail to control seizures.” If surgery to ablate or remove the area of the brain where seizures originate is not an option, neurologists should talk to parents about the ketogenic diet, VNS, or corpus callosotomy. “If parents think the diet can be tolerated, trying it first may not be a bad option,” he said.
—Jake Remaly
Homelessness: Whose job is it?
Despite programs to end homelessness, it remains a substantial and growing problem in many cities in the United States.1,2 In 2016, there were an estimated 10,550 homeless people living in my home state of Colorado, a 6% increase from the prior year.2 A recent point-estimate study found that there were more than 5,000 homeless individuals in the Denver metropolitan area on a single night in January 2017.3 Because of the relative scarcity of housing, a growing number of cities like Denver now utilize a practice known as vulnerability indexing to prioritize homeless persons at high risk of mortality from medical conditions for placement in permanent supportive housing.4
Although hospitalists like myself frequently care for vulnerable homeless patients in the hospital, most have little formal training in how best to care for and advocate for these individuals beyond treating their acute medical need, and little direct contact with community organizations with expertise in doing so. Instead, we have learned informally through experience. Hospital providers are often frustrated by the perceived lack of services and support available to these patients, and there is substantial variability in the extent to which providers engage patients and community partners during and after hospitalization. Despite the growing practice of vulnerability indexing in the community, hospital-based providers do not routinely assess vulnerability with respect to housing. Previous research indicates that housing status is assessed in only a minority of homeless patients during their hospital stay.12 Thus, hospitalization often represents a missed opportunity to identify vulnerability and utilize it to connect patients with housing and other resources.
Addressing the significant known health disparities faced by homeless persons is one of the greatest health equity challenges of our time.13 We need better ways of understanding, identifying, and addressing vulnerability among homeless patients who are hospitalized, paired with improved integration with local community organizations. This will require moving beyond the idea that homelessness is the social worker’s job to one of shared responsibility and advocacy.
Collaborative research and other partnerships that engage both community organizations and individuals affected by homelessness are crucial to further understand the specific needs, barriers, challenges, and opportunities for improving hospital care and care transitions in this population. As well-respected community members and systems thinkers who witness these inequities on a daily basis, hospitalists are well positioned to help lead this work.
Dr. Stella is a hospitalist at Denver Health and Hospital Authority, and an associate professor of medicine at the University of Colorado. She is a member of The Hospitalist editorial advisory board.
References
1. Ending Chronic Homelessness. (Aug 2017). U.S. Interagency Council on Homelessness. Available at: https://www.usich.gov/goals/chronicsness. Accessed: Oct 21, 2017.
2. 2016 Annual Homeless Assessment Report (AHAR) to Congress. (Nov 2016). U.S. Department of Housing and Urban Development Office of Community Planning and Development, Part 1. Available at: https://www.hudexchange.info/resources/documents/2016-AHAR-Part-1.pdf. Accessed: Oct 21, 2017.
3. 2017 Point-In-Time Report, Seven-County Metro Denver Region. Metro Denver Homeless Initiative. Available at: http://www.mdhi.org/2017_pit. Accessed: Oct 22, 2017.
4. Henwood BF et al. Examining mortality among formerly homeless adults enrolled in Housing First: An observational study. BMC Public Health. 2015;15:1209.
5. Weinstein LC et al. Moving from street to home: Health status of entrants to a Housing First program. J Prim Care Community Health. 2011;2:11-5.
6. Kushel MB et al. Factors associated with the health care utilization of homeless persons. JAMA. 2001;285(2):200-6.
7. Kushel MB et al. Emergency department use among the homeless and marginally housed: Results from a community-based study. Am J Public Health. 2002;92(5):778-84.
8. Baggett TP et al. Mortality among homeless adults in Boston: Shifts in causes of death over a 15-year period. JAMA Intern Med. 2013 Feb 11;173(3):189-95.
9. Johnson et al. For many patients who use large amounts of health care services, the need is intense yet temporary. Health Aff (Millwood). 2015 Aug;34(8):1312-9.
10. Durfee J et al. The impact of tailored intervention services on charges and mortality for adult super-utilizers. Healthc (Amst). 2017 Aug 25. pii: S2213-0764(17)30057-X. doi: 10.1016/j.hjdsi.2017.08.004. [Epub ahead of print]
11. Rinehart DJ et al. Identifying subgroups of adult super utilizers in an urban safety-net system using latent class analysis: Implications for clinical practice. Med Care. 2016 Sep 14. doi: 10.1097/MLR.0000000000000628. [Epub ahead of print]
12. Greysen RS et al. Understanding transitions of care from hospital to homeless shelter: A mixed-methods, community-based participatory approach. J Gen Intern Med. 2012;27(11):1484-91.
13. National Health Care for the Homeless Council. (Oct 2012). Improving Care Transitions for People Experiencing Homelessness. (Lead author: Sabrina Edgington, policy and program specialist.) Available at: www.nhchc.org/wp-content/uploads/2012/12/Policy_Brief_Care_Transitions.pdf. Accessed: Oct 21, 2017.
14. Koh HK et al. Improving healthcare for homeless people. JAMA. 2016;316(24):2586-7.
Continue to opt for HDT/ASCT for multiple myeloma
High-dose therapy with melphalan followed by autologous stem cell transplant (HDT/ASCT) is still the best option for multiple myeloma even after almost 2 decades with newer and highly effective induction agents, according to a recent systematic review and two meta-analyses.
Given the “unprecedented efficacy” of “modern induction therapy with immunomodulatory drugs and proteasome inhibitors (also called ‘novel agents’),” investigators “have sought to reevaluate the role of HDT/ASCT,” wrote Binod Dhakal, MD, of the Medical College of Wisconsin, and his colleagues. The report is in JAMA Oncology.
To address that question, they analyzed five randomized controlled trials conducted since 2000 and concluded that HDT/ASCT is still the preferred treatment approach.
Despite a lack of demonstrable overall survival benefit, HDT/ASCT confers a significant progression-free survival (PFS) benefit, low treatment-related mortality, and potentially high minimal residual disease-negative rates in newly diagnosed multiple myeloma, the researchers noted.
The combined odds ratio for complete response was 1.27 (95% confidence interval, 0.97-1.65; P = .07) with HDT/ASCT, compared with standard-dose therapy (SDT). The combined hazard ratio (HR) was 0.55 for PFS (95% CI, 0.41-0.7; P less than .001) and 0.76 for overall survival (95% CI, 0.42-1.36; P = .20), both in favor of HDT/ASCT.
PFS was best with tandem HDT/ASCT (HR, 0.49, 95% CI, 0.37-0.65) followed by single HDT/ASCT with bortezomib, lenalidomide, and dexamethasone consolidation (HR, 0.53, 95% CI, 0.37-0.76) and single HDT/ASCT alone (HR, 0.68, 95% CI, 0.53-0.87), compared with SDT. However, none of the HDT/ASCT approaches had a significant impact on overall survival.
Meanwhile, treatment-related mortality with HDT/ASCT was minimal, at less than 1%.
“The achievement of high [minimal residual disease] rates with HDT/ASCT may render this approach the ideal platform for testing novel approaches (e.g., immunotherapy) aiming at disease eradication and cures,” the researchers wrote.
The researchers reported relationships with a number of companies, including Takeda, Celgene, and Amgen, that make novel induction agents.
SOURCE: Dhakal B et al. JAMA Oncol. 2018 Jan 4. doi: 10.1001/jamaoncol.2017.4600.
FROM JAMA ONCOLOGY
Key clinical point: HDT/ASCT remains the preferred treatment approach for multiple myeloma.
Major finding: The combined odds ratio for complete response was 1.27 (95% CI, 0.97-1.65; P = .07) with HDT/ASCT, compared with standard-dose therapy (SDT).
Study details: A systematic review and two meta-analyses examining five phase 3 clinical trials reported since 2000.
Disclosures: The researchers reported relationships with a number of companies, including Takeda, Celgene, and Amgen, that make novel induction agents.
Source: Dhakal B et al. JAMA Oncol. 2018 Jan 4. doi: 10.1001/jamaoncol.2017.4600.
Folic acid and multivitamin supplements associated with reduced autism risk
Taking folic acid and/or multivitamin supplements preceding and during pregnancy is associated with a lower risk of offspring developing autism spectrum disorder (ASD), an observational epidemiologic study published Jan. 3 showed.
The findings could have important public health implications, reported Stephen Z. Levine, PhD, and his associates.
The investigators found that 572 children, or 1.3%, received an ASD diagnosis. Children whose mothers took folic acid and/or multivitamin supplements during pregnancy had a lower risk of developing ASD (relative risk, 0.27; 95% confidence interval, 0.22-0.33; P less than .001), compared with those whose mothers took no supplements. Similarly, there was reduced risk among those whose mothers took only folic acid during pregnancy (RR, 0.32; 95% CI, 0.26-0.41; P less than .001) or only multivitamins (RR, 0.35; 95% CI, 0.28-0.44; P less than .001). Likewise, lower risks were seen among offspring whose mothers took supplements before pregnancy: Compared with no supplements, the RR was 0.39 for folic acid and/or multivitamins (95% CI, 0.30-0.50; P less than .001), 0.56 for just folic acid (95% CI, 0.42-0.74; P = .001), and 0.36 for just multivitamins (95% CI, 0.24-0.52; P less than .001). Similar associations were found among male and female offspring.
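As a quick arithmetic check on the headline figure, the short Python sketch below recomputes the share of children with an ASD diagnosis from the two counts given in this report: 572 diagnoses and the cohort of 45,300 children listed in the study details. The variable names are illustrative only, and the snippet is a reader's sanity check, not part of the investigators' analysis.

```python
# Counts as reported in the article (not taken from the underlying dataset).
asd_cases = 572        # children who received an ASD diagnosis
cohort_size = 45_300   # children in the cohort, per the study details

# Share of the cohort with an ASD diagnosis, as a percentage.
asd_share_pct = 100 * asd_cases / cohort_size

print(f"{asd_share_pct:.1f}%")  # prints "1.3%"
```

The result rounds to the 1.3% quoted above, so the two reported counts are consistent with each other.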
“This finding may reflect noncompliance, higher rates of vitamin deficiency, or poor diet among persons with psychiatric conditions,” wrote Dr. Levine, of the department of community mental health at the University of Haifa, Israel, and his associates in JAMA Psychiatry.
Another important finding is that maternal exposure to folic acid and multivitamin supplements 2 years before pregnancy is tied to a lower ASD risk.
The investigators acknowledged that the study was limited by their inability to determine possible confounding factors, such as the vehicle of vitamin dispensations, use of over-the-counter supplements, false-positive classifications from noncompliance, and absence of information on gestational age. In addition, they said, “causality cannot be inferred from observational studies such as this one.” In light of those limitations, investigators said, additional studies replicating these findings are needed.
The study was funded by several entities, including the National Institutes of Health, the Fredrik and Ingrid Thuring Foundation, and the Swedish Society of Medicine. Dr. Levine reported receiving support from Shire Pharmaceuticals, and coauthor Arad Kodesh, MD, is an employee of Meuhedet Health Services. No other relevant financial disclosures were reported.
SOURCE: Levine SZ et al. JAMA Psychiatry. 2018 Jan 3. doi: 10.1001/jamapsychiatry.2017.4050.
Key clinical point: Taking folic acid and/or multivitamin supplements before and during pregnancy is associated with a reduced risk of autism in children.
Major finding: Children whose mothers took folic acid and/or multivitamin supplements during pregnancy had a decreased risk of developing ASD, compared with those whose mothers did not (relative risk, 0.27; 95% confidence interval, 0.22-0.33; P less than .001).
Study details: Observational epidemiologic study of 45,300 Israeli children born between January 2003 and December 2007 and followed until January 2015.
Disclosures: The study was funded by several entities, including the National Institutes of Health, the Fredrik and Ingrid Thuring Foundation, and the Swedish Society of Medicine. Dr. Levine reported receiving support from Shire Pharmaceuticals, and coauthor Arad Kodesh, MD, is an employee of Meuhedet Health Services. No other relevant financial disclosures were reported.
Source: Levine SZ et al. JAMA Psychiatry. 2018 Jan 3. doi: 10.1001/jamapsychiatry.2017.4050.