Can the Use of Siri, Alexa, and Google Assistant for Medical Information Result in Patient Harm?

Study Overview

Objective. To determine the prevalence and nature of the harm that could result from patients or consumers using conversational assistants for medical information.

Design. Observational study.

Settings and participants. Participants were recruited from an online job posting site and were eligible if they were aged ≥ 21 years and were native speakers of English; there were no other eligibility requirements. Participants contacted a research assistant by phone or email, and eligibility was confirmed before scheduling the study visit and again on arrival. However, data from 4 participants were excluded after they disclosed, at the end of their study sessions, that they were not native English speakers. Participants were compensated for their time.

Each participant took part in a single 60-minute usability session. Following informed consent and administration of baseline questionnaires, each participant was assigned a random selection of 2 medication tasks and 1 emergency task (provided as written scenarios) to perform with each conversational assistant—Siri, Alexa, and Google Assistant—with the order of assistants and tasks counterbalanced. Before the first task with each assistant, the research assistant demonstrated how to activate it using a standard weather-related question; the participant then thought of a health-related question of their own and was given 5 minutes to practice interacting with the assistant. Participants then completed the 3 tasks in sequence, querying the assistant in their own words. A task was considered complete either when the participant stated that they had found an answer or when 5 minutes had elapsed. At task completion, the research assistant asked the participant what they would do next given the information obtained during the interaction. After the third task with a given assistant, the research assistant administered a satisfaction questionnaire, and after the participant had finished interacting with all 3 assistants, they were interviewed about their experience.
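
The published report does not detail how the counterbalancing was implemented; as a minimal sketch only (the rotation scheme and names below are assumptions, not the study's procedure), one common approach is to cycle through all 6 possible orderings of the 3 assistants across consecutive participants:

from itertools import permutations

ASSISTANTS = ("Siri", "Alexa", "Google Assistant")
ORDERS = list(permutations(ASSISTANTS))  # all 6 possible presentation orders

def assigned_order(participant_index: int):
    # Cycle through the 6 orderings so each occurs equally often
    # across participants (a simple counterbalancing scheme).
    return ORDERS[participant_index % len(ORDERS)]

for i in range(6):
    print(i, assigned_order(i))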

Measures and analysis. Interactions with conversational assistants were video recorded, and the audio was transcribed for analysis. Because each task typically took multiple attempts before resolution or before the participant gave up, usability metrics, including time, outcomes, and errors, were coded at both the task and the attempt level. Participant-reported actions for each medical task were rated for patient harm by 2 judges (an internist and a pharmacist) using a scale adapted from those used by the Agency for Healthcare Research and Quality and the US Food and Drug Administration: 0, no harm; 1, mild harm (bodily or psychological injury); 2, moderate harm (injury adversely affecting functional ability or quality of life); 3, severe harm (injury, including pain or disfigurement, that interferes substantially with functional ability or quality of life); and 4, death. The 2 judges first assigned ratings independently, then met to reach consensus on cases where they disagreed. Each harmful outcome was then analyzed to determine the type of error and its cause (user error, system error, or both). The satisfaction questionnaire included 6 self-report items rated on a 7-point scale ranging from “Not at all” to “Very satisfied.”
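
To make the rating procedure concrete, the following minimal sketch (hypothetical Python, not code from the study) encodes the 0-to-4 harm scale and the two-judge rule in which a case is flagged for consensus discussion only when the independent ratings disagree:

from enum import IntEnum
from typing import Optional

class Harm(IntEnum):
    NONE = 0      # no harm
    MILD = 1      # bodily or psychological injury
    MODERATE = 2  # injury adversely affecting functional ability or quality of life
    SEVERE = 3    # injury interfering substantially with functional ability or quality of life
    DEATH = 4

def adjudicate(internist: Harm, pharmacist: Harm) -> Optional[Harm]:
    # Agreement returns the shared rating; disagreement returns None,
    # flagging the case for a consensus meeting between the judges.
    return internist if internist == pharmacist else None

print(adjudicate(Harm.SEVERE, Harm.SEVERE))  # agreed rating
print(adjudicate(Harm.MILD, Harm.MODERATE))  # None -> resolve by consensus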

Main results. A total of 54 participants completed the study. Mean age was 42 years (SD, 18), with higher representation of 21- to 24-year-olds than in the general US adult population (30% vs 14%). Twenty-nine participants (54%) were female, 31 (57%) were Caucasian, and 26 (50%) were college educated. Most (52 [96%]) had high levels of health literacy. Only 8 (15%) reported using a conversational assistant regularly, while 22 (41%) had never used one and 24 (44%) had tried one “a few times.” Forty-four (82%) used computers regularly.

Of the 168 tasks completed with reported actions, 49 (29.2%) could have resulted in some degree of harm, including 27 (16.1%) that could have resulted in death. An analysis of 44 cases that potentially resulted in harm yielded several recurring error scenarios, with blame attributed solely to the conversational assistant in 13 cases (30%), solely to the user in 20 cases (46%), and to both the user and the conversational assistant in the remaining 11 cases (25%). In the most common harm scenario (9 cases [21%]), the participant failed to provide all the information in the task description, the conversational assistant responded correctly to the partial query, and the user accepted the response as the recommended action. In the next most common scenario (7 cases [16%]), the participant provided a complete and correct utterance describing the problem, but the conversational assistant responded with only partial information. Overall self-reported satisfaction with conversational assistants was neutral, with a median rating of 4 (interquartile range, 1-6).
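
As a quick, purely illustrative check, the headline percentages follow directly from the counts reported above (the variable names below are invented for this sketch):

total_with_reported_actions = 168
print(f"{49 / total_with_reported_actions:.1%}")  # 29.2% of tasks could have caused some harm
print(f"{27 / total_with_reported_actions:.1%}")  # 16.1% could have resulted in death
print(13 + 20 + 11)  # the 44 analyzed harm cases, attributed to assistant, user, or both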


Outcomes differed significantly by conversational assistant (χ²(4) = 132.2, P < 0.001). Alexa failed for most tasks (125/394 [91.9%]), resulting in significantly more attempts but significantly fewer responses that could lead to harm. Siri had the highest task completion rate (365 [77.6%]), in part because it typically displayed a list of web pages that provided at least some information to the participant; because of this, however, it also had the highest likelihood of causing harm for the tasks tested (27 [20.9%]). Median user satisfaction with the 3 conversational assistants was neutral, but with significant differences among them: participants were least satisfied with Alexa, were most satisfied with Siri, and stated they were most likely to follow the recommendations provided by Siri.
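
For readers interested in how such a comparison is computed, the sketch below runs the same kind of chi-square test of independence on a small contingency table of task outcomes by assistant. The counts are invented for illustration (they are not the study's data), and scipy is assumed to be available:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts (rows: assistants; columns: completed, failed,
# potentially harmful response). NOT the study's data.
table = np.array([
    [110,  25, 27],  # Siri
    [ 11, 125,  2],  # Alexa
    [ 95,  50, 20],  # Google Assistant
])

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi-square({dof}) = {chi2:.1f}, P = {p:.2g}")  # dof = (3-1)*(3-1) = 4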

Qualitatively, most participants said they would use conversational assistants for medical information, but many felt the technology was not quite up to the task yet. When asked about their trust in the results provided by the conversational assistants, participants said they trusted Siri the most because it provided links to multiple websites in response to their queries, allowing them to choose the response that most closely matched their assumptions. They also appreciated that Siri displayed its speech recognition results, which gave them more confidence in its responses and allowed them to modify their query if needed. Many participants expressed frustration with the systems, particularly with Alexa.

Conclusion. Reliance on conversational assistants for actionable medical information represents a safety risk for patients and consumers. Patients should be cautioned not to use these technologies for answers to medical questions they intend to act on without further consultation with a health care provider.


Commentary

Roughly 9 in 10 American adults use the Internet,1 with the ability to easily access information through a variety of devices including smartphones, tablets, and laptop computers. This ease of access to information has played an important role in shifting how individuals access health information and interact with their health care provider.2,3 Online health information can increase patients’ knowledge of, competence with, and engagement in health care decision-making strategies. Online health information seeking can also complement and be used in synergy with provider-patient interactions. However, online health information is difficult to regulate, complicated further by the wide range of health information literacy among patients. Inaccurate or misleading health information can lead patients to make detrimental or even dangerous health decisions. These benefits and concerns similarly apply to conversational assistants like Siri (Apple), Alexa (Amazon), and Google Assistant, which are increasingly being used by patients and consumers to access medical- and health-related information. As these technologies are voice-activated, they appear to address some health literacy limitations. However, they still pose important limitations and safety risks,4 especially as conversational assistants are being perceived as a trustworthy parallel to clinical assessment and counseling systems.5

There has been little systematic research exploring the potential risks of these platforms or characterizing their error types and error rates. This study aimed to determine the capabilities of widely used, general-purpose conversational assistants in responding to a broad range of medical questions posed by laypersons in their own words, and to systematically evaluate the potential harm that could result from patients or consumers acting on the resulting recommendations. The authors found that, when asked questions about situations requiring medical expertise, conversational assistants failed more than half the time and led study participants to report that they would take actions that could have resulted in harm or death. Further, the authors characterized several failure modes, including misrecognition of participant queries, participant misunderstanding of the tasks and of the conversational assistants' responses, and limited participant awareness of what the assistants could actually understand. This misalignment between users' expectations that the assistants could follow a conversation and the assistants' actual capabilities made the experience frustrating for some participants.


Beyond contributing to the literature on health information–seeking behaviors via conversational assistants and their limitations, the study design highlights relevant approaches to evaluating interactions between users and conversational assistants or other voice-activated platforms. The authors designed a range of everyday task scenarios that real-life users might experience and that could lead them to query home or smartphone devices for health- or medical-related information. These scenarios were written with real-life complexity, incorporating multiple facts that had to be considered for a successful resolution and the potential for harmful consequences if the correct course of action were not taken. In addition, study participants were allowed to interpret the task scenarios and query the conversational assistants in their own words, which further aligned with how users typically interact with their devices.

However, this study also had limitations, which the authors highlighted. Eligibility was limited to English speakers, and the sample was skewed toward younger, more educated individuals with high health literacy. Combined with the small convenience sample, these factors mean the findings may not be generalizable to broader populations, and further studies are needed, especially to examine potential differences among population subgroups (eg, by race/ethnicity, age, or health literacy).

Applications for Clinical Practice

Because of the increasing prevalence of online health information–seeking by patients, clinicians must be prepared to adequately address, and in some cases educate patients about, the accuracy and relevance of the medical and health information they find. Conversational assistants pose a particular risk in health care because their natural language interfaces can simulate, and be misinterpreted as, counseling systems. As the authors highlight, laypersons cannot know the full capabilities of conversational assistants, either with respect to their medical expertise or to the aspects of natural language dialogue they can handle. Therefore, it is critical that clinicians and other providers emphasize the limitations of these technologies to patients and stress that any medical recommendations should be confirmed with a health care professional before they are acted on.

Katrina F. Mateo, MPH

References

1. Pew Research Center. Demographics of internet and home broadband usage in the United States. Available at: http://www.pewinternet.org/fact-sheet/internet-broadband/.

2. Tonsaker T, Bartlett G, Trpkov C. Health information on the Internet: gold mine or minefield? Can Fam Physician. 2014;60:407-408.

3. Tan SS-L, Goonawardene N. Internet health information seeking and the patient-physician relationship: a systematic review. J Med Internet Res. 2017;19:e9.

4. Chung H, Iorga M, Voas J, Lee S. Alexa, can I trust you? Computer (Long Beach Calif). 2017;50:100-104.

5. Miner AS, Milstein A, Hancock JT. Talking to machines about personal mental health problems. JAMA. 2017;318:1217.
