Roche Blood Glucose Meter

A Summative Usability Evaluation

Project Goal

To demonstrate that this blood glucose meter could be used safely by its intended user groups, and to note any design or usability flaws in the device.

Project Type: Summative Usability Evaluation

Timeline: September 2021

My Role: Lead Researcher

Introduction

Roche is a multinational healthcare organization based in Europe that sells medical devices for both hospital and home settings around the world. One such device is a blood glucose meter, which helps diabetics measure their blood sugar and track insulin intake. We were contracted by Roche to perform a summative usability test on a newly developed blood glucose meter as part of its FDA submission process.

Our goal with this summative usability test was to demonstrate that this blood glucose meter could be used safely by the intended user groups, and to note any design or usability flaws in the device. To do so, we conducted 30 test sessions, 15 with participants from each of the two user groups, and created a set of task scenarios for participants to complete that covered all major identified risks associated with the blood glucose meter.

My role on this project was lead researcher and moderator. I was assisted by my colleague Ruiqi Li, who acted as notetaker during the study.

*A note about summative testing: Summative usability testing for an FDA submission is structured differently from other forms of user testing. It is a required part of the submission a company makes to the FDA before a medical device can be sold in the U.S., and it is often one of the last steps before that submission is made. In summative testing, we still note user behaviors and try to understand them, as with any user testing, but the primary goal is to show that the product can be used safely and effectively as is; less emphasis is placed on how to change or improve it. Additionally, when moderating a summative test, it is important to follow the same protocol and wording in every session so that the sessions stay standardized.*

Conducting the test

Test structure

For this study, we tested 15 nurses who worked in hospital environments and 15 point-of-care coordinators who provided specialized care to diabetic patients in clinics or similar settings. Each session lasted 90 minutes and consisted of a “free exploration” period, an evaluation, and a debrief.

During the evaluation portion, either the moderator or the notetaker acted as a mock patient while participants worked through 6 test scenarios. Each test scenario had 10 to 15 subgoals on which participants were scored. Throughout the evaluation, neither the moderator nor the notetaker could give the participant any assistance beyond what the test scenario and scripted interventions provided.

Our testing facility, set up to emulate a hospital room

To keep each test session consistent and running smoothly, we used a moderation guide that contained information such as:

Scripts for the session introduction and for introducing each task scenario
A checklist of preparations to be made before each test scenario
A subgoal breakdown with a description of pass/fail conditions for each subgoal
Scripted interventions and anticipated participant issues

Scoring the test

Participants' performance on each subgoal of a test scenario was rated on a 4-point scale. We defined each rating as follows:

Success

The participant completed the task without any use errors and was able to understand the product’s display, controls, and instructions without difficulty.

Use Difficulty

A brief hesitation, struggle, or confusion that the participant worked through to complete the task. This rating indicates that a subgoal was completed without observed use errors, but inefficiently.

Close Call

The participant performed a use error, which they later corrected in the same test scenario to perform the task successfully. The way they recovered from the error would not pose a risk to the patient or user in a real-world scenario.

Use Error

An action or lack of action that leads to a different result than intended or expected. This includes the inability of the participant to complete a task.

When a Close Call or Use Error was observed during testing, the moderator probed the participant to discuss it, learn what they were thinking during the task, and find out why they took the observed action (or inaction). The moderator and notetaker rated all tasks during the sessions and compared their ratings afterward, reviewing their observation notes or video recordings when needed.
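The rating and reconciliation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tooling used in the study: the `Rating` names, the (participant, scenario, subgoal) key, the severity ordering, and both helper functions are hypothetical. It encodes the four-point scale as an ordered enum, lists subgoals where the moderator's and notetaker's ratings differ (the ones to re-check against notes or video), and flags Close Calls and Use Errors for probing and root cause analysis.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Four-point subgoal rating, ordered by severity (an illustrative assumption)."""
    SUCCESS = 1
    USE_DIFFICULTY = 2
    CLOSE_CALL = 3
    USE_ERROR = 4


# Hypothetical key: (participant ID, scenario number, subgoal number).
Key = tuple[str, int, int]


def find_disagreements(moderator: dict[Key, Rating],
                       notetaker: dict[Key, Rating]) -> list[Key]:
    """Return subgoals the two raters scored differently, or that only one rater scored."""
    keys = moderator.keys() | notetaker.keys()
    return sorted(k for k in keys if moderator.get(k) != notetaker.get(k))


def needs_follow_up(ratings: dict[Key, Rating]) -> list[Key]:
    """Return subgoals rated Close Call or Use Error, which call for a root cause analysis."""
    return sorted(k for k, r in ratings.items() if r >= Rating.CLOSE_CALL)


# Hypothetical ratings for two subgoals of one participant's first scenario.
moderator = {("P01", 1, 3): Rating.SUCCESS, ("P01", 1, 4): Rating.CLOSE_CALL}
notetaker = {("P01", 1, 3): Rating.USE_DIFFICULTY, ("P01", 1, 4): Rating.CLOSE_CALL}

print(find_disagreements(moderator, notetaker))  # [('P01', 1, 3)]
print(needs_follow_up(moderator))                # [('P01', 1, 4)]
```

Keying ratings by subgoal makes the comparison a simple set union, so a subgoal that one rater missed is surfaced the same way as an outright disagreement.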

Analyzing the Data

Explaining root causes

For every Close Call and Use Error we observed, we explained why it occurred through a root cause analysis. In each analysis, we described the circumstances of the Close Call or Use Error and then identified any factors that may have contributed to it. We drew both on behaviors observed by the moderator and notetaker and on participants' own explanations of their thought process, obtained directly from them during the study.

This analysis was vital to understanding the effectiveness and shortcomings of the product we tested, because it separates issues that indicate a flaw in the product’s design or usability from those that do not, and pinpoints where there is genuine risk to the patient and user.

*A note about identified risks: On this project, as is common in summative testing, the client's usability engineers compiled all possible safety risks associated with the product. We then used this risk analysis to create our moderation guide and task scenarios.*

This action was rated a Use Error because the participant did not perform one of the required task subgoals. However, the analysis makes clear that no usability problem contributed to the error.

Without this analysis, Use Errors like this one would carry the same weight as Use Errors caused by the device's design.

This action was also rated a Use Error, but here the analysis explains how, and to what extent, the product may have contributed to it.

System Usability Scale

Standard System Usability Scale questions, administered during the debrief portion of the study through Google Forms

During the debrief portion of the study, participants completed the System Usability Scale, a 10-item questionnaire in which every item is answered on a Likert scale from 1 (strongly disagree) to 5 (strongly agree).

Having participants complete the System Usability Scale gave us a standardized, quantifiable measure of the device's perceived usability that we could use for comparison with both similar products and completely unrelated medical devices.

With an average System Usability Scale score of 83 across participants, this blood glucose meter scored well above the commonly cited benchmark of 68, indicating that participants found the product effective, efficient, satisfying, and usable.
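The page does not state how the score of 83 was computed. Assuming the standard scoring rule from John Brooke's original System Usability Scale, each participant's score is derived as sketched below in Python: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a 0 to 100 score. The function names, the `BENCHMARK` constant, and the response data are hypothetical and for illustration only; they are not the study's data.

```python
from statistics import mean


def sus_score(responses: list[float]) -> float:
    """Score one participant's 10 SUS responses (each 1 to 5) on a 0-100 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def study_sus(all_responses: list[list[float]]) -> float:
    """Average the per-participant scores to get a study-level SUS score."""
    return mean(sus_score(r) for r in all_responses)


BENCHMARK = 68  # commonly cited average SUS score

# Hypothetical responses from three participants (not study data).
participants = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],  # scores 100.0
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],  # scores 75.0
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],  # scores 50.0
]
print(study_sus(participants))  # 75.0
```

Because this scoring is linear, averaging the per-participant scores gives the same result as scoring the averaged item responses, so either reading of a score "based on the average responses" yields the same number.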

Results

Through our analysis, we determined that the blood glucose meter we tested was a safe, effective device that medical professionals can be expected to use without error in most cases. No pattern in the results indicated systematic design flaws, and participants in both user groups (nurses and point-of-care coordinators) successfully completed the tasks in the test scenarios presented to them around 95% of the time.

Although the device is still awaiting FDA approval for sale in the U.S., it is currently available on the European market.

Due to a non-disclosure agreement with Roche, I am limited in how much of the analysis and findings I can share, as they contain proprietary information. If you'd like to hear more, please contact me at Nick.weinel@gmail.com
