Quantifying Iterations | Case Study



Naomi Hochberg


Think about the last redesign you were a part of. How could you tell if one iteration was definitively better than the one before? Were your users more satisfied if you moved a button 16 pixels to the right and changed button text from “edit” to “move?” How did you find the balance between usability and satisfaction? Did that balance make the experience more delightful for the user?

Designing for delight is about establishing an emotional connection between the user and our product. We are used to qualitatively interviewing users to evaluate usability and infer delight. But, can we quantify human emotions and design reactions to know for sure?

Moreover, can qualitative data be quantified? And does that data tell you enough?

Usability & Satisfaction

I was involved in a redesign to introduce an organizational hierarchy in Pluralsight’s Admin tools so that users could see analytics that would match the organization of their company.

Instead of seeing data on individual learners, or small teams of those learners, Admins could continue to see data for these small groups, but they could also see data on a larger scale to track usage and skill improvement.

For this redesign, I wanted to quantify both usability and satisfaction. Would delight emerge from this balance?

After attending Front in 2019 in Salt Lake City, UT, I decided to adopt a scorecard approach, which was presented by Lauren Treasure from Chatbooks, as a way to track progress from one iteration to the next.

Instructions given by Lauren Treasure of Chatbooks.

I tested both tasks and components of the design on a scale from 0 to 2.

For usability, 0 was a total failure to perform the task or use the component correctly, and a 2 was a total pass. For satisfaction, a 0 represented a negative reaction and a 2 was a positive reaction. All scores were solely based on observations.

To avoid designer bias, I always had at least one other person scoring each interview with me. Assigning scores led to a holistic discussion about user behavior, which helped us synthesize the participating user's needs.

These scores and discussions helped drive the next iteration.

Each user's scores were tallied and converted to a percentage. My goal for both usability and satisfaction was to reach 80% confidence to ship, a Pluralsight framework.

With that confidence, I would be ready to hand over designs to engineers and feel confident that this redesign would benefit the account Admin.

Scorecard used for every interview. This is from Dana’s interview in Iteration One (below).

Overall score (%) = total points / potential points
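The scorecard math above is simple enough to sketch in a few lines. The 0/1/2 item scores and the total-over-potential division come from the article; the example scores themselves are made up for illustration:

```python
def overall_score(scores: list[int]) -> float:
    """Each task/component is scored 0 (fail), 1 (partial), or 2 (pass).
    Overall score = total points / potential points."""
    potential = 2 * len(scores)  # every item could have earned a 2
    return sum(scores) / potential

# Illustrative usability scores for one interview (eight tasks/components)
usability = [2, 2, 1, 2, 1, 2, 2, 1]
print(f"{overall_score(usability):.0%}")  # prints "81%"
```

A score at or above 80% on both usability and satisfaction would meet the confidence-to-ship bar described above.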

Iteration one

For the first iteration, we interviewed 5 Admins: two tech leaders (such as software engineers), two Learning & Development (L&D) leaders, and one Pluralsight Customer Success Manager. These Admins represented accounts ranging from 15 up to 3,000 licenses.


First, we spoke with a software engineer, Tim, who managed 24 licenses. He understood and correctly anticipated affordance actions and behaviors. Some naming conventions made him hesitate slightly before correctly performing the task. Overall, Tim scored 86% in usability and 95% in satisfaction.

We felt great! Such high scores on the first iteration? Easiest redesign ever, right?

Not quite.

Going into these interviews we thought our users were tech leaders, like Tim. We assumed they were tech-savvy and familiar with common icons, like ellipses, to know where to find more actions.

What we learned was that only about half of our users were tech leaders like Tim. The other half were L&D leaders. These L&D leaders were big promoters of using Pluralsight; however, they were much more accustomed to a hand-held experience. They wanted to see everything on one page.

They didn’t want much, if any, navigation. They weren’t really interested in new affordances to perform jobs to be done.


Dana was an L&D leader focused on IT learning development. Her initial reaction was hesitation: she was used to the existing design, and adjusting to a new environment and layout was a hurdle for her.

There was a lot of hesitation before clicking anywhere. Dana scored 75% in usability and 68% in satisfaction.

This is how most of the interviews went in the first iteration — a lot of hesitation.

We saw large gaps between the usability and satisfaction scores, where users were eventually able to figure out how to use the product and navigate through it, but they weren’t very happy about it.

This redesign got the job done, but not how users wanted it to.

Overall, the first iteration scored 79% in usability and 75% in satisfaction.

Not quite that 80% confidence.

Iteration two

What worked and what didn’t work in the first iteration?

I averaged the scores for each task and component across the first five users to see what passed and what didn’t.

We had two total fails: a visualization tool to see your organization’s hierarchy in a tree structure and the pin icon in the tree to indicate what level you’re on.
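Averaging each task or component across all five interviews, as described above, is what surfaces the total fails. A minimal sketch of that step, with invented user names, tasks, and scores:

```python
tasks = ["add hierarchy level", "tree visualization", "pin icon"]
scores_by_user = {          # 0-2 per task, in `tasks` order (illustrative data)
    "Tim":  [2, 0, 0],
    "Dana": [1, 0, 1],
}

def task_averages(scores_by_user: dict[str, list[int]]) -> list[float]:
    """Mean score (0-2) per task across all interviewed users."""
    users = list(scores_by_user.values())
    return [sum(u[i] for u in users) / len(users) for i in range(len(users[0]))]

for task, avg in zip(tasks, task_averages(scores_by_user)):
    # an average of 0 means every single user failed: a total fail
    label = "TOTAL FAIL" if avg == 0 else f"{avg:.1f}"
    print(f"{task}: {label}")
```

With real interview data, any task averaging 0 is an immediate redesign candidate, while low-but-nonzero averages still need triage.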

But other tasks and components failed too, so I needed to determine which were most critical for the user. I created a table rating each issue's criticality, its impact on the user, and the frequency with which we saw it across all interviews, then calculated severity (criticality × impact × frequency).

Usability issue identification table using severity calculation for the first prototype iteration
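The severity formula makes the prioritization mechanical. The article doesn't publish its exact rubric, so the 1-3 scales and the issue rows below are illustrative assumptions; only the criticality × impact × frequency multiplication is from the text:

```python
def severity(criticality: int, impact: int, frequency: float) -> float:
    """criticality and impact on an assumed 1-3 scale; frequency is the
    share of interviews (0-1) in which the issue appeared."""
    return criticality * impact * frequency

issues = [  # (issue, criticality, impact, frequency) -- invented examples
    ("tree visualization unclear", 3, 3, 5 / 5),  # total fail, seen by everyone
    ("pin icon not understood",    2, 2, 4 / 5),
    ("naming conventions unclear", 1, 2, 2 / 5),
]

# Highest severity first: what to fix in the next iteration
for name, c, i, f in sorted(issues, key=lambda r: severity(*r[1:]), reverse=True):
    print(f"{name}: severity {severity(c, i, f):.1f}")
```

Ranking by the product rather than by any single factor is what keeps a frequent-but-minor annoyance from outranking a rare-but-blocking failure.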

I learned that adding a level of hierarchy to the team structure was actually hindering the job to be done. How could an Admin utilize hierarchy if they couldn’t even figure out how to add these levels of hierarchy?

Once iteration two was ready, I started testing again. I talked to three tech leaders and two HR professionals. Their license counts ranged from 16 to 750.


Jordan was a tech leader managing 542 licenses. He scored 85% in both usability and satisfaction. He scored high on all the changes we made.

We fixed everything! Right?


To be fair, we made the overall 80% confidence. Across all users, this new design scored 83% in usability and 84% in satisfaction.

Users were happier with the tasks they were performing. But we realized that while we were testing usability and satisfaction, we weren’t addressing what was most important to the user: value.

It’s great if a user can perform the tasks we ask, but what does that matter if they don’t understand the value these new designs will bring to their day-to-day use of the product?

They didn’t understand why this redesign would be important for them.

Iteration three

We needed to show Admins that a multi-level organizational hierarchy translates into more flexibility in reporting analytics, and that this flexibility leads to a higher ROI.

This ROI is different for everyone, but we wanted to show that a useful and usable organizational hierarchy would lead to deeper insights into user engagement.

We interviewed 7 more Admins using a new iteration with changes to show why organizational hierarchy would be valuable. We focused on asking the users why that task was usable or satisfactory. What value did they see in these new features? Why did they see that value?

We talked to one HR professional, three tech leaders and three L&D managers with plans ranging from 90 to 2,300 licenses.

Overall, usability stayed about the same at 84%. But once users began to see on their own that this product would help translate to their ROI, satisfaction soared to 87%.

Ultimately, they started to see how this redesign would help communicate their insights to their stakeholders.

Trending scores from iteration to iteration

Key takeaway

You can quantify qualitative data. By looking at your qualitative interviews differently, rather than just synthesizing feedback and comments, you can put a numeric value on each component and iterate based on those results.

It is important to note, however, that these numbers will only go so far.

And that’s because you can’t quantify value. Or at least measuring this perfectly is impossible.

Yes, satisfaction will probably go up when the value is realized, but there’s no way to measure if an individual is going to recognize that value on their own.

These scorecards help understand user actions and satisfaction, or lack thereof. But, value is owned by the user and it will vary based on the situations they are in.

What I learned is that iterating based on quantifying my research is possible! Value will always be subjective, but if I can do everything I can in my designs to illuminate that value, the user will see it.


Names and identifying details have been changed to protect the privacy of our users.

For more context on the design process for Org 2.0, please read my case study here.

Imitation is the sincerest form of flattery. Here are the sources that helped guide my scoring:

