Prioritizing Audio Needs
for Smart Glasses

The team needed a principled basis for prioritizing which audio experiences to design for across AR and VR. I built a survey (n=1,400) that turned qualitative findings from 2 prior diary studies into a ranked list of outcomes.

Company
Meta
Role
UX Researcher
Stakeholders
Research Manager, Research Director, PM
Year
2023

I led this survey (n=1,400) end-to-end. I reanalyzed diary study data into survey-testable content, defined a mixed-design survey, and generated multiple deliverables, including a scoring system and dashboard that informed product roadmaps.

  • I had analyzed 2 diary studies investigating real-world contexts where people want to control sounds around them. After watching hundreds of videos, I arrived at a taxonomy to label entries. The result: a set of context variables that captured the range of scenarios and how often each occurred. But I wasn't done. The insights created an appetite for a prioritization signal.

    The challenge was compressing the data into a limited number of scenarios to test at scale. I was inspired by the product team's needs-based framework, but because my research was product-agnostic, I defined a new set of needs grounded in the collected data. I also drew on the typical structure of user stories, defining a statement structure that accounted for context variables. My vision from the start was to convert qualitative insights into survey-testable items. The result: 18 outcome statements grounded in ecologically valid data.

  • The right survey design was not obvious. Stakeholders floated ideas like collecting ratings for multi-line scenario descriptions, but that posed a scale-use bias risk. A ranking task was also considered, but that carried a reliability risk with 8+ items.

    In search of ideas, I attended another team's internal presentation on combining a sequential monadic design with a MaxDiff approach. I adapted it to create a survey instrument that forced respondents to make needs-based tradeoffs while also diving deeper with ratings on target metrics.
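
    To make the MaxDiff mechanics concrete, here is a minimal, hypothetical sketch of the simplest (counting-based) way best/worst tradeoffs become preference scores. The item names and simulated choices are made up, and production studies typically estimate utilities with a hierarchical Bayes model rather than raw counts.

    ```python
    from collections import Counter
    from itertools import combinations
    import random

    # Hypothetical illustration: each MaxDiff screen shows a subset of items and
    # the respondent picks the best and worst. Counts of those picks yield a
    # simple preference score per item. All choices below are simulated.
    items = ["Outcome A", "Outcome B", "Outcome C", "Outcome D", "Outcome E"]
    best, worst, shown = Counter(), Counter(), Counter()

    for screen in combinations(items, 4):      # each screen shows 4 of the 5 items
        shown.update(screen)
        pick_best, pick_worst = random.sample(screen, 2)  # simulated respondent picks
        best[pick_best] += 1
        worst[pick_worst] += 1

    # Counting-based preference score: (best picks - worst picks) / times shown.
    scores = {i: (best[i] - worst[i]) / shown[i] for i in items}
    for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{item}: {score:+.2f}")
    ```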

  • Crafting a survey from rich data tested my balancing skills. My moves were:

    1. Use a sequential monadic component, which expanded survey length considerably (4 metrics x 18 outcomes = 72 questions).
    2. Surface only 6 of the 18 outcomes per respondent to control length, which created the risk of priming respondents with whichever 6 outcomes they were exposed to (see the rotation sketch below).
    3. Move the sequential monadic component after the MaxDiff task to eliminate the priming risk.

    At every turn, my aim was to capture quality data without compromising the survey experience.
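
    Here is a minimal sketch of one way to rotate 6 of the 18 outcomes per respondent while keeping exposure balanced across the sample. The outcome IDs and quota logic are assumptions for illustration, not the survey platform's actual randomizer configuration.

    ```python
    import random

    # Hypothetical rotation: show each respondent the 6 least-shown outcomes,
    # with ties broken at random, so exposure stays roughly even across n=1,400.
    OUTCOMES = [f"outcome_{i:02d}" for i in range(1, 19)]
    PER_RESPONDENT = 6

    def assign_outcomes(exposure_counts: dict[str, int]) -> list[str]:
        """Pick the 6 least-shown outcomes, breaking ties at random."""
        shuffled = random.sample(OUTCOMES, k=len(OUTCOMES))     # random tie-break order
        shuffled.sort(key=lambda o: exposure_counts.get(o, 0))  # least-shown first
        chosen = shuffled[:PER_RESPONDENT]
        for o in chosen:
            exposure_counts[o] = exposure_counts.get(o, 0) + 1
        return chosen

    counts: dict[str, int] = {}
    for _ in range(1400):                                  # survey sample size
        assign_outcomes(counts)
    print(min(counts.values()), max(counts.values()))      # exposures stay within 1 of each other
    ```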

  • The MaxDiff task would tell us which outcomes were preferred, but not where the biggest opportunities lived.

    I selected 4 metrics for the sequential monadic section to push the data from answering "what matters" to "where is the biggest gap":

    1. Importance: A baseline measure that carried over from diary studies.
    2. Location: Existing audio technology had been optimized for specific acoustic environments, so identifying overlap with user needs would inform product positioning and investment efforts.
    3. Satisfaction with Workarounds: Diary study findings pointed to good-enough solutions in some scenarios.
    4. Frequency: Diary study findings also suggested that some scenarios were infrequent, perhaps representing a narrow opportunity gap.

  • To get to a single list of ranked outcomes, I needed to converge 3 data sources:

    1. MaxDiff Preference Scores: Measured at the outcome level.
    2. Sequential Monadic Ratings: Also measured at the outcome level, but from 4 different angles.
    3. Diary Study Data: Measured at the need level, a coarser grain, but with experience ratings that carried quantitative signal.

    Where all 3 sources agreed, confidence was high. Where they disagreed, I flagged the item for closer examination. For example, a need that ranked high in the diary study but low in the survey warranted follow-up, because diary ratings were grounded in actual experiences. I needed to balance the volume and range of data points against my analytical ceiling at the time. Visual comparisons, like bubble charts, were the honest trick that helped.
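
    As an illustration of that kind of visual comparison, here is a hypothetical bubble-chart sketch plotting diary-study ratings against survey preference scores, with bubble size proportional to diary frequency. The need names and values are made up.

    ```python
    import matplotlib.pyplot as plt

    # Hypothetical data: each bubble is a need, positioned by diary-study rating
    # vs. survey preference score and sized by diary entry count.
    needs       = ["Focus", "Awareness", "Comfort", "Privacy"]
    diary_rank  = [4.2, 3.1, 3.8, 2.5]    # mean diary experience rating (1-5)
    survey_pref = [0.9, 0.7, 0.3, 0.4]    # rescaled MaxDiff preference (0-1)
    frequency   = [120, 340, 80, 60]      # diary entry counts per need

    fig, ax = plt.subplots()
    ax.scatter(diary_rank, survey_pref, s=[f * 2 for f in frequency], alpha=0.5)
    for name, x, y in zip(needs, diary_rank, survey_pref):
        ax.annotate(name, (x, y))
    ax.set_xlabel("Diary study rating")
    ax.set_ylabel("Survey preference score")
    ax.set_title("Where sources agree vs. diverge")
    plt.show()
    ```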

  • The data cut across metrics, sources, and scales, and it all needed to come together into a ranked list. I had my work cut out for me. The goal was a composite score per outcome; the steps were to transform the inputs so I could combine them.

    For example, MaxDiff utilities had to be rescaled for parity with other metrics, and satisfaction with workarounds needed its scale reversed so that higher values consistently meant a bigger opportunity. What mattered was having a systematic ranking approach paired with the richer underlying data, the fun space where stakeholders could explore the "why" behind rankings.
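
    A minimal sketch of that kind of composite score, assuming min-max rescaling and equal weights. The sample values, weights, and rescaling choices are illustrative, not the study's actual scoring system.

    ```python
    import pandas as pd

    # Hypothetical inputs: one row per outcome with raw MaxDiff utilities and
    # 1-5 ratings for the three sequential monadic metrics used here.
    df = pd.DataFrame({
        "outcome":      ["A", "B", "C"],
        "maxdiff_util": [1.8, -0.4, 0.6],   # raw MaxDiff utilities
        "importance":   [4.5, 3.2, 4.0],    # 1-5 rating
        "satisfaction": [2.1, 4.3, 3.0],    # 1-5 rating; high = workaround already good
        "frequency":    [3.9, 2.8, 3.5],    # 1-5 rating
    })

    def rescale(s: pd.Series) -> pd.Series:
        """Min-max rescale a metric to the 0-1 range."""
        return (s - s.min()) / (s.max() - s.min())

    df["satisfaction_gap"] = 6 - df["satisfaction"]          # reverse the 1-5 scale
    metrics = ["maxdiff_util", "importance", "satisfaction_gap", "frequency"]
    df["composite"] = sum(rescale(df[m]) for m in metrics) / len(metrics)
    print(df.sort_values("composite", ascending=False)[["outcome", "composite"]])
    ```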

  • I wasn't sure who would ultimately leverage my work. As team priorities evolved, new questions were sure to flow in. The Looker Studio dashboard I built combined diary study and survey data, but making that combination functional required reshaping data from wide to long and manually joining datasets in Google Sheets so that filtering by need, location, and segment would apply across visualizations from different sources.
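
    The actual reshaping and joining happened in Google Sheets; the pandas sketch below shows the equivalent idea with hypothetical column names: melt the wide survey export to long, then join diary-study data on a shared key so one filter applies across both sources.

    ```python
    import pandas as pd

    # Hypothetical survey export, wide: one column per metric.
    survey_wide = pd.DataFrame({
        "respondent": [1, 1, 2],
        "need":       ["Focus", "Awareness", "Focus"],
        "segment":    ["S1", "S3", "S2"],
        "importance": [5, 3, 4],
        "frequency":  [4, 2, 5],
    })
    # Hypothetical diary-study summary, keyed by need.
    diary = pd.DataFrame({
        "need":         ["Focus", "Awareness"],
        "location":     ["Home", "Commute"],
        "diary_rating": [4.2, 3.1],
    })

    # Wide to long, so each metric becomes a row that dashboard filters can target.
    survey_long = survey_wide.melt(
        id_vars=["respondent", "need", "segment"],
        value_vars=["importance", "frequency"],
        var_name="metric",
        value_name="rating",
    )
    # Join on the shared "need" key so both sources respond to the same filters.
    combined = survey_long.merge(diary, on="need", how="left")
    print(combined.head())
    ```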

  • The marketing team's segmentation tool was a spreadsheet where a researcher could paste a respondent's answers to questions designed to identify one of 8 segments. Because it ran as a post-collection step, I couldn't leverage it mid-survey, and without it, fielding was resource intensive.

    I traced the spreadsheet's scoring logic despite not fully understanding the calculations at first. After combing through dot products, cluster coefficients, and conditional logic, I reimplemented the tool in Qualtrics. To validate accuracy, I ran dozens of survey dry runs, ensuring my algorithm's output matched the original spreadsheet (a sketch of the general shape of that logic appears below).

    The need for mid-survey segmentation came up in a later, unrelated study. To my disappointment, my implementation was not transferable due to survey type constraints. The team contracted Qualtrics for a rebuild that cost about $10,000.
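
    The spreadsheet's actual coefficients and logic aren't reproduced here, but the general shape, scoring a respondent's answers against per-segment coefficient vectors via dot products and assigning the highest-scoring segment, looks roughly like this. Every number below is made up.

    ```python
    import numpy as np

    # Hypothetical segmentation sketch: 8 segments, each with a coefficient
    # vector over the screener questions. The coefficients and answer encoding
    # are invented, not the marketing team's model.
    rng = np.random.default_rng(0)
    N_QUESTIONS, N_SEGMENTS = 12, 8
    coefficients = rng.normal(size=(N_SEGMENTS, N_QUESTIONS))   # one row per segment
    intercepts = rng.normal(size=N_SEGMENTS)

    def classify(answers: np.ndarray) -> int:
        """Return the 1-indexed segment whose score (dot product + intercept) is highest."""
        scores = coefficients @ answers + intercepts
        return int(np.argmax(scores)) + 1

    respondent = rng.integers(1, 6, size=N_QUESTIONS).astype(float)  # e.g. 1-5 answers
    print(classify(respondent))
    ```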