CS295J/Research proposal (draft 2)
Introduction
- Owners: Adam Darlow, Eric Sodomka
We propose a framework for interface evaluation and recommendation that integrates behavioral models and design guidelines from both cognitive science and HCI. Our framework behaves like a committee of specialized experts, where each expert assesses the interface from the perspective of its particular knowledge of HCI or cognitive science. For example, an expert may provide an evaluation based on the GOMS method, Fitts's law, Maeda's design principles, or cognitive models of learning and memory. An aggregator collects these assessments, weights each expert's opinion, and outputs to the developer a merged evaluation score and a weighted set of recommendations.
Systematic methods of estimating human performance with computer interfaces are used only sparingly despite their obvious benefits, because of the overhead involved in implementing them. To test an interface, both manual coding systems like the GOMS variations and user simulations like those based on ACT-R/PM and EPIC require detailed pseudo-code descriptions of the user's workflow with the application interface. Any change to the interface then requires extensive changes to the pseudo-code, a major problem given the trial-and-error nature of interface design. Updating the models themselves is even more complicated: even an expert in CPM-GOMS, for example, cannot necessarily adapt it to take results from new cognitive research into account.
Our proposal makes automatic interface evaluation easier to use in several ways. First, we propose to divide the input to the system into three separate parts: functionality, user traces, and interface. Because the functionality is separated from the interface, even radical interface changes require updating only that part of the input. The user traces are also defined over the functionality, so they too translate across different interfaces. Second, the parallel modular architecture lowers the "entry cost" of using the tool. The system includes a broad array of evaluation modules, some very simple and some more complex. The simpler modules use only a subset of the input that a system like GOMS or ACT-R would require, so while more input still leads to better output, interface designers can get minimal evaluations from minimal information. For example, a visual search module may not require any functionality or user traces in order to determine whether all interface elements are distinct enough to be easy to find. Finally, a parallel modular architecture is much easier to augment with new cognitive and design evaluations.
Background / Related Work
Each person should add the background related to their specific aims.
- Steven Ellis - Cognitive models of HCI, including GOMS variations and ACT-R
- EJ - Design Guidelines
- Jon - Perception and Action
- Andrew - Multiple task environments
- Gideon - Cognition and dual systems
- Ian - Interface design process
- Trevor - User trace collection methods (especially any eye-tracking, EEG, ... you want to suggest using)
Specific Aims and Contributions (to be separated later)
See the flowchart for a visual overview of our aims.
In order to use this framework, a designer will have to provide:
- Functional specification - the possible interactions between the user and the application. These can be thought of as method signatures, each with a name (e.g., setVolume), a direction (to user or from user), and a list of value types (boolean, number, text, video, ...).
- GUI specification - a mapping of interactions to interface elements (e.g., setVolume is mapped to the grey knob in the bottom left corner with clockwise turning increasing the input number).
- Functional user traces - sequences of representative ways in which the application is used. Instead of writing them by hand, the designer could have users exercise the application through a trial interface and then use our methods to generalize the user traces beyond that specific interface (the second method is depicted in the diagram). As a form of pre-processing, the system also generates an interaction transition matrix, which lists the probability of each type of interaction given the previous interaction.
- Utility function - this is a weighting of various performance metrics (time, cognitive load, fatigue, etc.), where the weighting expresses the importance of a particular dimension to the user. For example, a user at NASA probably cares more about interface accuracy than speed. By passing this information to our committee of experts, we can create interfaces that are tuned to maximize the utility of a particular user type.
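The input types above can be sketched as simple data structures. This is a minimal illustration, not a committed design; all names and weight values are made up for the example:

```python
from dataclasses import dataclass
from enum import Enum

class Direction(Enum):
    TO_USER = "to_user"
    FROM_USER = "from_user"

@dataclass
class Interaction:
    """One entry in the functional specification (a method signature)."""
    name: str            # e.g., "setVolume"
    direction: Direction
    value_types: list    # e.g., ["number"]

@dataclass
class GuiBinding:
    """GUI specification entry: maps an interaction to an interface element."""
    interaction: str     # name of the Interaction
    element: str         # e.g., "grey knob, bottom left corner"

@dataclass
class UtilityFunction:
    """Weights over performance metrics; a higher weight means the
    dimension matters more to this user type."""
    weights: dict        # e.g., {"accuracy": 0.8, "time": 0.2}

    def utility(self, metrics: dict) -> float:
        # Overall utility as a weighted sum of per-dimension scores.
        return sum(self.weights.get(k, 0.0) * v for k, v in metrics.items())

# A NASA-style user who values accuracy over speed:
set_volume = Interaction("setVolume", Direction.FROM_USER, ["number"])
binding = GuiBinding("setVolume", "grey knob, bottom left corner")
nasa_user = UtilityFunction({"accuracy": 0.8, "time": 0.2})
score = nasa_user.utility({"accuracy": 0.9, "time": 0.5})  # 0.8*0.9 + 0.2*0.5 = 0.82
```

Under this sketch, two interfaces for the same functionality differ only in their GuiBinding entries, which is what lets the other inputs survive interface changes.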
Each module can use all of this information or a subset of it. Our approach stresses flexibility: the more information is provided, the more meaningful the feedback. After processing the assessments returned by the committee of experts, the aggregator will output:
- An evaluation of the interface. Evaluations are expressed both in terms of the individual utility function components (time, fatigue, cognitive load, etc.) and in terms of the overall utility of the interface (as defined by the utility function). These evaluations are given in the form of an efficiency curve, since the utility received on each dimension can change as the user becomes more accustomed to the interface.
- Suggested improvements for the GUI are also output. These suggestions are meant to optimize the utility function that was input to the system. If a user values accuracy over time, interface suggestions will be made accordingly.
Collecting User Traces
- Owner: Trevor O'Brien
Given an interface, our first step is to run users on the interface and log these user interactions. We want to log actions at a sufficiently low level so that a GOMS model can be generated from the data. When possible, we'd also like to log data using additional sensing technologies, such as pupil-tracking, muscle-activity monitoring and auditory recognition; this information will help to analyze the explicit contributions of perception, cognition and motor skills with respect to user performance.
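A trace entry needs to be low-level enough to recover keystroke- and pointer-level operators for a GOMS model, while leaving room for auxiliary sensor channels. The sketch below is one possible log format, with illustrative device and action names:

```python
import time
from dataclasses import dataclass, field

@dataclass
class UiEvent:
    """One low-level logged action, timestamped relative to session start."""
    timestamp: float   # seconds since the session began
    device: str        # "mouse", "keyboard", "eye_tracker", "emg", ...
    action: str        # "move", "click", "keypress", "fixation", ...
    target: str        # interface element involved, e.g., "volumeKnob"
    detail: dict = field(default_factory=dict)  # device-specific payload

class TraceLogger:
    """Accumulates UiEvents for one user session on one interface."""
    def __init__(self):
        self._start = time.monotonic()
        self.events = []

    def log(self, device, action, target, **detail):
        elapsed = time.monotonic() - self._start
        self.events.append(UiEvent(elapsed, device, action, target, detail))

logger = TraceLogger()
logger.log("mouse", "click", "volumeKnob", x=12, y=340)
logger.log("keyboard", "keypress", "searchBox", key="a")
```

Sensor channels such as pupil tracking would simply add events with their own device tag and payload, so the same trace format serves both the GOMS-level modules and the perception/cognition analyses.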
Generalizing User Traces
- Owner: Trevor O'Brien
The user traces that are collected are tied to a specific interface. In order to use them with different interfaces to the same application, they should be generalized to be based only on the functional description of the application and the user's goal hierarchy. This would abstract away from actions like accessing a menu.
In addition to specific user traces, many modules could use a transition probability matrix that predicts the next interaction given the previous one.
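Once traces are expressed at the functional level, the transition matrix falls out of simple bigram counting. A minimal sketch, with made-up interaction names:

```python
from collections import defaultdict

def transition_matrix(traces):
    """Estimate P(next interaction | previous interaction) from
    functional-level user traces (interface-specific actions already
    abstracted away).

    traces: list of sequences of interaction names.
    Returns {prev: {next: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for prev, nxt in zip(trace, trace[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for prev, row in counts.items()}

traces = [
    ["open", "search", "play"],
    ["open", "search", "search", "play"],
]
probs = transition_matrix(traces)
# probs["open"]["search"] == 1.0; probs["search"]["play"] == 2/3
```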
Parallel Framework for Evaluation Modules
- Owner: Adam Darlow, Eric Sodomka
This section will describe in more detail the inputs, outputs and architecture that were presented in the introduction.
Evaluation and Recommendation via Modules
- Owner: E J Kalafarski
This section describes the aggregator, which takes the output of multiple independent modules and aggregates the results to provide (1) an evaluation and (2) recommendations for the user interface. We should explain how the aggregator weights the output of different modules (this could be based on historical performance of each module, or perhaps based on E.J.'s cognitive/HCI guidelines).
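One simple weighting scheme, sketched below, is a per-dimension weighted average in which each module votes only on the dimensions it actually reports; the module names, weights, and scores are illustrative, and the weights would in practice come from historical module performance or expert guidelines as discussed above:

```python
def aggregate(module_outputs, module_weights):
    """Merge per-module evaluations into one score per utility dimension.

    module_outputs: {module_name: {dimension: score}}
    module_weights: {module_name: weight}
    A module that does not report a dimension simply does not vote on it.
    """
    totals, weight_sums = {}, {}
    for name, scores in module_outputs.items():
        w = module_weights.get(name, 0.0)
        for dim, s in scores.items():
            totals[dim] = totals.get(dim, 0.0) + w * s
            weight_sums[dim] = weight_sums.get(dim, 0.0) + w
    # Normalize so dimensions covered by few modules are not penalized.
    return {dim: totals[dim] / weight_sums[dim]
            for dim in totals if weight_sums[dim] > 0}

outputs = {
    "fitts":   {"time": 0.7},
    "wm_load": {"cognitive_load": 0.4, "time": 0.9},
}
weights = {"fitts": 0.6, "wm_load": 0.4}
merged = aggregate(outputs, weights)
# merged["time"] == 0.78, merged["cognitive_load"] == 0.4
```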
Sample Modules
CPM-GOMS
- Owners: Steven Ellis
This module will provide interface evaluations and suggestions based on a CPM-GOMS model of cognition for the given interface. It will provide a quantitative, predictive, cognition-based parameterization of usability. From empirically collected data, user trajectories through the model (critical paths) will be examined to highlight bottlenecks within the interface and to suggest alterations that induce more efficient user trajectories.
HCI Guidelines
- Owner: E J Kalafarski
This section could include an example or two of established design guidelines that could easily be implemented as modules.
Fitts's Law
- Owner: Jon Ericson
This module will ultimately output an estimate of the time required to complete various tasks that have been decomposed into formalized sequences of interactions with interface elements, and will provide evaluations and recommendations for minimizing the time required to complete those tasks using the interface.
Inputs
1. A formal description of the interface and its elements (e.g. buttons).
2. A formal description of a particular task and the possible paths through a subset of interface elements that permit the user to accomplish that task.
3. The physical distances between interface elements along those paths.
4. The width of those elements along the most likely axes of motion.
5. Device (e.g. mouse) characteristics including start/stop time and the inherent speed limitations of the device.
Output
The module will then use the Shannon formulation of Fitts's law to compute the average time needed to complete the task along those paths.
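The computation itself is a straightforward application of the Shannon formulation, MT = a + b * log2(D/W + 1), summed over the pointing steps of a path. In the sketch below the constants a and b are placeholders; real values come from calibrating the target device (input 5):

```python
import math

def fitts_time(distance, width, a=0.0, b=0.1):
    """Shannon formulation of Fitts's law: MT = a + b * log2(D/W + 1).

    distance: movement distance D to the target (e.g., pixels)
    width:    target width W along the axis of motion
    a, b:     device-dependent regression constants (placeholder values;
              calibrate against the actual pointing device)
    Returns predicted movement time in the units of a and b (here, seconds).
    """
    return a + b * math.log2(distance / width + 1)

def path_time(path, a=0.0, b=0.1):
    """Total predicted pointing time along a path of (distance, width) steps."""
    return sum(fitts_time(d, w, a, b) for d, w in path)

# e.g., 200 px to a 50 px button, then 100 px to a 25 px icon:
t = path_time([(200, 50), (100, 25)])
```

Averaging path_time over the alternative paths for a task (weighted by how likely users are to take each path) yields the module's time estimate.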
Affordances
- Owner: Jon Ericson
This simple module will provide interface evaluations and recommendations based on a measure of the extent to which the user perceives the relevant affordances of the interface when performing a number of specified tasks.
Inputs
Formalized descriptions of...
1. Interface elements
2. Their associated actions
3. The functions of those actions
4. A particular task
5. User traces for that task.
Inputs (1-4) are used to generate a "user-independent" space of possible functions that the interface is capable of performing with respect to a given task -- what the interface "affords" the user. From this set of possible interactions, our model then determines the subset of optimal paths for performing the task. The user traces (5) are then used to determine which functions were actually performed in the course of the task; comparing this to the optimal path data reveals the extent to which affordances of the interface are present but not perceived.
Output
The output of this module is a simple ratio of (affordances perceived) / [(relevant affordances present) * (time to complete task)] which provides a quantitative measure of the extent to which the interface is "natural" to use for a particular task.
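The ratio above reduces to a one-line computation; this small sketch just makes the formula and its edge cases explicit (the counts and time are invented for the example):

```python
def affordance_score(perceived, present, task_time):
    """Affordances-module output:
    (affordances perceived) / ((relevant affordances present) * (time to complete task)).

    Higher values indicate an interface that is more "natural" for the task:
    more of its relevant affordances are perceived, in less time.
    """
    if present <= 0 or task_time <= 0:
        raise ValueError("need at least one relevant affordance and positive task time")
    return perceived / (present * task_time)

# e.g., 3 of 4 relevant affordances perceived, task completed in 1.5 time units:
score = affordance_score(perceived=3, present=4, task_time=1.5)  # 3 / 6 = 0.5
```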
Interruptions
- Owner: Andrew Bragdon
While most usability testing focuses on low-level task performance, previous work suggests that users also operate at a higher, working-sphere level. This module attempts to evaluate a given interface with respect to these higher-level considerations, such as task switching.
Working Memory Load
- Owner: Gideon Goldin
This module measures how much information the user needs to retain in memory while interacting with the interface and makes suggestions for improvements.
Automaticity of Interaction
- Owner: Gideon Goldin
Measures how easily the interaction with the interface becomes automatic with experience and makes suggestions for improvements.
Integration into the Design Process
- Owner: Ian Spector
This section outlines the process of designing an HCI interface and at what stages our proposal fits in and how.
Preliminary Results
Each person should come up with a single paragraph describing fictional (or not) preliminary results pertaining to their owned specific aims and contributions.
[Criticisms]
- Owner: Andrew Bragdon
Any criticisms or questions we have regarding the proposal can go here.