CS295J/Research proposal
Project summary
We propose to integrate theories and models of cognition, models of perception, rules of design, and concepts from the discipline of human-computer interaction to develop a predictive model of user performance in interacting with computer software for visual and analytical work. Our proposed model comprises a set of computational elements representing components of human cognition, memory, or perception. The collective abilities and limitations of these elements can be used to provide feedback on the likely efficacy of user interaction techniques.
The choice of human computational elements will be guided by several models or theories of cognition and perception, including Gestalt psychology, distributed cognition, Gibson's ecological approach, ???(where pathway, when pathway)???, ???working memory???, ..., and ???. The list of elements will be extensible. The framework coupling them will allow for experimental predictions of the utility of user interfaces that can be verified against human performance.
Coupling the system with users will involve a data capture mechanism for collecting the communications between a user interface and a user. These will be primarily event-based, and will include a new low-cost, camera-based eye-tracking system.
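As a rough illustration (not a committed design), such a capture layer could be as simple as a logger that timestamps every interface and eye-tracking event into a common record; all names below are hypothetical:

    import json
    import time

    class InteractionLogger:
        """Collects timestamped, event-based records from a UI and an eye tracker."""
        def __init__(self, path):
            self.out = open(path, "a")

        def log(self, source, kind, **details):
            record = {"t": time.time(), "source": source, "kind": kind, **details}
            self.out.write(json.dumps(record) + "\n")

    # Example usage: the UI toolkit and the eye tracker both push events here.
    logger = InteractionLogger("session.log")
    logger.log("mouse", "click", x=412, y=88, button="left")
    logger.log("eye", "fixation", x=430, y=95, duration_ms=220)

A flat, append-only log of this kind keeps the capture overhead low and leaves all interpretation to later analysis.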
During early development, existing interfaces will be evaluated manually to characterize their
(we need some way to specify interaction techniques...)
Specific Contributions
- A model of human cognitive and perceptual abilities when using computers
- Demonstration of the model in predicting human performance with some interfaces.
- ???
- Something about design rules collected and merged
- Something comparing these collected rules to a baseline (establishing their value)
- ???
- Traditionally, software design and usability testing are focused on low-level task performance. However, prior work (Gonzales, et al.) provides strong empirical evidence that users also work at a higher, working-sphere level. Our model will specifically identify and predict key aspects of higher-level information work behaviors, such as task switching. We will conduct initial exploratory studies to test specific instances of this high-level hypothesis. We will then use the refined model to identify specific predictions for the outcome of a formal, ecologically valid study involving a complex, non-trivial application.
- A low-overhead mechanism for capturing event-based interactions between a user and a computer, including web-cam based eye tracking. (should we buy or find out about borrowing use of pupil tracker?) Should we include other methods of interaction here? Audio recognition seems to be the lowest cost. It would seem that a system that took into account head-tracking, audio, and fingertip or some other high-DOF input would provide a very strong foundation for a multi-modal HCI system. It may be more interesting to create a software toolkit that allows for synchronized usage of those inputs than a low-cost hardware setup for pupil-tracking. I agree pupil-tracking is useful, but developing something in-house may not be the strongest contribution we can make with our time. (Trevor)
- Accuracy study of eye tracking (2 cameras? double as an input device?)
- ???
- A set of critiques of existing software used for visual and analytical work based on design rules
- ???
- A systematic technique to determine task distribution based on psychological principles. (Owner - Gideon)
- We can build a computational classification algorithm based on the nature of the task properties (e.g., memory-intensive, perceptual, reasoning, etc...)
- Construct a simple set of guidelines for UI engineers to use
Specific Aims
- build X
- build Y
- run experiment Z
- compare X with existing approach Q
- Develop a scoring system for interfaces to evaluate the degree to which all changes and causal relations are tracked by motion cues that are contiguous in time and/or space (a rough sketch of such a score follows this list).
- Accurately assess computational and psychological costs for tasks and subtasks. To do this, we will develop two non-trivial prototype systems: a conventional control system and a novel system based on our model of task switching. We will use our model to make specific predictions about relative task performance and user affect responses, and then test these predictions empirically in a formal study.
- Develop model that accounts for qualitatively different psychological tasks
- Test model on real-world data
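To make the motion-cue scoring aim above concrete, a minimal sketch of the idea follows; the event representation, time window, and distance threshold are all assumptions for illustration, not settled design choices.

    import math

    def contiguity_score(changes, cues, max_dt=0.5, max_dist=100.0):
        """Fraction of interface changes that have a motion cue contiguous in
        time (within max_dt seconds) and space (within max_dist pixels).
        Changes and cues are (time_s, x_px, y_px) tuples."""
        def covered(change):
            t, x, y = change
            return any(abs(t - ct) <= max_dt and math.hypot(x - cx, y - cy) <= max_dist
                       for ct, cx, cy in cues)
        if not changes:
            return 1.0
        return sum(covered(c) for c in changes) / len(changes)

    # A change at t=2.0s near (300, 200) preceded 0.1s earlier by a nearby cue scores well.
    print(contiguity_score([(2.0, 300, 200)], [(1.9, 310, 205)]))  # -> 1.0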
Background
Models of cognition
There are several models of cognition, ranging from fundamental aspects of neurological processing to extremely high-level psychological analysis. Three main theories seem to have become recognized as the most helpful in conceptualizing the actual process of HCI. These models all agree that one cannot accurately analyze HCI by viewing the user without context, but the extent and nature of this context varies greatly.
Activity Theory, developed in the early 20th century by Russian psychologists S.L. Rubinstein and A.N. Leontiev, posits four discrete aspects of human-computer interaction. The "Subject" is the human interacting with the system, who possesses an "Object" (e.g., a goal) which they hope to accomplish by using a tool. The Subject conceptualizes the realization of the Object via an "Action", which may be as simple or complex as necessary. The Action is made up of one or more "Operations", the most fundamental level of interaction, including typing, clicking, etc.
A key concept in Activity Theory is that of the artifact, which mediates all interaction. The computer itself need not be the only artifact in HCI - others include all sorts of signs, algorithmic methods, instruments, etc.
A longer synopsis of Activity Theory may be found at this website.
The Situated Action Model focuses on emergent behavior, emphasizing the subjective aspect of human-computer interaction and the therefore-necessary allowance for a wide variety of users. This model proposes the least amount of contextual interaction, and seems to maintain that the interactive experience is determined entirely by the user's ability to use the system in question. While limiting, this concept of usability can be very informative when designing for less tech-savvy users.
Distributed Cognition proposes that the computer (or, as in Activity Theory, any other artifact) can be used and ought to be thought of as an extension of the mental processing of the human. This is not to say that the two are of equal or even comparable cognitive abilities, but that each has unique strengths and that recognition of and planning around these relative advantages can lead to increased efficiency and effectiveness. The rotation of blocks in Tetris serves as a perfect example of this sort of cognitive symbiosis.
(Steven)
Workflow Context
There are at least two levels at which users work (Gonzales, et al., 2004). Users accomplish individual low-level tasks which are part of larger working spheres; for example, an office worker might send several emails, create several Post-It (TM) note reminders, and then edit a Word document, each of these smaller tasks being part of a single larger working sphere of "adding a new section to the website." Thus, it is important to understand this larger workflow context, which often involves extensive multi-tasking, as well as switching between a variety of computing devices and traditional tools, such as notebooks. In that study, the information workers surveyed typically switched individual tasks every 2 minutes and maintained many simultaneous working spheres, switching between spheres on average every 12 minutes. This frenzied pace of switching tasks and working spheres suggests that users will not be using a single application or device for a long period of time, and that affordances to support this characteristic pattern of information work are important.
Czerwinski, et al. conducted a diary study of task switching and interruptions of users in 2004. This study showed that task complexity, task duration, length of absence, and number of interruptions all affected the users' own perceived difficulty of switching tasks. Iqbal, et al. studied task disruption and recovery in a field study, and found that users often visited several applications as a result of an alert, such as a new email notification, and that 27% of task suspensions resulted in 2 hours or more of disruption. Users in the study said that losing context was a significant problem in switching tasks, and led in part to the length of some of these disruptions. This work hints at the importance of providing cues to users to maintain and regain lost context during task switching.
(Andrew Bragdon - OWNER)
The problem of task switching is exacerbated when some tasks are more routine than others. When a person intends to switch from a routine task to a novel task at some later time, they often forget the context of the original task (Aarts et al., 1999). Also, if both tasks are done in the same context, with the same tools or with the same materials, people have difficulty inhibiting the routine task while doing the novel task (Stroop, 1935). This inhibition also makes switching back to the routine task slower (Allport et al., 1994). All of these problems can be alleviated to some degree by salient cues in the environment. The intention to switch is easier to recall when there is a salient reminder at the appropriate time (McDaniel and Einstein, 1993), and associating different environmental cues with different goals can automatically trigger appropriate behavior (Aarts and Dijksterhuis, 2003).
(Adam)
(Edited by Andrew)
Quantitative Models: Fitts's Law, Steering Law
Fitts's law and the steering law are examples of quantitative models that predict user performance when using certain types of user interfaces. In addition to these classic models, Cao and Zhai developed and validated a quantitative model of human performance of pen stroke gestures in 2007. Lank and Saund utilized a model which used curvature to predict the speed of a pen as it moved across a surface to help disambiguate target selection intent.
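For reference, the Shannon formulation of Fitts's law predicts movement time as MT = a + b * log2(D/W + 1), where D is the distance to the target and W its width. A small sketch of that prediction follows; the constants a and b are placeholder values rather than fitted parameters.

    import math

    def fitts_movement_time(distance, width, a=0.1, b=0.15):
        """Predicted time (s) to acquire a target, Shannon formulation of Fitts's law.
        a (intercept, s) and b (slope, s/bit) are placeholders; real values are
        fitted empirically per device and user population."""
        index_of_difficulty = math.log2(distance / width + 1)  # in bits
        return a + b * index_of_difficulty

    print(fitts_movement_time(distance=800, width=16))  # small, distant target: slower
    print(fitts_movement_time(distance=200, width=64))  # large, nearby target: faster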
In addition, quantitative models are often tested against new interfaces to verify that they hold. For example, Grossman et al. verified that their Bubble Cursor approach to enlarging effective pointing target sizes obeyed Fitts's law for actual distance traveled.
In addition to formal models, machine learning techniques have been applied to modeling user interaction as well. For example, Hurst, et al., used a learning classifier, trained on low-level mouse and keyboard usage patterns, to identify novice and expert users dynamically with accuracies as high as 91%. This classifier was then used to provide different information and feedback to the user as appropriate.
(Andrew Bragdon - OWNER)
Distributed cognition
Distributed cognition is a theory in which cognition takes place both inside and outside of the brain. Humans have a great ability to use tools and to incorporate their environments into their sphere of thinking. Clark puts it nicely in [Clark-1994-TEM].
Therefore, optimal configurations when considering HCI design will treat the brain, person, interface, and computer as a single, holistic cognitive system.
In practical terms, the issue at hand for our proposal is how to best maximize utility by distributing the cognitive tasks at hand to different components of the whole system. Simply, which tasks can we off-load to the computer to do for us, faster and more accurately? What tasks should we purposely leave the computer out of?
Typically, those tasks most eligible to be off-loaded are the ones we perform poorly on. Conveniently, the tasks which computers perform poorly on are often those at which we excel. Here are a few examples:
- Computers' areas of expertise: number crunching, memory, logical reasoning, precision
- Humans' areas of expertise: associative thought, real-world knowledge, social behavior, alogical reasoning, tolerance for imprecision
Using this division of cognitive labor allows us to optimize task work flows. Ignoring it puts strain and bottlenecks at either the computer, the human, or the interface. The field of HCI is full of examples of failures which can be attributed to not recognizing which tasks should be handled by which sub-system.
As a heuristic for dividing thinking, one might turn to the dual-process theory literature [Evans-2003-ITM]. What is most often called System 1 covers what humans are good at, while System 2 tasks are what computers do well.
(Owner - Gideon)
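One way to make this division of labor concrete is a simple allocation heuristic over task properties, in the spirit of the classification algorithm proposed in the Specific Contributions; the property labels below are illustrative assumptions, not an established taxonomy.

    # Decide whether a task should be off-loaded to the computer or left to the human.
    COMPUTER_STRENGTHS = {"number-crunching", "exact-recall", "logical-deduction", "precision"}
    HUMAN_STRENGTHS = {"association", "real-world-knowledge", "social-judgment", "ambiguity"}

    def allocate(task_properties):
        """Assign a task to whichever strength set its properties overlap more."""
        computer_fit = len(task_properties & COMPUTER_STRENGTHS)
        human_fit = len(task_properties & HUMAN_STRENGTHS)
        return "computer" if computer_fit > human_fit else "human"

    print(allocate({"number-crunching", "precision"}))        # -> computer
    print(allocate({"association", "real-world-knowledge"}))  # -> human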
Information Processing Approach to Cognition
The dominant approach in Cognitive Science is called information processing. It sees cognition as a system that takes in information from the environment, forms mental representations and manipulates those representations in order to create the information needed to achieve its goals. This approach includes three levels of analysis originally proposed by Marr (will cite):
- Computational - What are the goals of a process or representation? What are the inputs and desired outputs required of a system which performs a task? Models at this level of analysis are often considered normative models, because any agent wanting to perform the task should conform to them. Rational agent models of decision making, for example, belong at this level of analysis.
- Process/Algorithmic - What are the processes or algorithms involved in how humans perform the task? This is the most common level of analysis as it focuses on the mental representations, manipulations and computational faculties involved in actual human processing. Algorithmic descriptions of human capabilities and limitations, such as working memory size, belong at this level of analysis.
- Implementation - How are the processes and algorithms realized in actual biological computation? Dopamine theories of reward learning, for example, belong at this level of analysis.
The information processing approach is often contrasted with the distributed cognition approach. Its advantage is that it finds general mechanisms that are valid across many different contexts and situations. Its disadvantage is that it can have difficulty explaining the rich interactions between people and their environment. (Adam)
Models of perception
Gibsonianism
(relevant stuff about Gibson's theory) We will build on top of this theory/model by ... .
Gibsonianism, named after James J. Gibson and more commonly referred to as ecological psychology, is an epistemological direct realist theory of perception and action. In contrast to information processing and cognitivist approaches, which generally assume that perception is a constructive process operating on impoverished sense-data inputs (e.g. photoreceptor activity) to generate representations of the world with added structure and meaning (e.g. a mental or neural "picture" of a chair), ecological psychology treats perception as a relation that places animals in direct and lawful epistemic contact with behaviorally-relevant events and features of their environmental niches (Warren, 2005). These lawfully-specified and behaviorally-relevant features of the environment constitute the possibilities for action that the environment "affords" the animal (Gibson, 1986).
Gibson's notion of affordance has many implications for our enterprise; however, it is worth noting that the original definition of affordance emphasizes possibilities for action and not their relative likelihoods. For example, for most humans, laptop computer screens afford puncturing with Swiss Army knives; however, it is unlikely that a user will attempt to retrieve an electronic coupon by carving it out of their monitor. This example illustrates that interfaces often afford a class of actions that are undesirable from the perspective of both the designer and the user.
(Jon)
Design guidelines
A multitude of rule sets exist for the design of not only interfaces, but also architecture, city planning, and software development. They range in scale from a single primary rule to as many as Christopher Alexander's 253 rules for urban environments,[1] which he introduced with the concept of design patterns in the 1970s. Studies have likewise been conducted on the use of these rules:[2] guidelines are often only partially understood, indistinct to the developer, and "fraught" with potential usability problems given a real-world situation.
Application to AUE
And yet, the vast majority of guideline sets, including the seminal rulesets, have been arrived at heuristically. The most successful rulesets, such as Raskin's and Shneiderman's, have been derived from years of observation rather than empirical study and experimentation. The problem is similar to the problem of circular logic faced by automated usability evaluations: an automated system is limited, in the suggestions it can offer, to a set of preprogrammed guidelines which have often not been subjected to rigorous experimentation.[3] In the vast majority of existing studies, emphasis has traditionally been placed on the development of guidelines, or the application of existing guidelines to automated evaluation. A mutually reinforcing development of both simultaneously has not been attempted.
Overlap between rulesets is inevitable. For our purposes of evaluating existing rulesets efficiently, without extracting and analyzing each rule individually, it may be desirable to identify the overarching principles or philosophy (max. 2 or 3) for a given ruleset and determine its quantitative relevance to problems of cognition.
Popular and seminal examples
Shneiderman's Eight Golden Rules date to 1987 and are arguably the most cited. They are more heuristic than most, but can be roughly classified by cognitive objective: at least two rules apply primarily to use, versus discoverability. Up to five of Shneiderman's rules emphasize predictability in the outcomes of operations and increased feedback and control in the agency of the user. His final rule, paradoxically, removes control from the user by suggesting a reduced short-term memory load, which we can arguably classify as simplicity.
Raskin's Design Rules are classified into five principles by the author, augmented by definitions and supporting rules. While one principle is primarily aesthetic (a design problem arguably outside the bounds of this proposal) and one is a basic endorsement of testing, the remaining three begin to reflect philosophies similar to Shneiderman's: reliability or predictability, simplicity or efficiency (which we can construe as two sides of the same coin), and, finally, a concept of uninterruptibility.
Maeda's Laws of Simplicity are fewer, and ostensibly emphasize simplicity exclusively, although elements of use as related by Shneiderman's rules and efficiency as defined by Raskin may be facets of this simplicity. Google's corporate mission statement presents Ten Principles, only half of which can be considered true interface guidelines. Indeed,
Elements and goals of a guideline set
Myriad rulesets exist, but the variation among them is limited; it indeed seems possible to parse these common rulesets into overarching principles that can be converted to or associated with quantifiable cognitive properties. For example, it is likely that simplicity has an analogue in the short-term memory retention or visual retention of the user, vis-à-vis the rule of Seven, Plus or Minus Two. Predictability likewise may have an analogue in Activity Theory, with regard to a user's perceptual expectations for a given action, and so forth.
Within the scope of this proposal, we aim to reduce and refine these philosophies found in seminal rulesets and identify their logical cognitive analogues. By assigning a quantifiable taxonomy to these principles, we will be able to rank and weight them with regard to their real-world applicability, developing a set of "meta-guidelines" and rules for applying them to a given interface in an automated manner. Combined with cognitive models and multi-modal HCI analysis, we seek to develop, in parallel with these guidelines, the interface evaluation system responsible for their application. E J Kalafarski 15:21, 6 February 2009 (UTC)
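As a first approximation of how such meta-guidelines might be applied automatically, the sketch below scores an interface as a weighted sum of per-principle ratings; the principle names, weights, and ratings are illustrative assumptions, not results.

    # Principle names, weights, and ratings are illustrative assumptions.
    META_GUIDELINES = {
        "simplicity":     0.40,  # e.g., items per view kept near seven, plus or minus two
        "predictability": 0.35,  # consistent outcomes for the same action
        "feedback":       0.25,  # visible system state after each operation
    }

    def guideline_score(ratings):
        """Weighted sum of per-principle ratings (each in [0, 1]) for one interface."""
        return sum(weight * ratings.get(principle, 0.0)
                   for principle, weight in META_GUIDELINES.items())

    print(guideline_score({"simplicity": 0.9, "predictability": 0.7, "feedback": 0.5}))  # -> 0.73

How the per-principle ratings are produced (manually, or by the automated evaluation system itself) is exactly the open question this part of the proposal addresses.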
- I. Introduction, applications of guidelines
  - A. Application to automated usability evaluations (AUE)
- II. Popular and seminal examples
  - A. Shneiderman
  - B. Google
  - C. Maeda
  - D. Existing international standards
- III. Elements of guideline sets, relationship to design patterns
- IV. Goals for potentially developing a guideline set within the scope of this proposal
User interface evaluations
History/interaction capture
Cost-based analyses
- I'm pretty sure that this paper referred to some earlier paper with a very similar title. OWNER: Eric Sodomka
In A Framework of Interaction Costs in Information Visualization, Lam identifies seven types of costs that can be used for interface evaluation:
- Decision costs to form goals
- System-power costs to form system operations
- Multiple input mode costs to form physical sequences
- Physical-motion costs to execute sequences
- Visual-cluttering costs to perceive state
- View-change costs to interpret perception
- State-change costs to evaluate interpretation
These are based on Donald Norman's Seven Stages of Action from his book, The Design of Everyday Things (summary).
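A cost-based evaluation in this framework could be sketched as a weighted sum over Lam's seven cost types; the cost estimates and weights below are illustrative placeholders, not measured values.

    # Cost estimates and weights are illustrative placeholders.
    LAM_COSTS = [
        "decision", "system_power", "multiple_input_mode", "physical_motion",
        "visual_clutter", "view_change", "state_change",
    ]

    def total_interaction_cost(cost_estimates, weights=None):
        """Sum the seven estimated costs for a candidate interface; optional per-type
        weights let an evaluator emphasize some cost types over others."""
        weights = weights or {cost: 1.0 for cost in LAM_COSTS}
        return sum(weights[cost] * cost_estimates.get(cost, 0.0) for cost in LAM_COSTS)

    print(total_interaction_cost({"decision": 2.0, "visual_clutter": 3.5, "view_change": 1.0}))  # -> 6.5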
Multimodal HCI
Continued advancements in several signal processing techniques have given rise to a multitude of mechanisms that allow for rich, multimodal human-computer interaction. These include systems for head-tracking, eye- or pupil-tracking, fingertip tracking, recognition of speech, and detection of electrical impulses in the brain, among others [Sharma-1998-TMH]. With ever-increasing computing power, integrating these systems in real-time applications has become a plausible endeavor.
Head-tracking
- In virtual, stereoscopic environments, head-tracking has been exploited with great success to create an immersive effect, allowing a user to move freely and naturally while dynamically updating the user’s viewpoint in a visual environment. Head-tracking has been employed in non-immersive settings as well, though careful consideration must be paid to account for unintended movements by the user, which may result in distracting visual effects.
Pupil-tracking
- Pupil-tracking has been studied a great deal in the field of Cognitive Science ... (need some examples here from CogSci). In the HCI community, pupil-tracking has traditionally been used for post-hoc analysis of interface designs, and is particularly prevalent in web interface design. An alternative use of pupil-tracking is to employ it in real-time as an actual mode of interaction. This has been examined in relatively few cases (citations), where typically the eyes are used to control a cursor onscreen. Like head-tracking, implementations of pupil-tracking must account for unintended eye movements, which are incredibly frequent.
Fingertip-tracking, Gestural recognition
- Fingertip tracking and gestural recognition of the hands are the subjects of much research in the HCI community, particularly in the Virtual Environment and Augmented Reality disciplines. Less implicit than head or pupil-tracking, gestural recognition of the hands may draw upon the wealth of precedents readily observed in natural human interactions. As sensing technologies become less obtrusive and more robust, this method of interaction has the potential to become quite effective.
Speech Recognition
- Speech recognition is becoming much better, though effective implementation of its desired effects is non-trivial in many applications. (More on this later).
Brain Activity Detection
- The use of electroencephalograms (EEGs) in HCI is quite recent, and with limited degrees of freedom, few robust interfaces have been designed around it. Some recent advances in the pragmatic use of EEGs in HCI research can be seen in Grimes et al. The possibility of using brain function to interface with a machine is cause for great excitement in the HCI community, and further advances in non-invasive techniques for accessing brain function may allow teleo-HCI to become a reality.
In sum, the synchronized usage of these modes of interaction makes it possible to architect an HCI system capable of sensing and interpreting many of the mechanisms humans use to transmit information among one another. (Trevor)
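A minimal sketch of what "synchronized usage" might mean in software is shown below: per-modality event streams, each timestamped, merged onto one timeline. The stream format and modality names are assumptions for illustration.

    import heapq

    def merge_streams(*streams):
        """Merge per-modality event streams, each a time-sorted list of
        (timestamp_s, modality, payload) tuples, into one chronological stream."""
        return list(heapq.merge(*streams, key=lambda event: event[0]))

    head = [(0.10, "head", {"yaw_deg": 2.1}), (0.20, "head", {"yaw_deg": 2.4})]
    gaze = [(0.12, "eye", {"x": 512, "y": 300})]
    speech = [(0.15, "speech", {"word": "open"})]
    for event in merge_streams(head, gaze, speech):
        print(event)

A downstream interpreter could then fuse temporally adjacent events (e.g., a fixation followed closely by the spoken word "open") into a single multimodal command.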
Significance
Preliminary results
Research plan
We can speculate here about a longer-term research plan, but it may not be necessary to actually flesh out this part of the "proposal"