Project summary
We propose to integrate theories and models of cognition, models of perception, rules of design, and concepts from the discipline of human-computer interaction to develop a predictive model of user performance in interacting with computer software for visual and analytical work. Our proposed model comprises a set of computational elements representing components of human cognition, memory, or perception. The collective abilities and limitations of these elements can be used to provide feedback on the likely efficacy of user interaction techniques.
The choice of human computational elements will be guided by several models or theories of cognition and perception, including Gestalt psychology, distributed cognition, Gibson's ecological psychology, ???(where pathway, when pathway)???, ???working-memory???, ..., and ???. The list of elements will be extensible. The framework coupling them will allow for experimental predictions of the utility of user interfaces that can be verified against human performance.
Coupling the system with users will involve a data capture mechanism for collecting the communications between a user interface and a user. These will be primarily event-based, and will include a new low-cost, camera-based eye-tracking system.
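As a concrete illustration, here is a minimal sketch of what one captured interaction event might look like. The schema and field names are our own illustrative assumptions, not a finalized design; the real mechanism would need to handle device clock synchronization and much higher event rates.

```python
import json
import time

def make_event(device, kind, payload):
    """Package a single interaction event with a capture timestamp."""
    return {
        "time": time.time(),   # seconds since epoch, for cross-device ordering
        "device": device,      # e.g. "mouse", "keyboard", "eye-tracker"
        "kind": kind,          # e.g. "click", "keypress", "fixation"
        "payload": payload,    # device-specific data, e.g. screen coordinates
    }

# Example: a mouse click and an eye-tracker fixation in one shared stream.
log = [
    make_event("mouse", "click", {"x": 412, "y": 97, "button": "left"}),
    make_event("eye-tracker", "fixation", {"x": 405, "y": 101, "ms": 230}),
]
print(json.dumps(log, indent=2))
```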
During early development, existing interfaces will be evaluated manually to characterize their
(we need some way to specify interaction techniques...)
Specific Contributions
- A model of human cognitive and perceptual abilities when using computers
- Demonstration of the model in predicting human performance with some interfaces.
- ???
- Something about design rules collected and merged
- Something comparing these collected rules to a baseline (establishing their value)
- ???
- A low-overhead mechanism for capturing event-based interactions between a user and a computer, including webcam-based eye tracking. (Should we buy, or find out about borrowing, a pupil tracker?) Should we include other methods of interaction here? Audio recognition seems to be the lowest-cost. It would seem that a system that took into account head-tracking, audio, and fingertip or some other high-DOF input would provide a very strong foundation for a multi-modal HCI system. It may be more interesting to create a software toolkit that allows for synchronized usage of those inputs than a low-cost hardware setup for pupil-tracking. I agree pupil-tracking is useful, but developing something in-house may not be the strongest contribution we can make with our time. (Trevor)
- Accuracy study of eye tracking (2 cameras? double as an input device?)
- ???
- A set of critiques of existing software used for visual and analytical work based on design rules
- ???
Specific Aims
- build X
- build Y
- run experiment Z
- compare X with existing approach Q
Background
Models of cognition
There are several models of cognition, ranging from fundamental aspects of neurological processing to extremely high-level psychological analysis. Three main theories have come to be recognized as the most helpful in conceptualizing the actual process of HCI. These models all agree that one cannot accurately analyze HCI by viewing the user without context, but the extent and nature of this context varies greatly among them.
Activity Theory, developed in the early 20th century by the Russian psychologists S.L. Rubinstein and A.N. Leontiev, posits four discrete aspects of human-computer interaction. The "Subject" is the human interacting with the system, who possesses an "Object" (i.e., a goal) which they hope to accomplish by using a tool. The Subject conceptualizes the realization of the Object via an "Action", which may be as simple or complex as necessary. The Action is made up of one or more "Operations", the most fundamental level of interaction: typing, clicking, etc.
A key concept in Activity Theory is that of the artifact, which mediates all interaction. The computer itself need not be the only artifact in HCI - others include all sorts of signs, algorithmic methods, instruments, etc.
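To make the hierarchy concrete, here is a minimal sketch of how the Subject/Object/Action/Operation structure might be encoded. The class names mirror the theory's terms, but the decomposition shown is our own illustrative example, not a canonical one.

```python
from dataclasses import dataclass, field

@dataclass
class Operation:              # most fundamental level: typing, clicking, etc.
    name: str

@dataclass
class Action:                 # realizes part of the Object via Operations
    name: str
    operations: list = field(default_factory=list)

@dataclass
class Activity:               # a Subject pursuing an Object through Actions
    subject: str              # the human
    object: str               # the goal, e.g. "add a new section to the website"
    actions: list = field(default_factory=list)

# Illustrative decomposition of an office worker's activity.
activity = Activity(
    subject="office worker",
    object="add a new section to the website",
    actions=[Action("send status email",
                    [Operation("type message"), Operation("click Send")])],
)
```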
The Situated Action Model focuses on emergent behavior, emphasizing the subjective aspect of human-computer interaction and the consequent need to allow for a wide variety of users. This model proposes the least contextual interaction, and seems to maintain that the interactive experience is determined entirely by the user's ability to use the system in question. While limiting, this concept of usability can be very informative when designing for less tech-savvy users.
Distributed Cognition proposes that the computer (or, as in Activity Theory, any other artifact) can be used, and ought to be thought of, as an extension of the mental processing of the human. This is not to say that the two are of equal or even comparable cognitive abilities, but that each has unique strengths, and that recognition of and planning around these relative advantages can lead to increased efficiency and effectiveness. The rotation of blocks in Tetris serves as a perfect example of this sort of cognitive symbiosis: skilled players rotate pieces on the screen to judge fit rather than rotating them mentally, because the external action is faster and less error-prone.
Workflow Context (Andrew Bragdon - OWNER)
There are at least two levels at which users work (Gonzales, et al., 2004). Users accomplish individual low-level tasks which are part of larger working spheres; for example, an office worker might send several emails, create several Post-It (TM) note reminders, and then edit a Word document, each of these smaller tasks being part of a single larger working sphere of "adding a new section to the website." Thus, it is important to understand this larger workflow context, which often involves extensive multi-tasking as well as switching between a variety of computing devices and traditional tools, such as notebooks. The study found that the information workers surveyed typically switch individual tasks every 2 minutes and have many simultaneous working spheres, between which they switch, on average, every 12 minutes. This frenzied pace of switching tasks and working spheres suggests that users will not be using a single application or device for a long period of time, and that affordances to support this are important.
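If our capture mechanism labels events with working spheres (the labeling itself would require manual coding or heuristics; Gonzales et al. used direct observation), estimating switch rates becomes a simple log computation. A sketch, using a made-up log format of (timestamp in seconds, sphere label) pairs:

```python
from statistics import mean

def mean_time_between_switches(log):
    """log: chronological list of (timestamp_seconds, sphere_label) pairs."""
    # Timestamps at which the sphere differs from the previous event's sphere.
    switch_times = [t for (t, s), (_, prev) in zip(log[1:], log[:-1]) if s != prev]
    intervals = [b - a for a, b in zip(switch_times, switch_times[1:])]
    return mean(intervals) if intervals else None

log = [(0, "email"), (90, "email"), (130, "website"),
       (700, "website"), (760, "email")]
print(mean_time_between_switches(log))  # 630.0 seconds between the two switches
```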
Quantitative Models: Fitts's Law, Steering Law
Fitts's law and the steering law are examples of quantitative models that predict user performance with certain types of user interfaces. In addition to these classic models, Xiang Cao and Shumin Zhai developed and validated a quantitative model of human performance in producing pen-stroke gestures in 2007.
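For reference, both models have simple closed forms; the sketch below uses the Shannon formulation of Fitts's law, MT = a + b log2(D/W + 1), and the steering law for a straight tunnel of constant width, MT = a + b (A/W). The constants a and b are empirical, fit per device and task; the values below are placeholders, not measured parameters.

```python
from math import log2

def fitts_mt(distance, width, a=0.1, b=0.15):
    """Fitts's law (Shannon form): MT = a + b * log2(D/W + 1)."""
    return a + b * log2(distance / width + 1)

def steering_mt(path_length, tunnel_width, a=0.1, b=0.15):
    """Steering law for a straight, constant-width tunnel: MT = a + b * (A/W)."""
    return a + b * (path_length / tunnel_width)

# Example: reaching a 32-pixel target 512 pixels away vs. steering down a
# 512-pixel corridor (e.g. a cascading menu) that is 32 pixels wide.
print(round(fitts_mt(512, 32), 2))     # ~0.71 (arbitrary time units)
print(round(steering_mt(512, 32), 2))  # ~2.5: steering is far more demanding
```

Note how, for the same distance and width, the steering task is predicted to take several times longer, which is one reason steering through cascading menus is so demanding.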
Gibsonianism
(relevant stuff about Gibson's theory) We will build on top of this theory/model by ... .
(Note: Gibsonianism is really a theory of perception and should be moved to that section if someone knows how.)
Gibsonianism, named after James J. Gibson and more commonly referred to as ecological psychology, is an epistemological direct-realist theory of perception and action. In contrast to information-processing and cognitivist approaches, which generally assume that perception is a constructive process operating on impoverished sense-data inputs (e.g. photoreceptor activity) to generate representations of the world with added structure and meaning (e.g. a mental or neural "picture" of a chair), ecological psychology treats perception as a relation that places animals in direct and lawful epistemic contact with behaviorally-relevant events and features of their environmental niches (Warren, 2005). These lawfully-specified and behaviorally-relevant features of the environment constitute the possibilities for action that the environment "affords" the animal (Gibson, 1986).
Gibson's notion of affordance has many implications for our enterprise; however, it is worth noting that the original definition of affordance emphasizes possibilities for action, not their relative likelihoods. For example, for most humans, laptop computer screens afford puncturing with Swiss Army knives; however, it is unlikely that a user will attempt to retrieve an electronic coupon by carving it out of their monitor. This example illustrates that interfaces often afford a class of actions that are undesirable from the perspective of both the designer and the user.
(Jon)
Distributed cognition
Distributed cognition is a theory holding that thinking takes place both inside and outside the brain. Humans have a great ability to use tools and to incorporate their environments into their sphere of thinking and information processing. Clark puts it nicely in [Clark-1994-TEM].
Therefore, an optimal configuration for HCI design will treat the brain, person, interface, and computer as a single holistic cognitive system.
In practical terms, the issue at hand for our proposal is how best to maximize utility by distributing the cognitive tasks at hand across the components of the whole system: simply, which tasks can we off-load to the computer to do for us faster and more accurately (REF)?
Typically, the tasks most eligible for off-loading are the ones we perform poorly; conveniently, we excel at the tasks which computers perform poorly. Here are a few examples:
- Computers' areas of expertise: number crunching, memory, logical reasoning, precision
- Humans' areas of expertise: associative thought, real-world knowledge, social behavior, alogical reasoning, tolerance of imprecision
Using this division of cognitive labor allows us to optimize task workflows. Ignoring it creates strain and bottlenecks at the computer, the human, or the interface. The field of HCI is full of examples of failures that can be attributed to not recognizing which tasks should be handled by which sub-system.
As a heuristic for dividing up thinking, one might turn to the dual-process theory literature (REF TO EVANS). What is most often called System 1 covers what humans are good at, while System 2 tasks are what computers do well.
(Gideon)
Information Processing Approach to Cognition
The dominant approach in Cognitive Science is called information processing. It sees cognition as a system that takes in information from the environment, forms mental representations and manipulates those representations in order to create the information needed to achieve its goals. This approach includes three levels of analysis originally proposed by Marr (will cite):
- Computational - What are the goals of a process or representation? What are the inputs and desired outputs required of a system which performs a task? Models at this level of analysis are often considered normative models, because any agent wanting to perform the task should conform to them. Rational agent models of decision making, for example, belong at this level of analysis.
- Process/Algorithmic - What are the processes or algorithms involved in how humans perform the task? This is the most common level of analysis as it focuses on the mental representations, manipulations and computational faculties involved in actual human processing. Algorithmic descriptions of human capabilities and limitations, such as working memory size, belong at this level of analysis.
- Implementation - How are the processes and algorithms realized in actual biological computation? Dopamine theories of reward learning, for example, belong at this level of analysis.
The information processing approach is often contrasted with the distributed cognition approach. Its advantage is that it finds general mechanisms that are valid across many different contexts and situations. Its disadvantage is that it can have difficulty explaining the rich interactions between people and their environment. (Adam)
???
Models of perception
Design guidelines
I'm a little behind today, but I will have this section drafted by tonight. In the meantime, here is my outline. OWNER: E J Kalafarski 17:20, 3 February 2009 (UTC)
- I. Introduction, applications of guidelines
- A. Application to automated usability evaluations (AUE)
- II. Popular and seminal examples
- A. Shneiderman
- B. Google
- C. Maeda
- D. Existing international standards
- III. Elements of guideline sets, relationship to design patterns
- IV. Goals for potentially developing a guideline set within the scope of this proposal
User interface evaluations
History/interaction capture
Cost-based analyses
Heidi Lam, "A Framework of Interaction Costs in Information Visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1149-1156, Nov./Dec. 2008, doi:10.1109/TVCG.2008.109
I'm pretty sure that this paper referred to some earlier paper with a very similar title.
Multimodal HCI
Continued advancements in several signal processing techniques have given rise to a multitude of mechanisms that allow for rich, multimodal human-computer interaction. These include systems for head-tracking, eye- or pupil-tracking, fingertip tracking, recognition of speech, and detection of electrical impulses in the brain, among others [Sharma-1998-TMH]. With ever-increasing computing power, integrating these systems in real-time applications has become a plausible endeavor.
Head-tracking
- In virtual, stereoscopic environments, head-tracking has been exploited with great success to create an immersive effect, allowing a user to move freely and naturally while the user's viewpoint in the visual environment is dynamically updated. Head-tracking has been employed in non-immersive settings as well, though care must be taken to account for unintended movements by the user, which may result in distracting visual effects.
Pupil-tracking
- Pupil-tracking has been studied a great deal in the field of Cognitive Science ... (need some examples here from CogSci). In the HCI community, pupil-tracking has traditionally been used for post-hoc analysis of interface designs, and is particularly prevalent in web interface design. An alternative use of pupil-tracking is to employ it in real-time as an actual mode of interaction. This has been examined in relatively few cases (citations), where typically the eyes are used to control a cursor onscreen. Like head-tracking, implementations of pupil-tracking must be conscious of unintended eye movements, which are incredibly frequent.
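Because of this, real-time gaze input is typically filtered to separate fixations from saccades before driving a cursor. A minimal dwell-style fixation filter is sketched below; the thresholds are illustrative guesses, not validated parameters.

```python
class FixationFilter:
    """Report a fixation point only when recent gaze samples cluster tightly."""
    def __init__(self, radius_px=40, min_samples=12):
        self.radius_px = radius_px      # max spread that still counts as fixating
        self.min_samples = min_samples  # consecutive samples required
        self.window = []

    def update(self, x, y):
        """Feed one gaze sample; return a fixation centroid (x, y) or None."""
        self.window.append((x, y))
        self.window = self.window[-self.min_samples:]
        if len(self.window) < self.min_samples:
            return None
        cx = sum(px for px, _ in self.window) / len(self.window)
        cy = sum(py for _, py in self.window) / len(self.window)
        if all((px - cx) ** 2 + (py - cy) ** 2 <= self.radius_px ** 2
               for px, py in self.window):
            return (cx, cy)   # stable fixation: safe to move the cursor here
        return None           # saccade in progress: ignore this sample
```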
Fingertip-tracking, Gestural recognition
- Fingertip tracking and gestural recognition of the hands are the subjects of much research in the HCI community, particularly in the Virtual Environment and Augmented Reality disciplines. Less implicit than head or pupil-tracking, gestural recognition of the hands may draw upon the wealth of precedents readily observed in natural human interactions. As sensing technologies become less obtrusive and more robust, this method of interaction has the potential to become quite effective.
Speech Recognition
- Speech recognition accuracy continues to improve, though mapping recognized speech onto the desired interface actions remains non-trivial in many applications. (More on this later).
Brain Activity Detection
- The use of EEG in HCI is quite recent, and with limited degrees of freedom, few robust interfaces have been designed around it. Nevertheless, the possibility of using any brain function at all to interface with a machine is cause for excitement, and further advances in non-invasive techniques for accessing brain function may allow telepathic HCI to become a reality.
In sum, the synchronized usage of these modes of interaction makes it possible to architect an HCI system capable of sensing and interpreting many of the mechanisms humans use to transmit information to one another. (Trevor)
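One plausible starting point for the toolkit idea raised above is simply to merge the per-modality event streams by timestamp into a single chronological stream that applications consume. A sketch, assuming each stream yields (timestamp, event) pairs already in time order:

```python
import heapq

def merged_stream(*streams):
    """Interleave several time-ordered (timestamp, event) streams."""
    yield from heapq.merge(*streams, key=lambda pair: pair[0])

head   = [(0.00, "head: yaw +2deg"), (0.10, "head: yaw +3deg")]
gaze   = [(0.03, "gaze: fixation at (405, 101)")]
speech = [(0.08, "speech: 'open file'")]

for t, event in merged_stream(head, gaze, speech):
    print(f"{t:.2f}s  {event}")
```

Real sensor streams would of course arrive asynchronously with differing latencies, so a production version would need buffering and clock synchronization; the merge step itself can stay this simple.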
Significance
Preliminary results
Research plan
We can speculate here about a longer-term research plan, but it may not be necessary to actually flesh out this part of the "proposal"