Aptima and VAST Challenge 2011
Slides from Aptima about this project and Brown collaboration: Media:AARDVARK_use_case.pdf
VAST Challenge 2011 Info
The task descriptions are available here: http://hcil.cs.umd.edu/localphp/hcil/vast11/index.php/taskdesc/index
The dataset for the challenge is available here: http://hcil.cs.umd.edu/localphp/hcil/vast11/index.php/dataset/register
Research Questions
From David's initial email:
- Is provenance better displayed in the primary/overview case or available as drill-down data?
- What data works better in layers and what data works better in separate windows and why?
More questions:
Related Work
Related Products
Palantir – http://www.palantir.com/ – 2010 entry: http://www.cs.umd.edu/hcil/VASTchallenge2010/Entries/196_Palantir_GC/index_grand.htm and 2008 entry: http://vac.nist.gov/2008/entries/Palantir-Palantir-Grand-1/index.html
Oculus – http://www.oculusinfo.com/ – 2010 entry: http://www.oculusinfo.com/nspace2-and-the-ieee-vast-challenge/
Solutions
125-Bertini
http://129.63.17.205/vast/challengesubmissions/125-Bertini-GC/index_grand.htm
MC1: Tweets
Lots of data wrangling/pre-processing at the initial stage of analysis. Built an in-house analysis tool to examine tweets. The application features a frequency chart, a map, and filtering frames; additional features are a tweet browser and a word cloud visualizer.
- Started by looking at map overview for tweet patterns
- Refinement patterns focus on identifying "interesting" tweet clusters in the overview, using the word cloud for detail views and user IDs, and then tracking those user IDs to find more pivotal events. Also looks at full tweets on occasion.
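The refinement loop above (cluster, then word cloud, then user IDs, then full tweets) can be sketched roughly as follows. This is only an illustration, not the Bertini team's tool: the sample tweets, the stopword list, and the function names `word_counts`, `users_mentioning`, and `tweets_by_user` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample records as (user_id, text); not taken from the challenge data.
TWEETS = [
    (101, "Feeling sick today, fever and chills"),
    (102, "Great concert downtown tonight"),
    (101, "Fever getting worse, going to the hospital"),
    (103, "Chills and fever since this morning"),
]

# Hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"and", "the", "to", "this", "since", "a", "of"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(records):
    """Term frequencies for a tweet cluster: the data behind a word cloud."""
    counts = Counter()
    for _, text in records:
        counts.update(w for w in tokenize(text) if w not in STOPWORDS)
    return counts


def users_mentioning(records, term):
    """User IDs whose tweets contain the given term."""
    return sorted({uid for uid, text in records if term in tokenize(text)})


def tweets_by_user(records, user_id):
    """Full tweets for one user, for the occasional close reading."""
    return [text for uid, text in records if uid == user_id]


if __name__ == "__main__":
    print(word_counts(TWEETS).most_common(3))
    print(users_mentioning(TWEETS, "fever"))
    print(tweets_by_user(TWEETS, 101))
```

The point of the sketch is the order of operations: frequent terms surface candidate topics, the terms lead to user IDs, and the user IDs lead back to full tweets.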
MC2: Network Analysis
Network analysis tool. The overview shows data over time, with each time block annotated with important network events (e.g., DDoS attacks). Each time block serves the same purpose as a sparkline.
- Analysis starts with overview mode, filtering for interesting, patterned events.
- Constant filtering via time, types of events, and connection source/external IPs associated with events.
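A minimal sketch of the filtering described above, assuming events are simple records with a timestamp, an event type, and a source IP. The `Event` class, the sample data, and the function names are hypothetical and do not reflect the team's actual tool or the challenge data format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    time: datetime
    kind: str
    source_ip: str


# Hypothetical sample events; dates, types, and addresses are made up.
EVENTS = [
    Event(datetime(2011, 1, 1, 9, 5), "ddos", "10.0.0.1"),
    Event(datetime(2011, 1, 1, 9, 40), "port_scan", "10.0.0.2"),
    Event(datetime(2011, 1, 1, 10, 15), "ddos", "10.0.0.1"),
    Event(datetime(2011, 1, 1, 11, 30), "login_failure", "10.0.0.3"),
]


def filter_events(events, start=None, end=None, kinds=None, source_ips=None):
    """Keep events in [start, end) that match the given types and source IPs.

    Any criterion left as None is not applied.
    """
    selected = []
    for e in events:
        if start is not None and e.time < start:
            continue
        if end is not None and e.time >= end:
            continue
        if kinds is not None and e.kind not in kinds:
            continue
        if source_ips is not None and e.source_ip not in source_ips:
            continue
        selected.append(e)
    return selected


def counts_per_hour(events):
    """Event counts per one-hour block, e.g. to drive a sparkline-like overview."""
    return Counter(
        e.time.replace(minute=0, second=0, microsecond=0) for e in events
    )


if __name__ == "__main__":
    ddos = filter_events(EVENTS, kinds={"ddos"})
    print([e.time.isoformat() for e in ddos])
    print(counts_per_hour(EVENTS))
```

Keeping each criterion optional mirrors the "constant filtering" workflow: the analyst can tighten or relax one dimension at a time without rebuilding the query.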
MC3: Text Analysis
Unlike the other MCs, this one is exceedingly messy. Uses STW Counter, KNIME, Stanford Tagger, Jigsaw, Visio, KIAWordCloudVis, and lots of by-hand processing. The aim is to reduce the corpus of documents to a list of candidates, and then to a shorter list of evidence. The analysis relied heavily on a manually generated list of suspicious words, used as initial-stage filters on the corpus.
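The initial keyword-filter stage can be sketched as below. This is an assumption-laden illustration only: the word list, the sample documents, and the functions `count_hits` and `candidates` are hypothetical, and the team's actual pipeline combined several tools with manual work.

```python
import re

# Hypothetical, manually curated list of suspicious words.
SUSPICIOUS = {"threat", "attack", "explosive"}

# Hypothetical mini-corpus keyed by document ID; not from the challenge data.
DOCS = {
    "doc1": "City council approves new park budget.",
    "doc2": "Police investigate bomb threat; no explosive found.",
    "doc3": "Group claims attack was planned; further attack feared.",
}


def count_hits(text, terms):
    """Number of tokens in the text that appear in the term list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for t in tokens if t in terms)


def candidates(docs, terms, min_hits=1):
    """Documents with at least min_hits matches, most hits first, ties by ID."""
    scored = [(doc_id, count_hits(text, terms)) for doc_id, text in docs.items()]
    kept = [(doc_id, n) for doc_id, n in scored if n >= min_hits]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for doc_id, hits in candidates(DOCS, SUSPICIOUS):
        print(doc_id, hits)
```

Raising `min_hits` is one simple way to move from the candidate list toward a shorter evidence list, though in practice that second reduction was largely done by hand.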