Aptima and VAST Challenge 2011

Slides from Aptima about this project and Brown collaboration: Media:AARDVARK_use_case.pdf

VAST Challenge 2011 Info

The task descriptions are available here: http://hcil.cs.umd.edu/localphp/hcil/vast11/index.php/taskdesc/index

The dataset for the challenge is available here: http://hcil.cs.umd.edu/localphp/hcil/vast11/index.php/dataset/register

Research Questions

From David's initial email:

  • Is provenance better displayed in the primary/overview case or available as drill-down data?
  • What data works better in layers and what data works better in separate windows and why?

More questions:

Related Work

Related Products

Palantir – http://www.palantir.com/
2010 entry: http://www.cs.umd.edu/hcil/VASTchallenge2010/Entries/196_Palantir_GC/index_grand.htm
2008 entry: http://vac.nist.gov/2008/entries/Palantir-Palantir-Grand-1/index.html

Oculus – http://www.oculusinfo.com/
2010 entry: http://www.oculusinfo.com/nspace2-and-the-ieee-vast-challenge/

Solutions


125-Bertini

http://129.63.17.205/vast/challengesubmissions/125-Bertini-GC/index_grand.htm

MC1: Tweets

Lots of data wrangling/pre-processing at the initial stage of analysis. The team built an in-house analysis tool to examine tweets. The application features a frequency chart, a map, and filtering frames. Additional features are a tweet browser and a word cloud visualizer.

  • Started by looking at map overview for tweet patterns
  • Refinement focused on identifying "interesting" tweet clusters in the overview, using the word cloud for detail views and user IDs, and then tracking those user IDs to find more pivotal events. Full tweets were also examined on occasion.
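
The word cloud detail view described above comes down to counting terms within a selected tweet cluster. The following is a minimal sketch of that idea, not the team's actual tool: the function name word_frequencies, the stopword list, and the sample tweets are all invented for illustration.

```python
from collections import Counter
import re

# Hypothetical stopword list; a real tool would use a much larger one.
STOPWORDS = {"the", "a", "and", "i", "to", "of", "in", "is", "my", "have"}

def word_frequencies(tweets, top_n=5):
    """Count non-stopword terms across a cluster of tweet texts."""
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)

# Invented sample cluster, standing in for tweets selected on the map.
cluster = [
    "I have a fever and chills",
    "fever all day, staying home",
    "the flu is going around",
]
top_words = word_frequencies(cluster)
print(top_words)
```

The resulting counts could drive font sizes in a word cloud; tracking the user IDs behind the top terms would be a separate lookup step.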

MC2: Network Analysis

Network analysis tool. The overview shows data over time, with each time block annotated with important network events (e.g., DDoS attacks). Each time block serves the same purpose as a sparkline.

  • Analysis starts with overview mode, filtering for interesting, patterned events.
  • Constant filtering via time, types of events, and connection source/external IPs associated with events.
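
The filtering loop described above (by time, event type, and source IP) can be sketched as a single function with optional criteria. This is an assumption-laden illustration, not the entry's implementation: the NetEvent record, its field names, the event type labels, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class NetEvent:
    # Hypothetical record layout; the real logs have their own schema.
    timestamp: datetime
    kind: str
    source_ip: str

def filter_events(events, start=None, end=None, kinds=None, source_ips=None):
    """Keep events matching every criterion that is not None."""
    result = []
    for e in events:
        if start is not None and e.timestamp < start:
            continue
        if end is not None and e.timestamp >= end:
            continue
        if kinds is not None and e.kind not in kinds:
            continue
        if source_ips is not None and e.source_ip not in source_ips:
            continue
        result.append(e)
    return result

# Invented sample events using documentation-range IP addresses.
events = [
    NetEvent(datetime(2011, 1, 1, 9, 0), "ddos", "192.0.2.5"),
    NetEvent(datetime(2011, 1, 1, 9, 30), "port_scan", "192.0.2.7"),
    NetEvent(datetime(2011, 1, 1, 14, 0), "ddos", "192.0.2.5"),
]
morning_ddos = filter_events(events, end=datetime(2011, 1, 1, 12, 0), kinds={"ddos"})
print(morning_ddos)
```

Leaving a criterion as None skips that filter, which matches the repeated narrow-and-widen style of exploration the notes describe.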

MC3: Text Analysis

Unlike the other MCs, this one is exceedingly messy. Uses STW Counter, Knime, Stanford Tagger, Jigsaw, Visio, KIAWordCloudVis, and lots of by-hand processing. The aim is to reduce the corpus of documents to a list of candidates, and then to a shorter list of evidence. The analysis relied heavily on a manually generated list of suspicious words that could be used as initial-stage filters on the corpus.
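
The initial-stage filter described above, reducing a corpus to candidates with a hand-built word list, could look roughly like the sketch below. It does not reproduce any of the tools named in the entry; the function find_candidates, the word list, and the documents are invented for illustration.

```python
import re

def find_candidates(documents, suspicious_words):
    """Return (doc_id, matched_words) pairs, most matches first."""
    terms = {w.lower() for w in suspicious_words}
    candidates = []
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        hits = sorted(tokens & terms)
        if hits:
            candidates.append((doc_id, hits))
    # Documents matching more distinct terms are ranked higher.
    candidates.sort(key=lambda item: len(item[1]), reverse=True)
    return candidates

# Invented sample corpus and word list.
documents = {
    "doc1": "City council approves new park budget.",
    "doc2": "Police investigate a threat of attack downtown.",
    "doc3": "Unclaimed package prompts a brief threat alert.",
}
suspicious = ["threat", "attack", "bomb"]
candidates = find_candidates(documents, suspicious)
print(candidates)
```

A filter like this only produces the candidate list; narrowing candidates down to evidence would still need the manual reading the notes mention.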