In the last several years, large multi-dimensional databases have become
common in a variety of applications such as data warehousing and scientific
computing. Analysis and exploration tasks place significant demands on the
interfaces to these databases. Because of the size of the data sets, dense
graphical representations are more effective for exploration than spreadsheets
and charts. Furthermore, because the analysis is exploratory, analysts must be
able to change visualizations rapidly as they iterate between forming
hypotheses and testing them through experimentation.
The Polaris user interface.
We have been developing Polaris, an interface for exploring large
multi-dimensional databases that extends the well-known Pivot Table interface
first popularized by Microsoft Excel. The novel features of Polaris include an
interface for constructing visual specifications of table-based graphical
displays and the ability to generate a precise set of relational queries from
those specifications. Analysts can develop a visual specification rapidly and
incrementally, receiving visual feedback as they construct complex queries and
visualizations.
Formalism
Visualizing an event log from the execution of a parallel graphics application
using Polaris.
The Polaris interface is simple and expressive because it is built on top of a
formalism for describing table-based graphical representations of relational
databases. Our interpreter compiles specifications in this language into a set
of efficient queries and drawing operations that generate the displays. The
formalism precisely defines:
- The mapping of data sources to layers. Multiple data sources may be
combined in a single Polaris visualization. Each data source maps to a
separate layer or set of layers.
- The number of rows, columns, and layers in the table and their relative
orders (left to right, top to bottom, and back to front). The database
dimensions assigned to columns are specified by the fields on the x shelf, rows
by fields on the y shelf, and layers by fields on the layer (z) shelf. Multiple
fields may be dragged onto each shelf to show categorical relationships.
- The selection of records from the database and the partitioning of
records into different layers and panes.
- The grouping of data within a pane and the computation of statistical
properties and aggregates. Records may also be sorted into a given drawing
order.
- The type of graphic displayed in each pane of the table. Each graphic
consists of a set of marks, one mark per record in that pane.
- The mapping of data fields to retinal properties of the marks in the
graphics. The mappings used for any given visualization are shown in a set of
automatically generated legends.
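To make the idea of compiling a visual specification into relational queries more concrete, here is a minimal sketch. It is not the Polaris compiler: `PaneQuerySpec`, `to_sql`, and `partition_into_panes` are hypothetical names, and the `coffee` table with its `Market`, `Product`, `Year`, and `Profit` fields is invented sample data. The sketch covers only three of the concerns listed above (record selection, grouping with an aggregate, and partitioning of the result into panes) and assumes that field names come from a trusted specification, since they are spliced directly into the SQL text.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PaneQuerySpec:
    """Hypothetical, simplified stand-in for one visual specification."""
    table: str
    dimensions: List[str]            # ordinal fields placed on the shelves
    measure: str                     # quantitative field aggregated per group
    aggregate: str = "SUM"           # aggregation applied to the measure
    where: Optional[str] = None      # record selection predicate
    order_by: List[str] = field(default_factory=list)  # drawing order


def to_sql(spec: PaneQuerySpec) -> str:
    """Compile the spec into a single SELECT ... GROUP BY query.

    Field names are assumed trusted; they are not escaped or parameterized.
    """
    select = spec.dimensions + [f"{spec.aggregate}({spec.measure})"]
    sql = f"SELECT {', '.join(select)} FROM {spec.table}"
    if spec.where:
        sql += f" WHERE {spec.where}"
    if spec.dimensions:
        sql += f" GROUP BY {', '.join(spec.dimensions)}"
    if spec.order_by:
        sql += f" ORDER BY {', '.join(spec.order_by)}"
    return sql


def partition_into_panes(rows: List[tuple], n_pane_fields: int) -> Dict[tuple, List[tuple]]:
    """Split grouped rows into panes keyed by their leading dimension values."""
    panes: Dict[tuple, List[tuple]] = {}
    for row in rows:
        panes.setdefault(row[:n_pane_fields], []).append(row[n_pane_fields:])
    return panes


# Invented sample data for a hypothetical coffee chain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coffee (Market TEXT, Product TEXT, Year INTEGER, Profit REAL)")
conn.executemany(
    "INSERT INTO coffee VALUES (?, ?, ?, ?)",
    [
        ("East", "Coffee", 2002, 10.0),
        ("East", "Coffee", 2002, 5.0),
        ("East", "Tea", 2002, 3.0),
        ("West", "Coffee", 2002, 7.0),
        ("West", "Coffee", 2001, 4.0),
    ],
)

spec = PaneQuerySpec("coffee", ["Market", "Product"], "Profit",
                     where="Year = 2002", order_by=["Market", "Product"])
sql = to_sql(spec)
rows = conn.execute(sql).fetchall()
panes = partition_into_panes(rows, 1)  # one pane per Market
print(sql)
print(panes)
```

Issuing one grouped query and then partitioning the result locally is only one possible design; a system could equally issue a separate query per pane.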
A key component of the formalism is the table algebra we have defined. Using
the table algebra, an analyst or programmer can specify the configuration of a
sophisticated table by simply providing three algebraic expressions: one for
the x-axis of the table, one for the y-axis, and one that defines layering (or
the z-axis). In the Polaris interface the user constructs these table
expressions by dragging and dropping fields on shelves; programmers can
directly write the expressions as part of an XML specification.
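The operators of the table algebra are not spelled out here. The Polaris papers listed below define three of them over field operands: concatenation (+, an ordered union), cross (×, a Cartesian product), and nest (/, like cross but keeping only combinations that actually occur in the data). The sketch below illustrates these three operators on ordinal fields only. It assumes, as a simplification, that a field's domain is its distinct values in first-seen order, and it ignores quantitative fields. The function names and the sample records are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence

Record = Dict[str, str]
Entry = Dict[str, str]  # one header entry along an axis: field name -> value


def field_domain(records: Sequence[Record], name: str) -> List[Entry]:
    """Operand: the distinct values of an ordinal field, in first-seen order."""
    entries: List[Entry] = []
    for r in records:
        e = {name: r[name]}
        if e not in entries:
            entries.append(e)
    return entries


def concat(a: List[Entry], b: List[Entry]) -> List[Entry]:
    """'+': ordered union; entries of a, then entries of b not already in a."""
    return a + [e for e in b if e not in a]


def cross(a: List[Entry], b: List[Entry]) -> List[Entry]:
    """'x': Cartesian product; every entry of a combined with every entry of b."""
    return [{**x, **y} for x, y in product(a, b)]


def nest(a: List[Entry], b: List[Entry], records: Sequence[Record]) -> List[Entry]:
    """'/': like cross, but keeps only combinations that occur in some record."""
    def occurs(e: Entry) -> bool:
        return any(all(r[k] == v for k, v in e.items()) for r in records)
    return [e for e in cross(a, b) if occurs(e)]


records = [
    {"Market": "East", "Product": "Coffee"},
    {"Market": "East", "Product": "Tea"},
    {"Market": "West", "Product": "Coffee"},
]
market = field_domain(records, "Market")
products = field_domain(records, "Product")

print(len(cross(market, products)))         # 4 (includes West/Tea)
print(nest(market, products, records))      # West/Tea is dropped
```

Under these assumptions, `Market x Product` along one axis yields four headers, while `Market / Product` yields three, because no record pairs West with Tea.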
Hierarchies and Data Cubes
Analysis of profit/sales data for a hypothetical coffee chain.
To support interactive analysis, many data warehouses are being augmented with
hierarchical structures that provide meaningful levels of abstraction that can
be leveraged by both the computer and the analyst. These hierarchies can encode
known semantic information about the underlying data warehouses or can be
generated from algorithmic analysis such as classification or clustering. This
hierarchical structure creates many challenges and opportunities in the design
of systems for querying, analyzing, and visualizing these databases. Our paper
presented at KDD in July 2002 explains in detail how we extended the interface,
formalism, and generation of data queries within Polaris to support
hierarchically structured data warehouses.
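As a small illustration of what a "level of abstraction" in a hierarchy means for the data, the sketch below aggregates the same facts at three levels of a time hierarchy. The Year > Quarter > Month hierarchy, the `LEVELS` table, the `roll_up` function, and the sample facts are all hypothetical; the sketch says nothing about how Polaris actually queries hierarchical warehouses or data cubes, which is described in the KDD paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

YearMonth = Tuple[int, int]

# Hypothetical time hierarchy (Year > Quarter > Month). Each level maps a
# (year, month) pair to the key of the group it belongs to at that level.
LEVELS: Dict[str, Callable[[YearMonth], Tuple[int, ...]]] = {
    "year": lambda ym: (ym[0],),
    "quarter": lambda ym: (ym[0], (ym[1] - 1) // 3 + 1),
    "month": lambda ym: ym,
}


def roll_up(facts: List[Tuple[YearMonth, float]], level: str) -> Dict[Tuple[int, ...], float]:
    """Sum a measure over all facts that share the same key at `level`."""
    totals: Dict[Tuple[int, ...], float] = defaultdict(float)
    for ym, sales in facts:
        totals[LEVELS[level](ym)] += sales
    return dict(sorted(totals.items()))


# Invented sample facts: ((year, month), sales).
facts = [((2001, 12), 1.0), ((2002, 1), 10.0), ((2002, 2), 5.0), ((2002, 4), 2.0)]
for level in ("year", "quarter", "month"):
    print(level, roll_up(facts, level))
```

Because the measure here is additive, each coarser level could also be computed from the finer one, which is the property that makes precomputed data cubes useful for interactive roll-up and drill-down.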
Panning and Zooming
A compelling visualization architecture is pan-and-zoom. Most analysts start
with an overview of the data before gradually refining their view to be more
focused and detailed. Multiscale pan-and-zoom systems are effective because
they directly support this approach. However, generating abstract overviews
of large data sets is difficult, and most systems take advantage of only one
type of abstraction: visual abstraction. Furthermore, these existing systems
limit the analyst to a single zooming path on their data and thus a single set
of abstract views.
Screenshots from three different multiscale visualization systems developed
using the Polaris formalism.
In our paper to be presented at Infovis in October 2002, we present (1) a
formalism for describing multiscale visualizations of data cubes with both
data and visual abstraction, and (2) a method for independently zooming along
one or more dimensions by traversing a zoom graph with nodes at different
levels of detail. To show how multiscale visualizations can be designed with
our system, we describe four design patterns expressed in our formalism. These
design patterns demonstrate the effectiveness of multiscale visualization for
general relational databases.
The Polaris formalism is an important component of these multiscale systems:
it is the mechanism we use to describe the individual nodes within the zoom
graph. This work is a good example of how the Polaris formalism can be used
independently of the Polaris interface as the foundation of a visualization
system.
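The idea of zooming independently along each dimension can be sketched as a small graph. This sketch is an illustration under stated assumptions, not the zoom graph defined in the Infovis paper. It assumes that each node is identified only by one hierarchy level per dimension and that each zoom-in edge refines exactly one dimension. It ignores visual abstraction and the per-node Polaris specifications. `HIERARCHIES`, `zoom`, `zoom_in_edges`, and `describe` are hypothetical names, and the Time and Product hierarchies are invented.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical dimension hierarchies, each ordered from coarsest to finest level.
HIERARCHIES: Dict[str, List[str]] = {
    "Time": ["Year", "Quarter", "Month"],
    "Product": ["Type", "Product"],
}
DIMS = list(HIERARCHIES)

# A node records one level index per dimension, in DIMS order.
Node = Tuple[int, ...]


def nodes() -> List[Node]:
    """Every combination of levels: one candidate view per node."""
    return list(product(*(range(len(HIERARCHIES[d])) for d in DIMS)))


def zoom(node: Node, dim: str, step: int) -> Optional[Node]:
    """Move one level finer (step=+1) or coarser (step=-1) along `dim` only."""
    i = DIMS.index(dim)
    level = node[i] + step
    if not 0 <= level < len(HIERARCHIES[dim]):
        return None  # already at the top or bottom of this hierarchy
    return node[:i] + (level,) + node[i + 1:]


def zoom_in_edges() -> List[Tuple[Node, Node]]:
    """Directed zoom-in edges; each one refines exactly one dimension."""
    edges = []
    for n in nodes():
        for d in DIMS:
            finer = zoom(n, d, +1)
            if finer is not None:
                edges.append((n, finer))
    return edges


def describe(node: Node) -> Dict[str, str]:
    """Name the level each dimension is shown at in this node."""
    return {d: HIERARCHIES[d][lvl] for d, lvl in zip(DIMS, node)}


start = (0, 0)  # Year x Type: the most abstract overview
print(describe(zoom(start, "Time", +1)))     # {'Time': 'Quarter', 'Product': 'Type'}
print(describe(zoom(start, "Product", +1)))  # {'Time': 'Year', 'Product': 'Product'}
print(len(nodes()), len(zoom_in_edges()))    # 6 7
```

Because every edge changes only one dimension, an analyst starting from the overview can reach the same detailed view along different paths, instead of following the single zooming path of the systems criticized above.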
Implementation and Release
The current implementation of Polaris is built within the Rivet visualization
environment, an environment for rapidly constructing visualizations from
components. Most of Polaris is written in C++ and OpenGL, with some pieces
written in Tcl. Data can either be loaded directly into Rivet using built-in
parsers (regular expressions, delimited files) or stored externally in a
database or data cube and accessed through OLE DB. Rivet (and thus Polaris)
runs on Windows, Linux, and Solaris.
We are currently working on a new version of Polaris and Rivet that we intend
to release to the public. It will likely be a native Windows application, and
we hope to make it available sometime this fall.
Publications
Multiscale Visualization Using Data Cubes
Chris Stolte, Diane Tang and Pat Hanrahan
BEST PAPER AWARD
Proceedings of the Eighth IEEE Symposium on Information Visualization, October 2002.

Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris
Chris Stolte, Diane Tang and Pat Hanrahan
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002.

Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases (extended paper)
Chris Stolte, Diane Tang and Pat Hanrahan
IEEE Transactions on Visualization and Computer Graphics, Vol. 8, No. 1, January 2002.

Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases
Chris Stolte and Pat Hanrahan
Proceedings of the Sixth IEEE Symposium on Information Visualization, October 2000.
Presentations
"Multiscale Visualization Using Data Cubes" given by Chris Stolte at the Eighth IEEE Symposium on Information Visualization, October 2002.
"Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases" given by Chris Stolte at the DIMACS Workshop on Data Mining and Visualization, October 2002.
"Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases" given by Pat Hanrahan at IBM, September 2002.
"Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris" given by Chris Stolte at KDD 2002.
"Polaris: A System for Query, Analysis, and Visualization of Relational Databases" guest lecture given by Chris Stolte in CS345: "Database Systems: Foundations and Frontiers" at Stanford in May 2002.
"Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases" given by Chris Stolte at Infovis 1999.