Quantifying and Comparing Features in High-Dimensional Datasets
Harald Piringer, Wolfgang Berger, Helwig Hauser
Conference paper (in proceedings)
Proceedings of the International Conference on Information Visualisation (IV 2008),
July 2008
Abstract
Linking and brushing is a proven approach to analyzing multi-dimensional
datasets in the context of multiple coordinated views. Nevertheless, most of the respective
visualization techniques only offer qualitative visual results. Many user tasks, however,
also require precise quantitative results as, for example, offered by statistical analysis.
In succession of the useful Rank-by-Feature Framework, this paper describes a joint visual
and statistical approach for guiding the user through a high-dimensional dataset by ranking
dimensions (1D case) and pairs of dimensions (2D case) according to statistical summaries.
While the original Rank-by-Feature Framework is limited to global features, the most
important novelty here is the concept to consider local features, i.e., data subsets defined
by brushing in linked views. The ability to compare subsets to other subsets and subsets to
the whole dataset in the context of a large number of dimensions significantly extends the
benefits of the approach especially in later stages of an exploratory data analysis.
A case study illustrates the workflow by analyzing counts of keywords for classifying
e-mails as spam or no-spam.
BibTeX
@inproceedings{piringer08comparing,
author = {Harald Piringer and Wolfgang Berger and Helwig Hauser},
title = {Quantifying and Comparing Features in High-Dimensional Datasets},
booktitle = {Proceedings of the International Conference on Information Visualisation (IV 2008)},
abstract = {Linking and brushing is a proven approach to analyzing multi-dimensional
datasets in the context of multiple coordinated views. Nevertheless, most of the respective
visualization techniques only offer qualitative visual results. Many user tasks, however,
also require precise quantitative results as, for example, offered by statistical analysis.
In succession of the useful Rank-by-Feature Framework, this paper describes a joint visual
and statistical approach for guiding the user through a high-dimensional dataset by ranking
dimensions (1D case) and pairs of dimensions (2D case) according to statistical summaries.
While the original Rank-by-Feature Framework is limited to global features, the most
important novelty here is the concept to consider local features, i.e., data subsets defined
by brushing in linked views. The ability to compare subsets to other subsets and subsets to
the whole dataset in the context of a large number of dimensions significantly extends the
benefits of the approach especially in later stages of an exploratory data analysis.
A case study illustrates the workflow by analyzing counts of keywords for classifying
e-mails as spam or no-spam.},
location = {London, UK},
year = {2008},
pages = {240--245},
month = 7,
URL = {http://dx.doi.org/10.1109/IV.2008.17},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}