Working with
high-dimensional data means working with data that are embedded in
high-dimensional spaces. For non-temporal data, this means that each sample
contains many attributes or characteristics. Some properties of
high-dimensional spaces are counterintuitive compared to the corresponding
properties of low-dimensional spaces. The consequences for data analysis are
discussed below, along with ideas that could be incorporated into data analysis
tools to meet the specific requirements of high-dimensional spaces. Obviously,
the models built through learning are only valid in the range or volume of the
space where learning data are available. Whatever the model or class of models,
generalization to data that differ greatly from all learning points is
impossible. In other words, relevant generalization is possible through
interpolation but not through extrapolation.
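
To make this concrete, here is a minimal sketch (Python with NumPy; the
function name and data are illustrative, not from the source) of a crude
interpolation check: a query is only trusted if it falls inside the
axis-aligned bounding box of the learning data.

    import numpy as np

    def inside_training_range(X_train, x_query):
        # Predictions are only trustworthy where learning data exist; as a
        # crude proxy, test whether the query lies inside the axis-aligned
        # bounding box of the training set (interpolation) or outside it
        # (extrapolation). A convex-hull test would be tighter but costlier.
        lo, hi = X_train.min(axis=0), X_train.max(axis=0)
        return bool(np.all((x_query >= lo) & (x_query <= hi)))

    X_train = np.random.rand(100, 5)                        # 100 samples in 5-D
    print(inside_training_range(X_train, np.full(5, 0.5)))  # almost surely True
    print(inside_training_range(X_train, np.full(5, 2.0)))  # False: extrapolation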

It is easy to
see that, all other constraints being kept unchanged, the number of learning
samples should grow exponentially with the dimension: if 10 samples seem
reasonable for learning a smooth 1-dimensional model, 100 are necessary to
learn a 2-dimensional model with the same smoothness, 1000 for a 3-dimensional
model, and so on. This exponential increase is the first consequence of what is
called the curse of dimensionality [3]. More generally, the curse of
dimensionality covers all phenomena that appear with high-dimensional data and
that most often have unfortunate consequences for the behavior and performance
of learning algorithms.
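
A minimal sketch of this effect (Python with NumPy; the sample budget and the
set of dimensions are arbitrary choices, not from the source): keeping 10
points per axis requires 10^d samples, while a fixed budget of 1000 points
leaves the unit hypercube increasingly empty as d grows.

    import numpy as np

    rng = np.random.default_rng(0)
    budget = 1000                        # a fixed, affordable number of samples
    for d in (1, 2, 3, 5, 10, 20):
        needed = 10 ** d                 # samples keeping 10 points per axis
        X = rng.random((budget, d))      # the budget, spread over [0, 1]^d
        # distance from the centre of the hypercube to its nearest sample
        nearest = np.linalg.norm(X - 0.5, axis=1).min()
        print(f"d={d:>2}  needed={needed:.0e}  nearest sample to centre={nearest:.3f}")
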
Another direction to follow is to reduce the dimensionality of the data space
through appropriate nonlinear data projection methods. Subspace clustering is
an extension of traditional clustering that seeks to find clusters in different
subspaces within a dataset. The paper [B4] presents a survey of the various
subspace clustering algorithms, along with a hierarchy organizing the
algorithms by their defining characteristics. The objects are usually
represented as a vector of measurements, or a point in multidimensional space,
and the similarity between objects is often determined using distance measures
over the various dimensions of the dataset.
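
The following sketch (again illustrative Python/NumPy, not from the source)
shows why such distance measures lose contrast as dimensionality grows: the
ratio between a point's farthest and nearest neighbour approaches 1, so
"similar" and "dissimilar" become hard to tell apart in the full space, which
motivates looking for clusters in subspaces instead.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    for d in (2, 10, 100, 1000):
        X = rng.random((n, d))                        # uniform points in [0, 1]^d
        dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from one point
        print(f"d={d:>4}  farthest/nearest ratio = {dists.max() / dists.min():.2f}")
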
There are many algorithms for clustering high-dimensional data that
automatically find clusters in subspaces of the full space. One example of such
a clustering technique is “projected” clustering [AWYPP99], which also finds
the set of dimensions appropriate for each cluster during the clustering
process. The paper [B5] gives detailed information about another subspace
clustering algorithm, CLIQUE [AGGR98], which attempts to deal with these
problems. Its approach is based on the following interesting observation: a
region that is dense in a particular subspace must create dense regions when
projected onto lower-dimensional subspaces.
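
This monotonicity is what makes a bottom-up, Apriori-style search feasible. The
sketch below is a simplification in illustrative Python, not the published
algorithm; the parameters xi and tau mirror CLIQUE's grid resolution and
density threshold. It finds dense 1-dimensional units first and uses them to
prune the candidate 2-dimensional units.

    import numpy as np
    from itertools import combinations

    def dense_units(X, xi=10, tau=0.05):
        # Partition each axis of [0, 1]^d into xi intervals and count points.
        n, d = X.shape
        cells = np.minimum((X * xi).astype(int), xi - 1)
        # Dense 1-D units: intervals holding at least a tau fraction of points.
        dense1 = {j: set(np.flatnonzero(
                      np.bincount(cells[:, j], minlength=xi) / n >= tau))
                  for j in range(d)}
        # Monotonicity: a dense 2-D unit must project onto dense 1-D units,
        # so only combinations of dense 1-D units need to be checked.
        dense2 = [((i, j), (a, b))
                  for i, j in combinations(range(d), 2)
                  for a in dense1[i] for b in dense1[j]
                  if ((cells[:, i] == a) & (cells[:, j] == b)).mean() >= tau]
        return dense1, dense2
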
Other algorithms, such as MAFIA and DENCLUE, find clusters by identifying dense
regions in subspaces. The principal challenge in extending cluster analysis to
high-dimensional data is to overcome the “curse of dimensionality.” In
particular, there is no reason to expect that one type of clustering approach
will be suitable for all types of data, or even for all high-dimensional data;
high dimensionality is only one issue that needs to be considered when
performing cluster analysis. PROCLUS [5] was the first top-down subspace
clustering algorithm.
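
As a flavour of the top-down approach, here is a much-simplified sketch in
illustrative Python (not the published PROCLUS procedure, which also involves
medoid sampling and iterative refinement) of its core idea: choosing, per
cluster, the dimensions along which the cluster's points are most tightly
grouped.

    import numpy as np

    def select_dimensions(cluster_points, medoid, l):
        # Keep the l dimensions with the smallest average deviation from the
        # medoid: the cluster "lives" in the subspace where it is tightest.
        spread = np.abs(cluster_points - medoid).mean(axis=0)
        return np.sort(np.argsort(spread)[:l])

    rng = np.random.default_rng(2)
    pts = rng.normal(0.5, 0.02, (50, 8))   # tight in every dimension...
    pts[:, [3, 6]] = rng.random((50, 2))   # ...except 3 and 6 (uniform noise)
    # Using the mean as a stand-in medoid; a true medoid is an actual point.
    print(select_dimensions(pts, pts.mean(axis=0), l=6))  # omits dims 3 and 6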
