An observation is a vector of values, not necessarily of the same type, that describes the object to be clustered. Each value can be one of the following types:
- Numerical: e.g. heights such as 4'8", 6'4", 5'10" - the values lie on a scale, so the distance between them has a meaningful size (5'10" is 6 inches from 6'4" but 14 inches from 4'8").
- Ordinal: e.g. 1st, 2nd, 3rd - the ordering matters (1st is closer to 2nd than to 3rd), but we don't know how much closer.
- Binary: e.g. True / False - the feature is either present or absent.
- Categorical: e.g. red, blue, green - there is no ordering among the categories.

Note that a single dataset may mix these types. This complicates the distance calculation, because there is no obvious way to combine, say, a difference in height with a difference in colour.
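As a rough illustration of how the four types above might be handled together, the sketch below gives each type a dissimilarity in [0, 1] and averages them, in the spirit of Gower's distance for mixed data. The function names `feature_distance` and `mixed_distance`, the `kind` labels, the example observations, and the choice to treat ordinal ranks as equally spaced are all assumptions made for this example, not a method prescribed by the text.

```python
from typing import Sequence


def feature_distance(a, b, kind: str, scale: float = 1.0) -> float:
    """Dissimilarity in [0, 1] between two values of a single feature.

    kind  -- "numeric", "ordinal", "binary" or "categorical" (labels chosen for this sketch)
    scale -- numeric: the feature's range (max - min) over the dataset;
             ordinal: number of levels minus one; ignored otherwise
    """
    if kind in ("numeric", "ordinal"):
        # Scaled absolute difference. For ordinal ranks this ASSUMES equally
        # spaced levels, which the data itself does not guarantee.
        return abs(a - b) / scale if scale else 0.0
    if kind in ("binary", "categorical"):
        # No meaningful magnitude: two values either match or they don't.
        return 0.0 if a == b else 1.0
    raise ValueError(f"unknown feature kind: {kind!r}")


def mixed_distance(x: Sequence, y: Sequence,
                   kinds: Sequence[str], scales: Sequence[float]) -> float:
    """Gower-style distance: the unweighted mean of per-feature dissimilarities."""
    if not kinds or not (len(x) == len(y) == len(kinds) == len(scales)):
        raise ValueError("observations and feature specs must be non-empty and equal in length")
    parts = [feature_distance(a, b, k, s) for a, b, k, s in zip(x, y, kinds, scales)]
    return sum(parts) / len(parts)


# Hypothetical observations: (height in inches, finishing place, binary flag, colour).
# 4'8" = 56, 6'4" = 76 and 5'10" = 70 inches, so the height range is 20.
KINDS = ["numeric", "ordinal", "binary", "categorical"]
SCALES = [76 - 56, 3 - 1, 1, 1]
a = (70, 1, True, "red")
b = (76, 2, True, "blue")
c = (56, 3, False, "green")

print(round(mixed_distance(a, b, KINDS, SCALES), 3))  # 0.45
print(round(mixed_distance(a, c, KINDS, SCALES), 3))  # 0.925
```

Equal weights are the simplest choice here. In practice one might weight features differently, or, for a binary feature where only joint presence is informative, use an asymmetric measure such as Jaccard instead of simple matching.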
Let's dig into the concept of "distance" a little more...