What about "Similarity"?

Definition

In premDAT, the similarity is a measure of how close two characters are. It ranges from 1 to 100 and currently relies on two attributes of the data:

  1. The dimensions
  2. The tags

Each pair of characters is compared dimension by dimension. Similar scores (or levels) in a dimension earn more points, but the importance of each dimension varies. When a dimension reaches 1 or 5 for at least one of the two characters, it is considered important: a defining attribute of that character. If both scores match on a defining attribute, the similarity level will likely be high. Conversely, large differences on defining attributes severely decrease similarity. Matching tags add a few more points to the final computation, while a tag present on one character but not on the other slightly lowers the final similarity score.


Concrete example (and values)

Consider the following abstract example with two characters (A and B), three tags (I, II, III) and five dimensions.

A has the following dimensions and tags:

  • Dimension 1 : 4
  • Dimension 2 : 2
  • Dimension 3 : 3
  • Dimension 4 : 5
  • Dimension 5 : 1
  • Tags: I

B has the following dimensions and tags:

  • Dimension 1 : 1
  • Dimension 2 : 2
  • Dimension 3 : 5
  • Dimension 4 : 4
  • Dimension 5 : 3
  • Tags: I, III

Here is how the similarity is computed:

  • Dimension 1: the importance of the dimension is 10, because B has a score of 1. The similarity score for the dimension is 4 - Math.abs(Dimension 1 for A - Dimension 1 for B), which results in 1. Multiplying importance by similarity gives 10 points out of a possible 10 * 4 = 40 (10/40).
  • Dimension 2: importance of 5, similarity of 4. Score is 20/20.
  • Dimension 3: importance of 10, similarity of 2. Score is 20/40.
  • Dimension 4: importance of 10, similarity of 3. Score is 30/40.
  • Dimension 5: importance of 10, similarity of 2. Score is 20/40.
  • Tags: each tag carried by at least one of the two characters adds 3 points to the total. Each tag that both characters share also adds 3 points to the similarity score. A and B both have tag I, but only B has tag III. Thus, the total is increased by 6 points while the similarity score is increased by 3 (3/6).
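
The breakdown above can be reproduced with a short Python sketch. This is not premDAT's actual code: the function names are hypothetical, the weights (10, 5 and 3) are read off this example, and the sketch assumes that dimension scores run from 1 to 5, so that 4 is the largest possible gap.

```python
# Hypothetical sketch of the scoring rules described in this example.
# Weights (10 / 5 for dimensions, 3 per tag) are taken from the example,
# not from premDAT's source code.

MAX_GAP = 4  # assumption: dimension scores range from 1 to 5


def importance(a, b):
    """Weight 10 if either score is an extreme (1 or 5), otherwise 5."""
    return 10 if a in (1, 5) or b in (1, 5) else 5


def dimension_score(a, b):
    """Return (points earned, points possible) for one dimension."""
    weight = importance(a, b)
    closeness = MAX_GAP - abs(a - b)
    return weight * closeness, weight * MAX_GAP


def tag_score(tags_a, tags_b, points=3):
    """Shared tags earn points; every tag on either character counts toward the total."""
    shared = tags_a & tags_b
    present = tags_a | tags_b
    return points * len(shared), points * len(present)


a_dims, a_tags = [4, 2, 3, 5, 1], {"I"}
b_dims, b_tags = [1, 2, 5, 4, 3], {"I", "III"}

parts = [dimension_score(a, b) for a, b in zip(a_dims, b_dims)]
parts.append(tag_score(a_tags, b_tags))
print(parts)
# [(10, 40), (20, 20), (20, 40), (30, 40), (20, 40), (3, 6)]
```

Each pair is (points earned, points possible), in the same order as the list above. Tag II belongs to neither character, so it contributes nothing to either number.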

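The final result is computed as similarity / total * 100. Summing the scores listed above gives a similarity of 10 + 20 + 20 + 30 + 20 + 3 = 103 and a total of 40 + 20 + 40 + 40 + 40 + 6 = 186. Thus, the similarity between A and B is 103 / 186 * 100 ~= 55%.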
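
As a check on the arithmetic, the per-dimension and tag scores from the breakdown can be summed directly. The snippet below is only an illustration: it takes the six (earned, possible) pairs from the list as literal data, and the variable names are hypothetical.

```python
# (points earned, points possible) for dimensions 1 to 5, then tags,
# copied from the breakdown in this example.
parts = [(10, 40), (20, 20), (20, 40), (30, 40), (20, 40), (3, 6)]

similarity = sum(earned for earned, _ in parts)
total = sum(possible for _, possible in parts)
percent = similarity / total * 100

print(similarity, total, round(percent))  # 103 186 55
```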