14.2 Cosine Similarity
In the previous section, we saw how we could use rider reviews to deliver general insights about attractions at Lobster Land.
Next, let’s dive a bit deeper to see how we can use the available data to make recommendations tailored to specific consumers. To do so, we will use a metric known as cosine similarity. For two vectors A and B, each with n elements, it is defined as:

$$\text{cosine similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
In the formula above, the numerator is the dot product of A and B. The dot product is found by multiplying the pairwise values between vectors A and B, and then summing the resulting products.
Next, we must find the norm, or length, of each vector. For each vector, the norm is found by taking the square root of the sum of its elements’ squared values. The product of the two vectors’ norms will be the denominator in the cosine similarity calculation.
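As a minimal sketch of the calculation just described, the plain-Python function below (a hypothetical helper, not from the chapter) computes cosine similarity for two equal-length vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    # Numerator: the dot product -- multiply pairwise values, then sum.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: the product of the two vectors' norms (lengths).
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing in exactly the same direction score 1.0.
print(round(cosine_similarity([1, 2, 3], [2, 4, 6]), 4))  # prints 1.0
```

Note that the result depends only on the vectors’ directions, not their magnitudes: doubling every rating leaves the similarity unchanged.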
To see cosine similarity in action, let’s imagine that the following group of 10 friends each watched several programs on Netflix and rated them from 1 to 10, with 10 being the best. In the table below, an ‘x’ indicates that the person did not watch that program.

Let’s start with Harriet and Julia. How similar are they? Since this method depends on pairwise relationships, we can only use programs that they have both rated. That means Glow Up, Virgin River, and The Next 365 Days will be “in play” for this analysis. Harriet’s scores for those three are 6, 7, and 5, respectively, while Julia’s scores are 10, 8, and 7.
For our numerator, we compute the dot product by multiplying the pairwise values for the three programs: 6 * 10 = 60, 7 * 8 = 56, and 5 * 7 = 35. Next, we sum these products to obtain 60 + 56 + 35 = 151.
For Harriet, the norm is approximately 10.49, or:

$$\lVert \text{Harriet} \rVert = \sqrt{6^{2} + 7^{2} + 5^{2}} = \sqrt{110} \approx 10.49$$
For Julia, the norm is approximately 14.59, or:

$$\lVert \text{Julia} \rVert = \sqrt{10^{2} + 8^{2} + 7^{2}} = \sqrt{213} \approx 14.59$$
The product of these two norms, approximately 153.07 when carried at full precision, forms our denominator, and the cosine similarity between Harriet and Julia is 151/153.07, or approximately 0.9865.
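Assuming NumPy is available, we can verify this arithmetic directly:

```python
import numpy as np

harriet = np.array([6, 7, 5])    # Glow Up, Virgin River, The Next 365 Days
julia = np.array([10, 8, 7])

dot = np.dot(harriet, julia)                              # 60 + 56 + 35 = 151
denom = np.linalg.norm(harriet) * np.linalg.norm(julia)   # product of the norms
sim = dot / denom
print(round(float(sim), 4))  # prints 0.9865
```

Because NumPy carries the norms at full precision rather than rounding them to two decimal places first, the result comes out to roughly 0.9865.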
Without any basis for comparison, though, that number is hard to interpret meaningfully. Let’s compute another cosine similarity, this time for Harriet and Eddie. We are once again limited to programs that they have both viewed; for this comparison, that means Glow Up, Virgin River, Alchemy of Souls, and The Next 365 Days. We will use the same process as before, but now with four elements from each reviewer rather than three.
For the dot product, the pairwise products are 7 * 6 = 42, 2 * 7 = 14, 8 * 4 = 32, and 6 * 5 = 30. Summed, that gives us 42 + 14 + 32 + 30 = 118 in the numerator. Next, we will obtain our vector norms.
For Eddie, the norm is approximately 12.37, or:

$$\lVert \text{Eddie} \rVert = \sqrt{7^{2} + 2^{2} + 8^{2} + 6^{2}} = \sqrt{153} \approx 12.37$$
For Harriet, the norm is approximately 11.22, or:

$$\lVert \text{Harriet} \rVert = \sqrt{6^{2} + 7^{2} + 4^{2} + 5^{2}} = \sqrt{126} \approx 11.22$$
The product of these two norms, approximately 138.85 when carried at full precision, forms our denominator, and the cosine similarity between Eddie and Harriet is 118/138.85, or approximately 0.85.

Of course, we would ideally have more data to analyze before making a recommendation, but based on what we have, we can say that Harriet and Julia are more similar to one another than Eddie and Harriet are. If we plan to make program recommendations for Harriet based on either Eddie’s or Julia’s reviews, we should use Julia’s ratings to inform those recommendations.
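The two comparisons can be tied together in one sketch. The dictionary below restates only the scores quoted in the walkthrough (the full 10-friend table is not reproduced here), and the helper restricts each comparison to the programs both people rated:

```python
import math

# Only the ratings quoted in the walkthrough are included here.
ratings = {
    "Harriet": {"Glow Up": 6, "Virgin River": 7,
                "Alchemy of Souls": 4, "The Next 365 Days": 5},
    "Julia": {"Glow Up": 10, "Virgin River": 8, "The Next 365 Days": 7},
    "Eddie": {"Glow Up": 7, "Virgin River": 2,
              "Alchemy of Souls": 8, "The Next 365 Days": 6},
}

def pairwise_cosine(r1, r2):
    """Cosine similarity over the programs both people rated."""
    shared = sorted(set(r1) & set(r2))     # the programs "in play"
    a = [r1[p] for p in shared]
    b = [r2[p] for p in shared]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(pairwise_cosine(ratings["Harriet"], ratings["Julia"]), 4))  # 0.9865
print(round(pairwise_cosine(ratings["Harriet"], ratings["Eddie"]), 4))  # 0.8499
```

Filtering to the shared programs first is what makes the comparison fair: each similarity is computed only over ratings that both people actually supplied.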
In Chapter 3, we saw how Euclidean distance can be used to measure the difference between two observations.
In both cases, we require pairwise relationships in the data in order to generate the metric. All else equal, Euclidean distances can be expected to grow larger as the feature space grows. Cosine similarity, by contrast, always lands between 0 and 1 when the vectors contain only non-negative values, such as ratings, regardless of the size of the feature space (with arbitrary real-valued vectors it can range from -1 to 1). For this reason, cosine similarity lets us compare pairs even when those pairs are based on different numbers of shared features.
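A toy illustration of that contrast, using synthetic vectors rather than the chapter’s data: two vectors with the same element-by-element relationship drift apart in Euclidean distance as the feature space grows, while their cosine similarity stays fixed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

for n in (3, 30, 300):
    a = [1] * n   # the same per-element pattern ...
    b = [2] * n   # ... repeated across a larger feature space
    print(n, round(euclidean(a, b), 2), round(cosine(a, b), 2))
# Euclidean distance grows (1.73, 5.48, 17.32); cosine similarity stays 1.0
```

The distance scales with the square root of the number of features, while the similarity depends only on the angle between the vectors, which is why it remains comparable across feature spaces of different sizes.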