I have been listening to music through Spotify, the ad-supported streaming music service. On the application home page it recommends “Artists you may like”. This morning I turned it on and it had a fairly eclectic mix – Motörhead, Marilyn Manson, Faster Pussycat, Pantera,… Wham! and George Michael. I presume this ‘personalised’ list is based on what I have been listening to – so what have I been listening to?! I think I listened to a Rage Against the Machine album a few weeks ago but I’ve certainly not been listening to any 80s disco pop. Usually I have noticed this list filled with 50s rock. I think that one of the first artists I searched for on Spotify was Buddy Holly and, although I don’t listen to a lot from that era, I presumed the “Artists you may like” algorithm was relying too much on initial conditions. When I think about yesterday I seem to remember listening to a lot of Blur. So we have a glimpse of the algorithm:
1950s rock ‘n’ roll + Blur = Marilyn Manson and Wham!
Not convinced, hmm? It does make me wonder how these algorithms work. I know Amazon does something similar when you buy something: “People who bought the items in your basket also bought”. This, it seems to me, often offers items very similar to the one I have bought. For example, “Customers who bought this monitor also bought these other three monitors”, i.e. “Customers who bought this device also bought this other device which performs exactly the same function”. Amazon also has a “customers who looked at this item also looked at” list, which is very useful and often points to competing devices. Once you have chosen one, though, it seems to me Amazon should be trying to tempt you with different products, not ones that solve the same problem as the one you have just bought. I suppose some people may purchase lots of similar equipment for, say, a company, but I would say it is unlikely that the average user, having bought one device, would want to buy another that performs the same function (particularly before the first one has arrived).
In this forum post, user Oscar Rylin suggests the “Artists you may like” list is based on what people who listen to what you listen to also listen to. So it may be that the raw data – user behaviour – is erratic and the predictions are erratic in turn. Still, there are six slots for artists I may like, and four are metal and two are disco. It seems to me that if the algorithm couldn’t make its mind up reliably, all the slots would be likely to differ from each other as well as from my listening history. Also on that forum post is some suggestion that Spotify is just advertising arbitrarily. However, it is in Spotify’s interest to get you discovering music you like so that you keep listening for longer and hear more adverts. So making poor or irrelevant suggestions is bad for business.
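To make the “people who listen to what you listen to also listen to” idea concrete, here is a minimal sketch in Python. I should stress that this is only an illustration of that general idea, not Spotify’s actual algorithm, which I don’t know. The listening histories, the user names and the `recommend` function are all made up for the example.

```python
from collections import Counter


def recommend(history, target, k=6):
    """Suggest up to k artists the target user has not played.

    Each other user "votes" for the artists they play that the target
    does not, weighted by how many artists they share with the target.
    """
    mine = history[target]
    scores = Counter()
    for user, artists in history.items():
        if user == target:
            continue
        overlap = len(mine & artists)  # shared artists with the target
        if overlap == 0:
            continue  # nothing in common, so this user gets no vote
        for artist in artists - mine:
            scores[artist] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, _ in ranked[:k]]


# Hypothetical listening histories, invented purely for illustration.
history = {
    "me": {"Buddy Holly", "Blur"},
    "u1": {"Blur", "Oasis", "Pulp"},
    "u2": {"Buddy Holly", "Chuck Berry"},
    "u3": {"Blur", "Oasis"},
    "u4": {"Pantera", "Motörhead"},
}

print(recommend(history, "me"))
```

In this toy data nobody who shares my taste listens to metal, so Pantera and Motörhead never get a vote. With real, messier data a handful of listeners with broad tastes could easily drag such artists into the list, which would fit the suggestion that erratic user behaviour gives erratic predictions.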
This business of better algorithm design is a fairly hot topic these days with Web 2.0 and user-generated content. I remember that in podcast episode 28 I pointed to an article in the Guardian, “Go figure … why mathematicians rule the internet,” on such algorithms. I know Chris Budd will tell you this sort of problem is the boom area of applied mathematics in the 21st Century, and he did so in podcast episode 26.