You're reading: News

BBC R&D automatic tagging of speech audio using vector spaces

A post on the BBC Research & Development Blog outlines work on automatic tagging of speech audio. The work is concerned with the World Service archive, which apparently has “very sparse” associated programme data. The archive “covers many decades and consists of about two and a half years of high-quality continuous audio content”. The aim was to associate each programme’s content with keywords. The post explains:

For example if a programme mentions ‘London’, ‘Olympics’ and ‘1948’ a lot, then there is a high chance it is talking about the 1948 Summer Olympics.

The post discusses the technical challenges of this endeavour – automatic transcription, then searching the transcripts for terms from a subject classification. This uses “an approach inspired by the Enhanced Topic-based Vector Space Model proposed by D. Kuropka”. The full article gives a detailed description of moving from constructing a vector space to extracting a ranked list of topic identifiers for each programme.
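
To make the idea concrete, here is a minimal sketch of this kind of term-counting in Python (illustrative only, not the BBC’s actual algorithm; the term lists, counts and scoring are invented for the example):

from collections import Counter

def rank_topics(transcript_terms, topic_terms):
    """Rank candidate topics by how often the terms associated with
    each topic appear in an (automatically produced) transcript."""
    counts = Counter(transcript_terms)
    scores = {topic: sum(counts[term] for term in terms)
              for topic, terms in topic_terms.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data echoing the post's example: frequent mentions of 'london',
# 'olympics' and '1948' point towards the 1948 Summer Olympics.
transcript = ['london'] * 5 + ['olympics'] * 4 + ['1948'] * 3 + ['weather']
topics = {
    '1948 Summer Olympics': ['london', 'olympics', '1948'],
    'London weather': ['london', 'weather'],
}
print(rank_topics(transcript, topics))
# [('1948 Summer Olympics', 12), ('London weather', 6)]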

The resulting classification was evaluated:

against 150 programmes that have been manually tagged in BBC Programmes and [we] found that the results, although by no means perfect, are good enough to efficiently bootstrap the tagging of a large collection of programmes.

The algorithm is apparently described in more detail in a paper accepted for Linked Data on the Web (LDOW2012), a workshop of the World Wide Web 2012 conference in Lyon, 16th–20th April 2012. The post also discusses next steps for the work.

Source: Automatically tagging the World Service archive.

5 Responses to “BBC R&D automatic tagging of speech audio using vector spaces”

  1. Christian Perfect

    This is very interesting! I did a bit more digging into the Topic-based Vector Space Model.

    I think this is the most relevant article on Kuropka’s site about the method: Topic-Based Vector Space Model. The “enhanced method” is what the BBC people used to automatically assign vectors to topics, but the only available writeup of that is an analysis of data running to hundreds of pages. The Wikipedia page is middlingly enlightening.

    The BBC people have put their code on Github, and included a pretty simple explanation of the algorithm in the README file. They say:

    For each topic $t$ in the hierarchy, we consider the set of its parents $\operatorname{parents}(t, k)$ at a level $k$. We construct a vector for each $t$ in a space where each dimension corresponds to a topic $d$ in the hierarchy. The value of $t$ on dimension $d$ is defined as follows:

    \[ t_d = \sum_{k = 0}^{\textrm{max\_depth}} \textrm{decay}^k \, \bigl[\, d \in \operatorname{parents}(t, k) \,\bigr] \]

    where $\textrm{max\_depth}$ and $\textrm{decay}$ are two parameters, which can be used to influence how much importance we attach to ancestors that are high in the category hierarchy.

    So each topic has a vector pointing towards its ancestor topics, with closer (more specific) topics weighted more heavily than more distant (broader) topics.
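
    Here’s a minimal Python sketch of that construction, assuming the hierarchy is just a dict mapping each topic to a list of its parents (the function names are mine, not the BBC code’s actual API):

    def parents(hierarchy, t, k):
        """Ancestors of topic t exactly k levels up (k = 0 gives t itself)."""
        level = {t}
        for _ in range(k):
            level = {p for node in level for p in hierarchy.get(node, [])}
        return level

    def topic_vector(hierarchy, t, max_depth=5, decay=0.5):
        """Sparse vector for t: an ancestor d reachable at level k
        contributes decay**k to dimension d."""
        vec = {}
        for k in range(max_depth + 1):
            for d in parents(hierarchy, t, k):
                vec[d] = vec.get(d, 0.0) + decay ** k
        return vec

    # Tiny invented hierarchy: the topic itself gets weight 1, its
    # parents decay**1 = 0.5, grandparents decay**2 = 0.25, and so on.
    hierarchy = {
        '1948 Summer Olympics': ['Olympic Games', 'Events in London'],
        'Olympic Games': ['Sport'],
        'Events in London': ['London'],
    }
    print(topic_vector(hierarchy, '1948 Summer Olympics'))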

    The dot product (called the cosine similarity in the BBC article for some reason) of two topics’ vectors is then a sort of measure of their similarity – if two topics are in the same sort of area, their vectors will point in roughly the same direction, so the dot product will be high. Similarly, if two topics have nothing in common, they will point in completely different (orthogonal, not opposite) directions, so the dot product of their vectors will be zero.
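
    Concretely, with the sparse dict vectors from the sketch above (normalising by the vectors’ lengths is what makes this a cosine rather than a plain dot product):

    import math

    def cosine(u, v):
        """Normalised dot product of two sparse vectors stored as dicts."""
        dot = sum(u[d] * v[d] for d in u if d in v)
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    a = topic_vector(hierarchy, '1948 Summer Olympics')
    b = topic_vector(hierarchy, 'Events in London')
    print(cosine(a, b))  # positive: the two topics share ancestors
    print(cosine({'Sport': 1.0}, {'London': 1.0}))  # 0.0: orthogonal, nothing shared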

    It’s interesting, reading the original paper and the BBC source code, how computer scientists take useful bits of pure maths but don’t quite get the terminology right, or rephrase things in a way that makes more sense to them.

    • Peter Rowlett

      Interesting. I’m always nervous of bringing too much of the article over because I want people to read the original but I think what you’ve done here – particularly translating from computer science to mathematics – is sufficiently different.

  2. Yves Raimond

    Many thanks for the mention!

    Another useful resource was Polyvyanyy’s thesis entitled “Evaluation of a Novel Information Retrieval Model: eTVSM”. The original description of that model is only available in German, it seems.

    @Christian I am curious about your point about terminology – is there anything in particular you’re thinking about? ‘cosine similarity’ in this context is just a normalised dot product and is used a lot in IR?
