During my summer internship at Novetta, I developed a prototype that can match a person’s face with the musician whom they most closely resemble, then play a song by that musician. The project was fun, but it also demonstrates a highly adaptable concept: face recognition from a live video feed triggering an action based on a specific identification.
Development
In planning this prototype, I knew I would need to leverage Amazon Web Services (AWS). My background is in applied mathematics, so I first had to familiarize myself with the platform. Once I learned how the services communicate with each other, the project seemed straightforward. I decided to build the face recognition element of the prototype around Amazon Rekognition, a face recognition service that can enroll face images into a searchable collection and then return possible matches ranked by similarity score.
To begin, I sourced a list of musicians from Spotify’s Top 200 weekly songs over the past three years. I then wrote a web scraper to download five images of each unique artist for Amazon Rekognition to index. This collection of about 800 musicians became the basis for face comparison in the prototype.
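The enrollment step can be sketched with boto3 roughly as follows. The collection name, bucket name, helper function, and file layout are placeholders for illustration, not the ones used in the prototype:

```python
import boto3

rekognition = boto3.client("rekognition")

COLLECTION_ID = "musicians"       # hypothetical collection name
IMAGE_BUCKET = "musician-images"  # hypothetical S3 bucket holding the scraped photos


def index_artist_image(artist_name: str, s3_key: str) -> dict:
    """Enroll one scraped artist photo into the Rekognition collection."""
    return rekognition.index_faces(
        CollectionId=COLLECTION_ID,
        Image={"S3Object": {"Bucket": IMAGE_BUCKET, "Name": s3_key}},
        # ExternalImageId cannot contain spaces, so encode the artist name.
        ExternalImageId=artist_name.replace(" ", "_"),
        MaxFaces=1,              # keep only the most prominent face per photo
        QualityFilter="AUTO",
    )


if __name__ == "__main__":
    # Create the collection once, then index every scraped image.
    rekognition.create_collection(CollectionId=COLLECTION_ID)
    index_artist_image("Zayn Malik", "zayn-malik/01.jpg")  # illustrative key
```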
Next, I set up a camera-enabled Raspberry Pi to send faces extracted from a video stream to an Amazon S3 bucket. I created an Alexa Skill to trigger an AWS Lambda Function in which a photo from that bucket is sent to Amazon Rekognition. There, the photo is compared against the collection, and the musician whose indexed images generate the strongest comparison score is identified. That musician’s name and the Rekognition match-confidence percentage are returned to Alexa. Finally, Alexa plays a song by the matched musician. Input, comparison, match, action.
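A minimal sketch of such a Lambda Function, assuming the Raspberry Pi has already uploaded the latest face crop to a known key in the S3 bucket (the bucket, key, and collection names below are hypothetical):

```python
import boto3

rekognition = boto3.client("rekognition")

COLLECTION_ID = "musicians"        # hypothetical names, for illustration only
FRAME_BUCKET = "pi-camera-frames"
FRAME_KEY = "latest-face.jpg"


def lambda_handler(event, context):
    """Look up the newest camera frame against the musician collection."""
    response = rekognition.search_faces_by_image(
        CollectionId=COLLECTION_ID,
        Image={"S3Object": {"Bucket": FRAME_BUCKET, "Name": FRAME_KEY}},
        MaxFaces=1,                # only the single best match is needed
    )

    matches = response.get("FaceMatches", [])
    if not matches:
        return {"match": None}

    best = matches[0]
    return {
        # Undo the underscore encoding applied at indexing time.
        "match": best["Face"]["ExternalImageId"].replace("_", " "),
        "confidence": round(best["Similarity"], 1),
    }
```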
I began testing my prototype on other summer interns, and overall, I got the results I was looking for. There was just one problem – the prototype would not match my face with a musician. I explored the issue and discovered that all comparison scores associated with my face were lower than Amazon Rekognition’s default match threshold. In other words, my matches were not strong enough. Therefore, to return a match for my face, it was necessary to lower the match threshold.
In a real-world use case, we would not lower the threshold in order to force a match, but for our demonstration purposes I wanted to ensure that every face image returned a musician and a song. Once I lowered the threshold, the prototype found a match for me – in case you were wondering: Zayn Malik!
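Concretely, this comes down to passing a lower FaceMatchThreshold to the same search call used in the handler sketch above; the value below is illustrative, not the one used in the demo:

```python
# Rekognition's documented default for FaceMatchThreshold is 80.
# Lowering it allows weaker matches to be returned instead of nothing.
response = rekognition.search_faces_by_image(
    CollectionId=COLLECTION_ID,
    Image={"S3Object": {"Bucket": FRAME_BUCKET, "Name": FRAME_KEY}},
    MaxFaces=1,
    FaceMatchThreshold=50,  # illustrative value; default is 80
)
```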
End-to-End Process
The following architecture diagram lays out the solution as a whole:
1. Alexa Command
A user triggers a custom Alexa Skill by asking what musician they look like.
2. Facial Recognition Request
Alexa initiates a request to Amazon Rekognition via an AWS Lambda Function.
3. Image Retrieval
Amazon Rekognition retrieves an image that was uploaded to an Amazon S3 bucket by the camera-enabled Raspberry Pi.
4. Image Analysis
Amazon Rekognition compares the user’s image to the musician collection and finds the highest-scoring match.
5. Match Results
Through the AWS Lambda Function, the matched musician’s name and confidence score are returned to Alexa in a readable format (see the sketch after this list).
6. Song is Played by Alexa
Alexa tells the user their closest match and plays a song by that musician.
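Steps 5 and 6 amount to wrapping the match result in the Alexa Skills Kit response envelope. A minimal sketch, with hypothetical wording; actually streaming a track would additionally require an AudioPlayer.Play directive, which is omitted here:

```python
def build_alexa_response(artist: str, confidence: float) -> dict:
    """Wrap the Rekognition match in the Alexa Skills Kit response format."""
    speech = (
        f"You look {confidence:.0f} percent like {artist}. "
        "Here is one of their songs."
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            # An AudioPlayer.Play directive would go here to stream the song;
            # it is left out of this sketch.
            "shouldEndSession": True,
        },
    }
```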
Wider Applications
This project was a fun foray into pop culture, but a similar architecture could easily be repurposed for many objectives. With an agile, inexpensive setup – a small lens, a Raspberry Pi, and a couple of lines of code – persons of interest could be identified, and alerts generated, in distributed applications.