How it works:
The NLP Conversational Part: (input my sentence via voice, output Pikachu’s response via voice).
First, it converts my voice to text. It then feeds that text into a machine learning model that generates Pikachu’s response. The model consists of an encoder → context RNN → decoder.
The architecture is based on the open-source CakeChat project from Luka (lukalabs/cakechat). It uses a Hierarchical Recurrent Encoder-Decoder (HRED) architecture to handle dialog context, built from multilayer recurrent neural networks with gated recurrent units (GRUs). The thought vector is fed into the decoder at every decoding step.
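To make the encoder → context RNN → decoder flow concrete, here is a toy forward pass in plain Python. It is only a sketch under loud assumptions: the weights are random and untrained, each network is a single unidirectional GRU layer without biases, and the names ToyHRED, GRUCell, thought_vector, and reply are made up for illustration. It is not CakeChat's code and not the model I trained; it only shows how the thought vector is built from the dialog and then fed to the decoder on every step.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """One GRU layer (biases omitted for brevity; hypothetical helper)."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def _pre(self, gate, x, h):
        return [a + b for a, b in zip(matvec(self.W[gate], x), matvec(self.U[gate], h))]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._pre("n", x, gated_h)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

class ToyHRED:
    """Untrained toy model: utterance encoder -> context RNN -> decoder."""
    def __init__(self, vocab_size, embed_size=4, utt_size=5, ctx_size=6, seed=0):
        rng = random.Random(seed)
        self.embed = rand_matrix(vocab_size, embed_size, rng)
        self.encoder = GRUCell(embed_size, utt_size, rng)
        self.context_rnn = GRUCell(utt_size, ctx_size, rng)
        # The decoder input is the previous token embedding plus the thought vector.
        self.decoder = GRUCell(embed_size + ctx_size, utt_size, rng)
        self.readout = rand_matrix(vocab_size, utt_size, rng)

    def thought_vector(self, dialog):
        """Encode each utterance, then fold the utterance vectors into one context."""
        context = [0.0] * self.context_rnn.hidden_size
        for utterance in dialog:
            h = [0.0] * self.encoder.hidden_size
            for token in utterance:
                h = self.encoder.step(self.embed[token], h)
            context = self.context_rnn.step(h, context)
        return context

    def reply(self, dialog, start_token=0, max_len=5):
        """Greedy decoding; the thought vector is concatenated in at every step."""
        thought = self.thought_vector(dialog)
        h = [0.0] * self.decoder.hidden_size
        token, out = start_token, []
        for _ in range(max_len):
            h = self.decoder.step(self.embed[token] + thought, h)
            scores = matvec(self.readout, h)
            token = max(range(len(scores)), key=scores.__getitem__)
            out.append(token)
        return out

model = ToyHRED(vocab_size=10)
# Two previous utterances, each a list of token ids.
token_ids = model.reply([[1, 2, 3], [4, 5]])
```

Because the weights are random, the returned token ids are meaningless; a trained model would map them back to words.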
I trained it on thousands of my emails, my text messages, and various screenplays, using free credits on Google Cloud and Azure. Once the model produces a response, text-to-speech turns it into the audio reply.
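The whole voice-in, voice-out loop described above can be sketched as three pluggable stages. This is a minimal sketch, not my actual code: ConversationLoop and the three fake_* stand-ins are hypothetical names, and the real speech-to-text, dialog model, and text-to-speech services would be swapped in where the stand-ins are.

```python
from typing import Callable, List

class ConversationLoop:
    """Chains speech-to-text -> dialog model -> text-to-speech (hypothetical wrapper)."""

    def __init__(self,
                 speech_to_text: Callable[[bytes], str],
                 generate_reply: Callable[[List[str]], str],
                 text_to_speech: Callable[[str], bytes],
                 max_context: int = 3) -> None:
        self.speech_to_text = speech_to_text
        self.generate_reply = generate_reply
        self.text_to_speech = text_to_speech
        self.max_context = max_context  # how many recent utterances the model sees
        self.history: List[str] = []

    def respond(self, audio_in: bytes) -> bytes:
        heard = self.speech_to_text(audio_in)
        self.history.append(heard)
        # The dialog model receives the most recent utterances as context.
        reply = self.generate_reply(self.history[-self.max_context:])
        self.history.append(reply)
        return self.text_to_speech(reply)

# Stand-ins so the sketch runs without any audio or ML libraries.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")

def fake_model(context: List[str]) -> str:
    return "Pika pika! You said: " + context[-1]

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

loop = ConversationLoop(fake_stt, fake_model, fake_tts)
audio_out = loop.respond(b"hello")
```

Keeping the stages as plain callables means each one can be tested or replaced on its own.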
The Robot Part: (how we got the physical Pikachu, made the mouth, and move the mouth in time with Pikachu’s response, so it looks like he’s talking).
I designed my own 3D-printed skull with a moving jaw. I added servo motors that control the jaw, opening and closing it.
I cut open the Pikachu, unstuffed it, mounted the skull on a rod inside, and restuffed it.
How it would move the robot’s mouth at the right time:
The voice output’s audio stream is split in two. One copy goes to the speaker; the other goes to a custom amplitude-analysis algorithm I wrote, which uses pulse-width modulation. It distinguishes vowels from consonants to decide when to open and close the mouth.
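Here is a rough sketch of how amplitude can drive a jaw. It is not my exact algorithm. It assumes that vowels tend to be louder than consonants, so louder audio frames open the jaw wider, and that the jaw is a hobby servo driven by pulses of roughly 1000 to 2000 microseconds. Every threshold, the 20 ms frame size, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square loudness of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def jaw_angles(samples: Sequence[float], sample_rate: int,
               frame_ms: float = 20.0,
               quiet_rms: float = 0.02,     # assumed: below this, mouth closed
               loud_rms: float = 0.30,      # assumed: at or above this, fully open
               max_open_deg: float = 40.0   # assumed jaw travel
               ) -> List[float]:
    """Return one jaw opening angle (degrees) per audio frame."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    angles = []
    for start in range(0, len(samples), frame_len):
        level = frame_rms(samples[start:start + frame_len])
        openness = (level - quiet_rms) / (loud_rms - quiet_rms)
        openness = min(1.0, max(0.0, openness))  # clamp to [0, 1]
        angles.append(openness * max_open_deg)
    return angles

def angle_to_pulse_us(angle_deg: float,
                      min_us: float = 1000.0,
                      max_us: float = 2000.0,
                      servo_range_deg: float = 180.0) -> float:
    """Map a servo angle to a PWM pulse width in microseconds (assumed range)."""
    angle = min(servo_range_deg, max(0.0, angle_deg))
    return min_us + (max_us - min_us) * angle / servo_range_deg

# Example: 20 ms of silence followed by 20 ms of a loud 200 Hz tone at 8 kHz.
rate = 8000
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(160)]
angles = jaw_angles(silence + tone, rate)
pulses = [angle_to_pulse_us(a) for a in angles]
```

In this example the silent frame keeps the jaw closed and the loud frame opens it fully. On real hardware the pulse widths would be sent to the servo at its refresh rate, and the thresholds would need tuning by ear.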
What inspired you (or your team)?
The story behind it:
I had just finished my first machine learning course at the University of Toronto, and was talking with one of my friends. He could not stop gushing about the Detective Pikachu movie, pleading with me that he, too, wanted a Pokémon best friend.
I remember standing there and saying, “bet — let’s make you one.”
“Haha”, he laughed, rolling his eyes.
My face remained blank.
A wave of understanding passed over him.
“Oh my god, you’re not kidding.”
The thought going through my mind was the robot Sophia. Sophia is a life-size conversational robot, created by Hanson Robotics, that had just been granted Saudi Arabian citizenship. I thought, if they can make a Sophia, I can make a DIY Pikachu.
I brainstormed, researched, 3D printed. I met with NLP experts at Google and Microsoft. I drowned in the library under research papers until I eventually built it.
How I got the robot:
To get the Pikachu, I went to a local carnival and essentially, humorously bribed the student behind the ring-toss booth with $30 for the huge stuffed animal Pikachu prize. Now I had to convert this giant stuffed animal into a robot.
Note: I got the chance to demo my project and the technical details to Geoff Hinton in Toronto. He said “I’m impressed.” That blew my mind and shattered my earth and now provides constant motivation that I can actually build important things.
Only at the end of my project did I get a chance to speak with Ben Goertzel, the maker of Sophia. He told me Sophia is not a conversational bot. You cannot have a conversation with Sophia! It’s all hard coded! All the interviews and conferences are scripted! The value of the bot comes from detailed facial expressions.
I realized then that what I built was entirely novel. No life-size robot has ever been equipped with real, conversational abilities.
I was naive and crazy to think I could build it, and I actually did. Holy moly.
Now, I’m putting Pikachai on wheels, so he can follow you around, and turn to face you.
Follow you around: implementing the YOLO object-detection algorithm (developed at the University of Washington).
Turn to face you: facial recognition on AWS.
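For the turning part, one simple option is to steer toward the horizontal center of whatever bounding box the detector returns. This is only a sketch of that idea, since the wheels are not built yet: the box format (x_min, y_min, x_max, y_max in pixels), the gain, the deadband, and the function name turn_command are all assumptions, and no real YOLO or AWS call is shown.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: x_min, y_min, x_max, y_max

def turn_command(box: Box, frame_width: float,
                 gain: float = 1.0, deadband: float = 0.05) -> float:
    """Return a turn rate in [-1, 1]: negative = turn left, positive = turn right."""
    x_min, _, x_max, _ = box
    box_center = (x_min + x_max) / 2
    half_width = frame_width / 2
    # Offset of the person from the image center, normalized to [-1, 1].
    offset = (box_center - half_width) / half_width
    if abs(offset) < deadband:
        return 0.0  # close enough to centered: do not jitter
    return max(-1.0, min(1.0, gain * offset))

# Person on the right side of a 640-pixel-wide frame.
rate_right = turn_command((480, 0, 640, 100), frame_width=640)
# Person already centered.
rate_center = turn_command((300, 0, 340, 100), frame_width=640)
```

The deadband keeps the robot from twitching when the person is already roughly centered.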
I am in talks with senior homes. I want to try to make a best friend that gets better and more personalized over time, for those who are lonely.
If you have any questions, feel free to reach out!