This Spring, Novetta sponsored a Diversity Fellowship at the University of San Francisco’s Data Institute for the Deep Learning Part II course led by Jeremy Howard and Rachel Thomas, the co-founders of fast.ai. We were happy to support Shawn Tsosie, a veteran who loves deep learning and fast.ai as much as we do. You can hear from him below in his own words describing why he took the course and what he hopes to do with what he learned.
My wife, Claire, had gone into labor with our first child the day this course began. I planned on missing the first class, but Claire told me that labor is often a long process and that the course was important to our future.
So, I went.
The course was amazing and exactly what I was looking for. Jeremy Howard planned on going through the foundations of Deep Learning from scratch. After the first class concluded and I was thinking over the lesson, Claire texted me and told me to get home immediately.
When I arrived home, we grabbed everything and drove to the hospital. Three and a half hours later, our daughter was born.
This course will always be special to me. The first weeks of my daughter’s life were shared with the first weeks of my Deep Learning career.
A Non-Traditional Path
My journey toward becoming a student of deep learning has been long, zig-zagging, and non-traditional.
I’m a Navajo who grew up in Harlem, Montana, just outside of the Fort Belknap Indian Reservation, home of the Assiniboine and Gros Ventre tribes. After graduating high school, I attended Massachusetts Institute of Technology, but failed out. I then joined the US Army as an airborne infantryman and served two deployments in support of Operation Iraqi Freedom. After being honorably discharged, I returned to MIT and graduated with a degree in mathematics. I did well enough to attend a Ph.D. program in mathematics at the University of California at Santa Cruz, which I graduated from in 2018. From there, I decided to transition to deep learning.
Deep Learning, From a New Angle
As a non-traditional student of deep learning, I enjoyed how this course approached deep learning topics in non-traditional ways. For me, it demonstrated the value of thinking outside the box and approaching problems using less obvious methods.
For example, I enjoyed Jeremy’s departure from the PEP 8 style guide. He used the following code to update the gradients in a simple training loop:
l.weight -= l.weight.grad * lr
l.bias -= l.bias.grad * lr
While non-traditional, at least to me, this code is much more readable and makes it clear what is going on.
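To see the update step in context, here is a minimal sketch of the surrounding training loop (the model, data, and learning rate here are my own placeholders, not from the lesson):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tiny model and data, just to make the update step runnable.
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
lr = 0.01

pred = model(x)
loss = ((pred - y) ** 2).mean()
loss.backward()

# The manual update from the lesson, applied to the layer's parameters.
# torch.no_grad() keeps the update itself out of the autograd graph.
with torch.no_grad():
    l = model
    l.weight -= l.weight.grad * lr
    l.bias -= l.bias.grad * lr
    l.weight.grad.zero_()
    l.bias.grad.zero_()
```

Zeroing the gradients at the end of each step matters because PyTorch accumulates gradients across calls to `backward`.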
The use of callbacks is also exciting to me. I’m working on understanding how to implement them so that I can rapidly prototype ideas without a large amount of refactoring.
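The idea can be sketched in a few lines: a training loop calls out to callback hooks at fixed points, so new behavior can be attached without editing the loop itself. The names below (`Callback`, `Recorder`, `Learner`) are my own minimal stand-ins in the spirit of the course, not the fastai API:

```python
class Callback:
    """Base class: hooks the training loop will call."""
    def before_batch(self, learner): pass
    def after_batch(self, learner): pass

class Recorder(Callback):
    """Record the loss after every batch without touching the training loop."""
    def __init__(self): self.losses = []
    def after_batch(self, learner): self.losses.append(learner.loss)

class Learner:
    def __init__(self, callbacks): self.callbacks = callbacks

    def fit(self, batches):
        for batch in batches:
            for cb in self.callbacks: cb.before_batch(self)
            # Stand-in for forward/backward/step: here the "loss" is the batch itself.
            self.loss = batch
            for cb in self.callbacks: cb.after_batch(self)

rec = Recorder()
Learner([rec]).fit([3.0, 2.0, 1.0])
print(rec.losses)  # → [3.0, 2.0, 1.0]
```

Swapping in a different callback (say, one that adjusts the learning rate before each batch) requires no change to `fit`, which is exactly what makes rapid prototyping possible.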
I’m trying to do all of the coursework recommended. Currently, I’m working on creating a neural network framework from scratch, using only tensors from the PyTorch framework. The ultimate goal is to create a ResNet with some state-of-the-art optimizers, initializations, and layers. With the knowledge I gained from this course, I feel confident about completing this project.
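One building block of such a framework might look like this: a linear layer with a hand-written backward pass, built from plain tensors. This is my own minimal version; the `.g` attribute convention for storing gradients is borrowed from the course notebooks:

```python
import torch

class Lin:
    """A linear layer from scratch: forward and backward written by hand."""
    def __init__(self, n_in, n_out):
        # Kaiming-style initialization, as covered in the course.
        self.w = torch.randn(n_in, n_out) * (2.0 / n_in) ** 0.5
        self.b = torch.zeros(n_out)

    def forward(self, x):
        self.x = x                           # save the input for backward
        return x @ self.w + self.b

    def backward(self, grad_out):
        self.w.g = self.x.t() @ grad_out     # dL/dw
        self.b.g = grad_out.sum(0)           # dL/db
        return grad_out @ self.w.t()         # dL/dx, passed to the previous layer
```

Writing the backward pass by hand, then checking it against autograd, is a good way to make sure the layer is correct before stacking layers into a full network.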
An Interesting Paper: The Shattered Gradients Problem
In the course Deep Learning Part II, I gained a better understanding of the challenges in training deep neural networks, how to read research papers about deep neural networks, and how to implement them in code.
One paper in particular that was fascinating to me was The Shattered Gradients Problem: If ResNets are the answer, then what is the question? by David Balduzzi, et al. Residual networks (ResNets) have allowed the number of layers in a network to increase significantly. The question is: why are deep ResNets more effective than other architectures? In trying to answer this question, the authors discovered what they call the shattered gradients problem.
Kaiming He et al. recognized, in their paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, that if a deep neural network is not properly initialized, the gradients will either explode or vanish, which stops training. However, in addition to the vanishing and exploding gradient problems, the shattered gradients problem also affects training.
The shattered gradients problem is the tendency for the gradients of deep networks to resemble white noise sampled from a normal distribution. When this happens, the gradient updates become essentially random rather than moving the weights in an optimal direction. The authors showed that the gradients of ResNets approach white noise much more slowly as depth increases, allowing deeper networks to train effectively.
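The architectural idea at the heart of this story is small. Below is a generic residual block (my own sketch, not the paper’s experimental setup): the identity path `x + F(x)` gives gradients a direct route back to the input, which is what keeps them from degenerating into noise as depth grows.

```python
import torch
from torch import nn

class ResBlock(nn.Module):
    """A generic residual block: output = x + F(x).
    The identity path lets gradients flow straight through the block."""
    def __init__(self, n):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(n, n), nn.ReLU(), nn.Linear(n, n))

    def forward(self, x):
        return x + self.f(x)

# Stack many blocks: thanks to the skip connections, a gradient signal
# still reaches the input even through twenty layers of blocks.
net = nn.Sequential(*[ResBlock(16) for _ in range(20)])
x = torch.randn(8, 16, requires_grad=True)
net(x).sum().backward()
print(x.grad.shape)  # → torch.Size([8, 16])
```

Removing the `x +` in `forward` turns this into a plain deep feed-forward stack, which is exactly the kind of architecture whose gradients the paper shows shatter quickly with depth.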
Solving Unsolved Problems
I have always loved solving problems. It’s what drew me to mathematics and a Ph.D. It’s what draws me to deep learning. A key component of solving problems, though, is having a strong understanding of basic tools. Deep Learning Part II helped me build up and understand the basic tools of deep learning. Most importantly, it helped me learn how to read academic papers on deep learning and then implement them. My future goals contain problems that have not been solved yet, so this ability is necessary as I look to solve them.
In the longer term, I’m interested in using deep learning techniques to help preserve indigenous languages. The cultures of Native American tribes and First Nations have been devastated over the past several hundred years. The cultural traditions are in danger of being lost – in particular, the languages.
For many of these languages, several generations have grown up without learning them, so they are in danger of ceasing to be living languages. Part of this desire for preservation stems from my own inability to speak Navajo and my desire for my children to speak it.
My ultimate goal would be to develop software and models that could be controlled by each individual tribe. My hope is that these models and software will be used not only to preserve the languages, but to revitalize them.
There are several steps that I currently envision to accomplish this. First, the creation of a speech-to-text model. This first-step model will be used to ease the burden of transcription. If it goes according to plan, audio will be transcribed more quickly. This transcribed audio will be used to further improve the model.
Second, I envision moving toward a speech-to-speech model which, if successful, could be incorporated into other software to aid in teaching.
I would like to document and refine the process so that I can apply the methods to help revitalize other indigenous languages. I would also like the project to be open source, so that others can use what was learned for other languages.
Ultimately, it would be something similar to Google Translate. The main challenge for this project is the small training sets that are available, but I see this difficulty as a challenge and an opportunity.