Joseph Zhu
I'm from Shanghai
I'm a second-year Master's student in the Computer Science department at Stanford University, specializing in AI & robotics. I briefly did research in the Stanford Vision and Learning Lab before deciding to step away from research for a while and build my own startup.
I got my Bachelor's degree in Electrical Engineering at the University of Illinois at Urbana-Champaign, where I was advised by Prof. Mark Hasegawa-Johnson and Prof. Jont Allen and specialized in speech signal processing & machine learning. (Fun fact: it only took me 2.5 years to get that degree.)
Why I didn't do a PhD
Q: I was accepted to the PhD program at Berkeley AI Research with a generous stipend and all that, but I chose to do an MS instead. Why?
A: Alas, a man's gotta eat. I want to grow my skillset more quickly by covering a broad range of projects, rather than breaking new ground on a single topic.
I write everything from low-level embedded robotics code to fast solvers (using numerical integrators) for the forward/inverse kinematics of parallel-linkage robots, custom AI architectures/math, AI deployment code, speech codecs based on Bell Labs papers from the '80s (finding docs for these is a pain), and AI + speech signal processing stuff. I'm also learning front-end and all that (by making this website).
For example, while working on the startup recently I dabbled in stepper control and modified the Arduino AccelStepper library to compute stepper pulse intervals in real time from high-frequency waypoint inputs, on a mere Arduino Nano (where I can't afford a square root or a division if I don't want to miss a pulse), by extending this paper.
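Here's a rough Python sketch of the kind of division-free trick involved; it's only an illustration of the idea (names and numbers are made up), not the actual C++ firmware:

```python
# Toy illustration of division-free pulse scheduling (NOT the actual
# firmware, which is modified C++ AccelStepper code): a Bresenham/DDA-style
# accumulator spaces `steps_needed` pulses across the ticks between two
# waypoints using only integer adds and compares -- no sqrt, no division.

TICKS_PER_WAYPOINT = 2000  # hypothetical timer ticks between waypoint updates

def schedule_pulses(steps_needed: int, ticks: int = TICKS_PER_WAYPOINT) -> list[int]:
    """Return the tick indices at which a step pulse should fire."""
    pulses = []
    err = 0
    for tick in range(ticks):
        err += steps_needed        # accumulate "steps owed" to the motor
        if err >= ticks:           # owed a whole step -> fire a pulse
            err -= ticks
            pulses.append(tick)
    return pulses

# Example: a waypoint asking for 7 steps before the next update
print(schedule_pulses(7))          # 7 roughly evenly spaced tick indices
```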
I did my undergrad in EE, so I understand how analog & digital circuits work at a fundamental level (for example, how radio/electromagnetic waves work). Thanks to the coaching of Prof. Jont Allen, I got to learn about dynamical systems and how neurons work at the electrical level. I've recently shifted my attention to the mechanical equivalents of these systems (fluid mechanics, deformable objects, etc.).
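Concretely, the analogy I have in mind is the textbook one: a series RLC circuit and a mass-spring-damper obey the same second-order ODE, just with relabeled coefficients:

```latex
% Series RLC circuit (charge q) vs. mass-spring-damper (displacement x):
% inductance <-> mass, resistance <-> damping, 1/C <-> stiffness, voltage <-> force.
L\,\ddot{q} + R\,\dot{q} + \frac{1}{C}\,q = V(t)
\qquad\Longleftrightarrow\qquad
m\,\ddot{x} + c\,\dot{x} + k\,x = F(t)
```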
Education
Places I worked at
- Tesla Autopilot
Quite interesting to work on, though I couldn't talk too much about it.
- Facebook
By far the most comfy job I've had: free lunches at 15 different canteens, bi-weekly chiropractor treatment, unlimited ergonomic keyboards and Herman Miller Aerons. They even had an ice-cream bar in the office. The true definition of a 9-5 job, the pinnacle of The American Dream.
I was working on Reality Labs' AI team. My job was to write a distillation framework for activity detection models running on AR glasses. Basically, the concept is that inside Facebook we have the data & compute to train huge self-supervised "foundation models" (VLP transformers, etc.) using video-language-audio contrastive training and so on, but those models are too big to be deployed on the AR glasses. On the other hand, the "on-glass" models are small video-processing neural networks (MViT, X3D, etc.), but those have lower accuracy if trained normally with supervised learning. My job was to write a framework where we extract features & soft probability predictions from the big models and use them to train the smaller models. This was a huge performance booster: I was able to boost the mAP of a 100+ category video classification model from 0.32 to 0.49.
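The actual framework stays inside Facebook, but the core training loss looks roughly like the standard distillation recipe. Here's an illustrative PyTorch sketch (the names, loss weights, and single-label cross-entropy term are my simplifications, not the real thing):

```python
# Illustrative distillation loss (not the actual internal framework): the
# student matches the teacher's softened predictions and intermediate
# features, on top of the usual supervised loss on ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, labels,
                      T=4.0, alpha=0.7, beta=0.1):
    # Soft-target term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Feature-matching term (assumes both features were already projected
    # to a common dimension, e.g. by a small linear head on the student).
    feat = F.mse_loss(student_feat, teacher_feat)

    # Ordinary supervised term (single-label here for simplicity; the real
    # task was 100+ category video classification evaluated with mAP).
    hard = F.cross_entropy(student_logits, labels)

    return alpha * soft + beta * feat + (1.0 - alpha - beta) * hard
```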
- Tencent
Encoding human speech at a 2 kb/s bitrate to let you have meetings in an underground parking structure or an elevator. I mingled a bunch of GANs (Generative Adversarial Networks) with signal compression stuff Bell Labs made in the '80s. I forgot to save a recording when I left the job; this is something I managed to produce halfway through the internship.
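For context, the classic building block in that line of codecs is LPC analysis (Levinson-Durbin over frame autocorrelations); here's a toy sketch of just that step, not the actual codec:

```python
# Toy LPC analysis -- the classic front half of codecs in that lineage.
# Illustrative only, not the codec from the internship.
import numpy as np

def lpc_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Estimate LPC coefficients for one speech frame via Levinson-Durbin."""
    # Autocorrelation of the windowed frame, lags 0..order.
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]

    # Levinson-Durbin recursion on the autocorrelation sequence.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        err *= 1.0 - k * k                   # remaining prediction error
    return a

# Usage: e.g. a 20 ms frame at 16 kHz -> lpc_coefficients(frame_320, order=16)
```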
- MIT
In 2019, when I was a sophomore, I was really intrigued by neural networks (not the ones you train; I'm talking about the ones in the human brain) and the PDE (partial differential equation) that governs them, so I read this book. I wanted to do research on the topic, so I was planning to go to MIT's McGovern Institute in Summer 2020. However, Covid hit and MIT completely stopped hiring staff, so I couldn't go there in person. I wound up doing some part-time remote research instead. I just stumbled upon the old code I wrote and decided to open-source it. Maybe I'd be a neuroscience PhD by now if I had been able to go there in person, who knows?
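For reference, the canonical PDE for voltage along a neurite is the passive cable equation (the starting point for Hodgkin-Huxley-style models):

```latex
% Passive cable equation for membrane voltage V(x, t) along a neurite,
% with membrane time constant \tau_m, space constant \lambda,
% membrane resistance r_m, and injected current I_inj:
\tau_m \frac{\partial V}{\partial t}
  = \lambda^2 \frac{\partial^2 V}{\partial x^2} - V + r_m\, I_{\mathrm{inj}}(x, t)
```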
- SenseTime
- Capital One
Did some NLP stuff, like sentiment analysis, in the anti-money-laundering department.
- i-jet Lab
Brunswick's i-jet Lab was my first job in America. I was getting paid $16/hour, and at the time that was big money to me. It was also the first time I got to work as an "engineer", learning things like object detection, semantic segmentation, and path planning. I got the chance to work on a lot of cool projects. Here, I'm teaching a boat how to dock itself:
I also made an imitation learning proof-of-concept with an omnidirectional robot for docking.
I took care of a variety of other projects too, like doing NLP on 600k patents, writing the trajectory for a drone to take videos, classifying satellite images to analyze our users, etc.
Publications
- See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation. In the Conference on Robot Learning (CoRL), 2022.
- Multi-Decoder DPRNN: Source Separation for Variable Number of Speakers. In the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.