Joseph Zhu

LinkedIn, Google Scholar, GitHub, WeChat: josephz2000


I'm from Shanghai.

I’m a second-year Master’s student in the Computer Science department at Stanford University, specializing in AI & robotics. I briefly did research at the Stanford Vision and Learning Lab before deciding to step away from research for a while and build my own startup.

I got my Bachelor’s degree in Electrical Engineering at the University of Illinois at Urbana-Champaign, where I was advised by Prof. Mark Hasegawa-Johnson and Prof. Jont Allen, specializing in speech signal processing & machine learning. (Fun fact: it only took me 2.5 years to get that degree.)

Why I didn't do a PhD

Q: I was accepted to the PhD program at Berkeley AI Research with a generous stipend and all that, but I chose to do an MS instead. Why?

A: Alas, a man's gotta eat. I want to grow my skill set more quickly by working across a broad range of projects, rather than breaking new ground on a single topic.

I write everything from low-level embedded robotics code to fast solvers (using numerical integrators) for forward/inverse kinematics of parallel-linkage robots, custom AI architectures/math, AI deployment code, speech codecs based on Bell Labs papers from the 80s (finding docs for these is a pain), and AI + speech signal processing. I've recently been learning front-end development (by making this website).

For example, while working on the startup recently, I dabbled in stepper control and modified the Arduino AccelStepper library to compute stepper pulse intervals in real time from high-frequency waypoint inputs on a mere Arduino Nano (where I can't use square roots or division if I don't want to miss a pulse) by extending this paper.
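
To give a feel for the stepper pulse-interval problem mentioned above, here is a minimal Python sketch (not the firmware itself). It compares the exact step intervals for a constant-acceleration ramp, which need a square root per step, against a recurrence of the kind that, to my understanding, the stock AccelStepper library uses for its ramps. The function names and the 0.676 first-step correction are assumptions for illustration, and this recurrence still contains a division, so it is only the starting point and not the division-free extension described above.

```python
import math


def exact_intervals(accel, n_steps):
    """Exact step intervals (seconds) for constant acceleration from rest.

    Step n happens at time t_n = sqrt(2 * n / accel), so each interval
    costs one square root. accel is in steps per second squared.
    """
    times = [math.sqrt(2.0 * n / accel) for n in range(n_steps + 1)]
    return [times[n + 1] - times[n] for n in range(n_steps)]


def recurrence_intervals(accel, n_steps):
    """Approximate intervals from the recurrence c_n = c_{n-1} - 2*c_{n-1} / (4n + 1).

    Only the first interval needs a square root; it is scaled by 0.676
    (assumed correction factor) so that later intervals track the exact ramp.
    """
    c = 0.676 * math.sqrt(2.0 / accel)
    intervals = [c]
    for n in range(1, n_steps):
        c = c - (2.0 * c) / (4.0 * n + 1.0)
        intervals.append(c)
    return intervals


if __name__ == "__main__":
    exact = exact_intervals(1000.0, 10)
    approx = recurrence_intervals(1000.0, 10)
    for n, (e, a) in enumerate(zip(exact, approx)):
        print(f"step {n}: exact {e * 1e6:.1f} us, recurrence {a * 1e6:.1f} us")
```

After the first couple of steps the recurrence stays close to the exact ramp, which is why it is attractive on a small microcontroller: the per-step cost drops from a square root to a few arithmetic operations.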

I did my undergrad in EE, so I understand how analog & digital circuits work on a fundamental level (for example, how radio/electromagnetic waves work). Thanks to the coaching of Prof. Jont Allen, I got to learn about dynamical systems and how neurons work at the electrical level. I recently shifted my attention to the mechanical equivalents of these systems (fluid mechanics, deformable objects, etc.).


Places I worked at

  1. Tesla Autopilot

    Quite interesting to work on, though I can't talk too much about it.
  2. Facebook

    By far the most comfy job I've had. Free lunches at 15 different canteens, bi-weekly chiropractor treatment, unlimited ergonomic keyboards and Herman Miller Aerons. They even had an ice-cream bar in the office. True definition of a 9-5 job, pinnacle of The American Dream.
    I was working on Reality Labs' AI team. My job was to write a distillation framework for activity detection models running on AR glasses. The concept is that inside Facebook, we have the data & compute to train huge self-supervised "foundation models" (like VLP transformers, etc.) using video-language-audio contrastive training and the like, but those models are too big to be deployed on the AR glasses. On the other hand, the "on-glass" models are small video processing neural networks (like MViT, X3D, etc.), but those have lower accuracy if trained normally with supervised learning. My job was to write a framework that extracts features & soft probability predictions from the big models and trains the smaller models on them. This is a huge performance booster: I was able to raise the mAP of a 100+ category video classification model from 0.32 to 0.49.
  3. Tencent

    Encoding human speech at a 2 kb/s bitrate to let you have meetings in an underground parking structure/elevator. Combined a bunch of GANs (Generative Adversarial Networks) with signal compression techniques developed at Bell Labs in the 80s. I forgot to save a recording when I left the job; this is something I managed to produce halfway through the internship.
  4. MIT

    In 2019, when I was a sophomore, I was really intrigued by neural networks (not the ones you train, I'm talking about the ones in the human brain) and the PDEs (partial differential equations) that govern them, so I read this book. I wanted to do research on it, so I was planning to go to MIT's McGovern Institute in Summer 2020. However, Covid hit, and MIT completely stopped hiring staff, so I couldn't go there in person. I wound up doing some part-time remote research. I just stumbled upon the old code I wrote and decided to open source it. Maybe I'd be a neuroscience PhD by now if I had been able to go there in person, who knows?
  5. Sensetime

  6. Capital One

    Did some NLP stuff, like sentiment analysis, in the anti-money-laundering department.
  7. i-jet Lab

    Brunswick was my first job in America. I was getting paid $16/hour, and at the time that was big money to me. It was also the first time I got to work as an "engineer". I was learning things like object detection, semantic segmentation, and path planning, and I got the chance to work on a lot of cool projects. In this one, for example, I am teaching a boat how to dock itself:

    I also made an imitation learning proof-of-concept with an omnidirectional robot for docking.
    I took care of a variety of other projects, like doing NLP stuff on 600k patents, writing the trajectory for a drone to take videos, classifying satellite images to analyze our users, etc.
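
As a rough illustration of the distillation idea described in the Facebook entry above, here is a minimal pure-Python sketch of training a small model against a big model's soft probability predictions. It is not the internal framework: the function names, the temperature value, and the choice of a KL-divergence loss are all assumptions made for the example, following the common knowledge-distillation recipe.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives softer targets."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's soft predictions to the student's.

    Hypothetical helper: the teacher is the large model, the student is the
    small on-device model. The temperature**2 factor is the usual scaling
    that keeps gradient magnitudes comparable across temperatures.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher_logits = [4.0, 1.5, 0.2]  # made-up outputs for one video clip
    student_logits = [2.0, 2.1, 0.3]
    print("distillation loss:", distillation_loss(student_logits, teacher_logits))
```

In a real training loop this term would typically be combined with the ordinary supervised loss, and the same idea extends to matching intermediate features instead of output probabilities.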


Publications

  1. See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
    Hao Li*, Yizhi Zhang*, Junzhe Zhu, Shaoxiong Wang, Michelle A. Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, and Jiajun Wu
    In The Conference on Robot Learning (CoRL), 2022
  2. Multi-Decoder DPRNN: Source Separation for Variable Number of Speakers
    Junzhe Zhu, Raymond A. Yeh, and Mark Hasegawa-Johnson
    In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
  3. A Comparison Study on Infant-Parent Voice Diarization
    Junzhe Zhu, Mark Hasegawa-Johnson, and Nancy McElwain
    In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
  4. Identify Speakers in Cocktail Parties with End-to-End Attention
    Junzhe Zhu, Mark Hasegawa-Johnson, and Leda Sari
    In Interspeech 2020: 21st Annual Conference of the International Speech Communication Association, 2020
  5. A Machine Learning Algorithm for Sorting Online Comments via Topic Modeling
    Junzhe Zhu, Elizabeth Wickes, and John R. Gallagher
    Commun. Des. Q. Rev, Jul 2021