About
I'm a researcher at NVIDIA who develops algorithms that process and generate information in both speech and text modalities.
I used to work as an applied scientist at Amazon AWS AI Labs, where I spent two years working on products such as Amazon Translate, Amazon Q for Business, and Amazon Titan Text Embeddings. Prior to Amazon, I was a PhD student in the Department of Computer Science at Johns Hopkins University. My PhD advisor was Philipp Koehn.
During my graduate studies, I was primarily affiliated with the Center for Language and Speech Processing and did research on neural machine translation. I also spent a few memorable months interning at Microsoft Translator, Salesforce Research, and Amazon, and visiting The University of Edinburgh. Before joining Johns Hopkins, I earned my Bachelor's degree from Beijing University of Posts & Telecommunications. During the last year of my undergraduate study, I worked with Weiwei Sun in the Language Computing and Web Mining Group at the Institute of Computer Science & Technology, Peking University, with a focus on semantic parsing and Chinese word segmentation.
Publications
EMMeTT: Efficient Multimodal Machine Translation Training
Piotr Żelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg (ICASSP 2025) [pdf]
Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains
Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, Brian Thompson
Annual Meeting of the Association for Computational Linguistics (ACL 2024) [pdf]
Doubly-Trained Adversarial Data Augmentation for Neural Machine Translation
Weiting Tan, Shuoyang Ding, Huda Khayrallah, Philipp Koehn
The 15th Conference of the Association for Machine Translation in the Americas (AMTA 2022) [pdf]
Runtime Audit of Neural Sequence Models for NLP
Shuoyang Ding
PhD Thesis [pdf]
The JHU-Microsoft Submission for WMT21 Quality Estimation Shared Task
Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Christian Federmann, Philipp Koehn
Sixth Conference on Machine Translation (WMT) 2021 [pdf][poster]
Levenshtein Training for Word-level Quality Estimation
Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn
EMNLP 2021 [pdf][code][slides][poster][talk]
Evaluating Saliency Methods for Neural Language Models
Shuoyang Ding, Philipp Koehn
NAACL 2021 [pdf][code][slides][talk]
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
ASRU 2019 [pdf][code]
A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation
Shuoyang Ding, Adithya Renduchintala, Kevin Duh
MT Summit 2019 [pdf][code][poster]
An Exploration of Masking for Neural Machine Translation
Matt Post, Shuoyang Ding, Marianna Martindale and Winston Wu
MT Summit 2019 [pdf]
Saliency-driven Word Alignment Interpretation for Neural Machine Translation
Shuoyang Ding, Hainan Xu, Philipp Koehn
Fourth Conference on Machine Translation (WMT) 2019 [pdf][code][slides]
Parallelizable Stack Long Short-Term Memory
Shuoyang Ding, Philipp Koehn
NAACL 2019 Workshop on Structured Prediction for NLP [pdf][bib][code][slides]
Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling
Hainan Xu, Shuoyang Ding, Shinji Watanabe
ICASSP 2019 [pdf][bib][code]
Multi-Modal Data Augmentation for End-to-end ASR
Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe
Interspeech 2018 Best Student Paper Award (3/700+) [pdf][bib]
The JHU Machine Translation Systems for WMT 2017
Shuoyang Ding, Huda Khayrallah, Philipp Koehn, Matt Post, Gaurav Kumar, and Kevin Duh
Second Conference on Machine Translation (WMT) 2017 [pdf][bib]
The JHU Machine Translation Systems for WMT 2016
Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn, and Matt Post
First Conference on Machine Translation (WMT) 2016 [pdf][bib]
Grammatical Relations in Chinese: GB-Ground Extraction and Data-Driven Parsing
Weiwei Sun, Yantao Du, Xin Kou, Shuoyang Ding, Xiaojun Wan
Annual Meeting of the Association for Computational Linguistics (ACL) 2014 [pdf][bib]
Preprints
How Do Source-side Monolingual Word Embeddings Impact Neural Machine Translation?
Shuoyang Ding and Kevin Duh, 2018 [pdf]
Backstitch: Counteracting Finite-sample Bias via Negative Steps
Yiming Wang, Hossein Hadian, Shuoyang Ding, Ke Li, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur, 2017 [pdf]
Courses
- 600.465: Natural Language Processing
- 600.475: Machine Learning
- 600.468: Machine Translation
- 600.676: Machine Learning: Data to Models
- 050.620: Syntax I
- 600.615: Big Data, Small Languages, Scalable Systems
- 550.661: Nonlinear Optimization I
- 600.420: Parallel Programming
Teaching
- Nov 2021: Guest Lecture, EN.600.468/601.668 Machine Translation -- Analysis and Visualization
- Fall 2017: Graduate Teaching Assistant, EN.600.468/601.668 Machine Translation. Check out the neural network and NMT homework I designed.
- Spring 2017: Guest Lecture, EN.600.435 Artificial Intelligence -- Markov Decision Process
- Spring 2016: Guest Lecture, EN.600.468 Machine Translation -- Syntax-Based Models