I'm working to make machines more helpful through unsupervised learning that scales.
I developed the Sparse Transformer with Scott Gray, and also coauthored work showing the emergent capabilities of large language models in a variety of settings (GPT-2, GPT-3, Image GPT, and more).
More recently, I've worked on addressing the limitations of those techniques (very deep VAEs) while continuing to apply them at larger scale on supercomputers (MT-NLG and PaLM).
I am half-Japanese, and my Japanese name is 石井興元. Nice to meet you! I live in San Francisco.