Unlock Peak Mobile Performance: A Deep Dive into PowerInfer-2's Neuron-Aware Runtime

26 Aug 2025

This deep dive explains PowerInfer-2's polymorphic engine, neuron cache, and fine-grained pipelining that make on-device LLM inference fast.

The Conductor in Your Pocket: How PowerInfer-2 Orchestrates Smartphone Hardware for LLM Inference

26 Aug 2025

PowerInfer-2 is a smartphone LLM inference framework that uses "neuron clusters" to optimize for heterogeneous hardware and minimize I/O overhead.

Why Your Phone's AI is Slow: A Story of Sparse Neurons and Finicky Flash Storage

26 Aug 2025

This analysis breaks down on-device LLM inference challenges, from compute stages to the unique performance quirks of smartphone storage.

PowerInfer-2 Achieves 29x Speedup, Running 47-Billion Parameter LLMs on Smartphones

26 Aug 2025

PowerInfer-2 runs massive LLMs (47B+) on smartphones at record speeds by optimizing for heterogeneous hardware and minimizing I/O overhead.

If GPT-OSS Weren’t OpenAI Models, Would We Still Care?

22 Aug 2025

Would anyone care about GPT-OSS if OpenAI didn’t make it? The models are decent, but the buzz is all branding. In 2025, who made it matters as much as what it does.

Acknowledgements: Insights on vLLM Kernel from UC Berkeley

10 Jul 2025

We extend our gratitude to Zhuohan Li, Simon Mo, and Kaichao You from UC Berkeley for their valuable insights, which contributed to the vLLM kernel discussion.

The Research Team Behind the phi-3 LLM Development

10 Jul 2025

Discover the names of the talented researchers and scientists who collaborated on the development and study of the phi-3 large language models.

Diverse Question Types in LLM Benchmark Prompts

10 Jul 2025

Explore a sample prompt featuring varied multiple-choice questions covering math, science, and the humanities.

References on Responsible AI, Long-Context, and Data-Optimal LLMs

9 Jul 2025

A compilation of cited works supporting our phi-3 research, including studies on responsible AI alignment, long-context models, and data-optimal training strategies.