
Unlock Peak Mobile Performance: A Deep Dive into PowerInfer-2's Neuron-Aware Runtime
26 Aug 2025
This deep dive explains PowerInfer-2's polymorphic engine, neuron cache, and fine-grained pipelining that make on-device LLM inference fast.

The Conductor in Your Pocket: How PowerInfer-2 Orchestrates Smartphone Hardware for LLM Inference
26 Aug 2025
PowerInfer-2 is a smartphone LLM inference framework that uses "neuron clusters" to optimize for heterogeneous hardware and minimize I/O overhead.

Why Your Phone's AI is Slow: A Story of Sparse Neurons and Finicky Flash Storage
26 Aug 2025
This analysis breaks down on-device LLM inference challenges, from compute stages to the unique performance quirks of smartphone storage.

PowerInfer-2 Achieves 29x Speedup, Running 47-Billion Parameter LLMs on Smartphones
26 Aug 2025
PowerInfer-2 runs massive LLMs (47B+) on smartphones at record speeds by optimizing for heterogeneous hardware and minimizing I/O overhead.

If GPT-OSS Weren’t OpenAI Models, Would We Still Care?
22 Aug 2025
Would anyone care about GPT-OSS if OpenAI didn’t make it? The models are decent, but the buzz is all branding. In 2025, who made it matters as much as what it does.

Acknowledgements: Insights on vLLM Kernel from UC Berkeley
10 Jul 2025
We extend our gratitude to Zhuohan Li, Simon Mo, and Kaichao You from UC Berkeley for their valuable insights, which contributed to the vLLM kernel discussion.

The Research Team Behind the phi-3 LLM Development
10 Jul 2025
Discover the names of the talented researchers and scientists who collaborated on the development and study of the phi-3 large language models.

Diverse Question Types in LLM Benchmark Prompts
10 Jul 2025
Explore a sample prompt featuring varied multiple-choice questions covering math, science, and the humanities.

References on Responsible AI, Long-Context, and Data-Optimal LLMs
9 Jul 2025
A compilation of cited works supporting our phi-3 research, including studies on responsible AI alignment, long-context models, and data-optimal training strategies.