TextGrad vs. DSPy & ProTeGi: Evolution of Textual Autograd

13 Jun 2026

Abstract and 1. Introduction

  2. TEXTGRAD: Optimizing AI systems by backpropagating text feedback

  3. Results

    3.1 Code optimization

    3.2 Solution optimization by test-time training to improve problem solving

    3.3 Prompt optimization for reasoning

    3.4 Molecule optimization

    3.5 Radiotherapy treatment plan optimization

  4. Related work

  5. Discussion, Acknowledgements, and References

A. TEXTGRAD Details

B. Optimizer Extensions

C. Code Optimization

D. Solution Optimization

E. Prompt Optimization

F. Molecule Optimization

G. Treatment Plan Optimization

One related thread of work investigates prompt optimization. Practitioners have demonstrated that prompt engineering strategies, such as carefully selecting few-shot examples for in-context learning, chain-of-thought (CoT) prompting, and ensembling, can significantly boost the performance of LLMs [66]. To automate this process, white-box methods that leverage numerical gradients were developed to optimize prompts [67–70]. However, these methods cannot be used with closed-source models because they require access to model parameters. Various works have investigated using LLMs themselves as prompt optimizers [12, 25, 71].

Within prompt optimization, two works are closest to our philosophy and have inspired us. First, DSPy [10, 72, 73] pioneered the idea of viewing complex LLM-based systems as programs with potentially many layers, and proposed ways to build and optimize them programmatically. The framework is extensive, with results showing improved LLM performance on various question answering, reasoning, and prompt optimization tasks. Our work takes a different perspective: backpropagation and its extensions can be a general and powerful framework for optimizing the new generation of AI systems, and can perform many tasks beyond prompt optimization. In particular, we treat as variables to optimize not only instructions or demonstrations, but also the instances we care about themselves, such as molecules, treatment plans, and code snippets. Second, and greatly inspiring to us, Prompt Optimization with Textual Gradients (ProTeGi) [25] defines textual gradients in the context of prompt optimization, where gradients are natural language feedback from LLMs on the mistakes made during the task. While ProTeGi is built on the textual gradient analogy, we extend this analogy more broadly to automatic differentiation and go substantially beyond prompt optimization tasks. Both DSPy and ProTeGi focused on prompt optimization, whereas a significant advance of TEXTGRAD, as demonstrated through our diverse applications, is in instance optimization.
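The distinction between prompt optimization and instance optimization can be illustrated with a small sketch. This is not the TEXTGRAD, DSPy, or ProTeGi API: the `Variable` class, the `step` function, and the `toy_critic` and `toy_editor` stand-ins are hypothetical names introduced here, and the two stand-ins replace what would be LLM calls with deterministic rules so the example runs on its own. The point it tries to show is that the same feedback-then-edit loop applies whether the variable is a prompt or an instance such as a code snippet.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Variable:
    """A piece of text being optimized: a prompt, or an instance such as code."""
    value: str
    role: str
    gradients: List[str] = field(default_factory=list)


# Both callables stand in for LLM calls; their signatures are assumptions.
Critic = Callable[[Variable], str]       # natural language feedback, "" if none
Editor = Callable[[Variable, str], str]  # rewrites the value given the feedback


def step(var: Variable, critic: Critic, editor: Editor) -> bool:
    """One update: collect feedback (the textual 'gradient'), then apply it."""
    feedback = critic(var)
    if not feedback:
        return False  # nothing left to criticize, so stop
    var.gradients.append(feedback)
    var.value = editor(var, feedback)
    return True


def toy_critic(var: Variable) -> str:
    """Deterministic stand-in for an LLM critic."""
    if var.role == "system prompt" and "step by step" not in var.value:
        return "Ask the model to reason step by step."
    if var.role == "code instance" and "range(len(" in var.value:
        return "Iterate over the items directly instead of indexing."
    return ""


def toy_editor(var: Variable, feedback: str) -> str:
    """Deterministic stand-in for an LLM that applies the feedback."""
    if var.role == "system prompt":
        return var.value + " Think step by step."
    return var.value.replace(
        "for i in range(len(xs)): total += xs[i]",
        "for x in xs: total += x",
    )


# The same loop optimizes a prompt and an instance (here, a code snippet).
prompt = Variable("Answer the question.", role="system prompt")
code = Variable("for i in range(len(xs)): total += xs[i]", role="code instance")

for var in (prompt, code):
    while step(var, toy_critic, toy_editor):
        pass
    print(f"{var.role}: {var.value}")
```

In a real system the critic and editor would be LLM calls, and the feedback would be propagated through intermediate variables rather than applied to a single one.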

More generally, there is an emerging line of work built on the high-level idea of using LLMs as critics or optimizers [10, 12, 25, 26, 30, 71, 74–80]. While many of these earlier frameworks demonstrated the utility of LLMs as optimizers, we propose a single, general framework that was tested successfully in a variety of applications. Within this framework, we can reason about optimizing chains or stacks of LLMs [81–83] by propagating natural language feedback through them. Similarly, once the framework is viewed as a general-purpose optimization engine, many relevant problems can be formulated in a few lines of code, such as test-time training [42, 43] or self-refinement of solutions and self-improvement [26, 30, 44, 84–91]. Building on the optimization analogy, we have already transferred several ideas from the traditional optimization literature: momentum [40], by including earlier iterations in the context; batch optimization [92]; and constrained optimization [35], using natural language constraints. Our work opens up a large space for designing a new generation of optimization algorithms, all within the same framework.
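As a rough illustration of how these optimization analogies might translate to text, the sketch below assembles the prompt that an LLM-based optimizer could receive. It is an assumption-laden toy, not the prompt format used by TEXTGRAD: the function `build_update_prompt`, its parameters, and the wording of the prompt are all hypothetical. Momentum is modeled by showing the most recent earlier versions, batch optimization by listing feedback from several examples together, and constrained optimization by appending natural language constraints.

```python
from typing import List, Sequence


def build_update_prompt(
    current: str,
    batch_feedback: Sequence[str],
    history: Sequence[str] = (),
    constraints: Sequence[str] = (),
    momentum_window: int = 2,
) -> str:
    """Assemble a hypothetical optimizer prompt from the pieces described above."""
    parts: List[str] = [f"Current value:\n{current}"]

    # Momentum analogy: include the most recent earlier iterates in the context.
    recent = list(history)[-momentum_window:] if momentum_window > 0 else []
    if recent:
        lines = "\n".join(f"- {v}" for v in recent)
        parts.append(f"Earlier versions (oldest first):\n{lines}")

    # Batch analogy: feedback from several examples is shown together.
    lines = "\n".join(f"{i}. {fb}" for i, fb in enumerate(batch_feedback, 1))
    parts.append(f"Feedback from this batch:\n{lines}")

    # Constraint analogy: requirements are stated in natural language.
    if constraints:
        lines = "\n".join(f"- {c}" for c in constraints)
        parts.append(f"Constraints the new value must satisfy:\n{lines}")

    parts.append("Rewrite the current value to address the feedback.")
    return "\n\n".join(parts)


update_prompt = build_update_prompt(
    current="p3",
    batch_feedback=["Too vague on units.", "Missed the edge case n = 0."],
    history=["p0", "p1", "p2"],
    constraints=["Under 50 words."],
)
print(update_prompt)
```

The resulting string would then be sent to an LLM that returns the updated value; that call is omitted here so the example stays self-contained.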

Authors:

(1) Mert Yuksekgonul, Co-first author from Department of Computer Science, Stanford University;

(2) Federico Bianchi, Co-first author from Department of Computer Science, Stanford University;

(3) Joseph Boen, Co-first author from Department of Biomedical Data Science, Stanford University;

(4) Sheng Liu, Co-first author from Department of Biomedical Data Science, Stanford University;

(5) Zhi Huang, Co-first author from Department of Biomedical Data Science, Stanford University;

(6) Carlos Guestrin, Department of Computer Science, Stanford University and Chan Zuckerberg Biohub;

(7) James Zou, Department of Computer Science, Stanford University, Department of Biomedical Data Science, Stanford University, and Chan Zuckerberg Biohub.


This paper is available on arXiv under the CC BY 4.0 license.