TextGrad Autograd Engine: Variables, Roles, & PyTorch-Style Textual Backpropagation

14 Jun 2026

Table of Links

Abstract and 1. Introduction

2. TEXTGRAD: Optimizing AI systems by backpropagating text feedback
3. Results

3.1 Code optimization

3.2 Solution optimization by test-time training to improve problem solving

3.3 Prompt optimization for reasoning

3.4 Molecule optimization

3.5 Radiotherapy treatment plan optimization
4. Related work
5. Discussion, Acknowledgements, and References

A. TEXTGRAD Details

B. Optimizer Extensions

C. Code Optimization

D. Solution Optimization

E. Prompt Optimization

F. Molecule Optimization

G. Treatment Plan Optimization

A TEXTGRAD Details

A.1 Variables

Variables are the nodes in the computation graph. Below are the most important attributes of Variables:

Value is the unstructured data that the variable contains. Throughout this manuscript, all values are text data.
Role description is an informative string that describes the role of the variable in the computation graph. We use these roles to let the user inject knowledge into the graph and guide the optimization behavior. More information is described below.
Gradients are the natural language feedback provided by the LLMs during the backward pass. These describe the changes to make to the variable so that the downstream loss can be improved. For an example backward operation that populates gradients, see Section A.3.
Predecessors are the set of variables that are used to generate a given variable. For instance, if we are giving an instruction to an LLM and getting a response, the instruction would be the predecessor of the response. During the backward pass, the gradients on the successor are passed to its predecessors, to provide guidance around how to improve the downstream objective.
Requires grad indicates whether or not the gradients will be populated during the backward pass, analogous to PyTorch. For instance, if the user does not wish to compute gradients for a question, then simply write Variable(value=question, requires_grad=False, ...) to indicate this.
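The attributes above can be pictured as fields on a small data class. The following is a minimal sketch, not the library's own class: the field names simply follow the attribute names listed above, and the example values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare nodes by identity, as graph nodes usually are
class Variable:
    """Illustrative stand-in for a node in the computation graph."""
    value: str                      # unstructured (text) data
    role_description: str           # what this variable is for
    requires_grad: bool = True      # whether feedback should be collected
    predecessors: List["Variable"] = field(default_factory=list)
    gradients: List[str] = field(default_factory=list)  # textual feedback


# A question we do not want feedback on, mirroring requires_grad=False.
question = Variable(
    value="What is 7 * 6?",
    role_description="question to the language model",
    requires_grad=False,
)
instruction = Variable(
    value="Answer concisely.",
    role_description="system prompt to the language model",
)
# The response was generated from the instruction and the question.
response = Variable(
    value="42",
    role_description="prediction by the language model",
    predecessors=[instruction, question],
)
```

In this sketch, gradients start out empty and would only be filled in by a backward pass.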

Role Description: In TEXTGRAD, each variable has a role description: a string that describes the role of the variable in the computation graph, such as "system prompt to the language model" or "prediction by the language model". While sometimes populated automatically, role descriptions are in general one of the primary ways to inject user knowledge into the optimization process.

Empirically, we find that role descriptions can significantly steer the optimization process. For instance, setting the role of a prediction to "the final numerical answer to the language model" guides the Textual Gradient Descent optimizer, which prompts a language model to update the value of the variable using the feedback, to produce a value that is only a number. In comparison, a role description such as "the reasoning for the solution and the final prediction" guides the optimizer to produce the reasoning along with the final prediction.

Here is an example usage:

Code Snippet 3: An example usage of a role description.
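A minimal sketch of what such a usage could look like, assuming the Variable(value=..., role_description=..., requires_grad=...) constructor form suggested in Section A.1. The class below is a self-contained stand-in rather than the library's own, and update_request is a hypothetical helper that only shows where the role string could enter an optimizer prompt.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    # Stand-in with only the fields this example needs (assumption).
    value: str
    role_description: str
    requires_grad: bool = True


def update_request(variable: Variable, feedback: str) -> str:
    """Hypothetical helper: text an optimizer could send to an LLM."""
    return (
        f"Role of the variable: {variable.role_description}\n"
        f"Current value: {variable.value}\n"
        f"Feedback: {feedback}\n"
        "Return an improved value that still fits the role."
    )


# Same value, two different roles: the role is the only steering signal.
numeric = Variable(
    value="The answer is probably 41.",
    role_description="the final numerical answer to the language model",
)
reasoned = Variable(
    value="The answer is probably 41.",
    role_description="the reasoning for the solution and the final prediction",
)

feedback = "Recheck the multiplication; 7 * 6 is 42."
print(update_request(numeric, feedback))
print(update_request(reasoned, feedback))
```

The two printed requests differ only in the role line, which is the part the text above describes as steering whether the update is a bare number or a worked solution.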

A.2 Backpropagation

The backpropagation algorithm in TEXTGRAD mirrors existing autograd frameworks for deep learning. See Algorithm 1 for the pseudocode.
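As a rough illustration of the idea, the sketch below walks a graph in reverse topological order and lets each node push feedback to its predecessors. It is written under assumptions, not taken from Algorithm 1: the grad_fn hook, topological_order, and make_grad_fn are hypothetical names, and the "feedback" is produced by a stub rather than an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Variable:
    value: str
    requires_grad: bool = True
    predecessors: List["Variable"] = field(default_factory=list)
    gradients: List[str] = field(default_factory=list)
    # Hypothetical hook that writes feedback into the predecessors.
    grad_fn: Optional[Callable[["Variable"], None]] = None


def topological_order(root: Variable) -> List[Variable]:
    """Return variables so that each appears after all its predecessors."""
    order, seen = [], set()

    def visit(v: Variable) -> None:
        if id(v) in seen:
            return
        seen.add(id(v))
        for p in v.predecessors:
            visit(p)
        order.append(v)

    visit(root)
    return order


def backward(loss: Variable) -> None:
    """Visit nodes from the loss back to the leaves, calling each hook once."""
    for v in reversed(topological_order(loss)):
        if v.grad_fn is not None:
            v.grad_fn(v)


def make_grad_fn(note: str) -> Callable[[Variable], None]:
    # Stub in place of an LLM critique: relays a note plus downstream feedback.
    def grad_fn(out: Variable) -> None:
        for p in out.predecessors:
            if p.requires_grad:
                p.gradients.append(f"{note} | downstream: {out.gradients}")
    return grad_fn


prompt = Variable("Answer briefly.")
answer = Variable("42", predecessors=[prompt],
                  grad_fn=make_grad_fn("improve the answer"))
loss = Variable("The answer lacks units.", predecessors=[answer],
                grad_fn=make_grad_fn("reduce the loss"))
backward(loss)
```

Because the loss is processed before the answer, the feedback that reaches prompt already contains the feedback that was placed on answer.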

A.3 Functions

TEXTGRAD offers several operations where both the forward and backward computations are defined; as such, these operations are composable. The abstract textgrad.autograd.Function class has two methods, forward and backward, mirroring the PyTorch syntax. Each function has to define both of these methods. Below, we describe a couple of the most used operations in this paper.
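The two-method contract can be sketched with Python's abc module. This is an illustration of the pattern only: the Function class below is a local stand-in, not textgrad.autograd.Function, and Uppercase and ForwardOnly are toy, hypothetical operations.

```python
from abc import ABC, abstractmethod


class Function(ABC):
    """Stand-in for an abstract operation with paired forward/backward."""

    @abstractmethod
    def forward(self, *inputs):
        ...

    @abstractmethod
    def backward(self, *args):
        ...


class Uppercase(Function):
    """Toy operation: forward upper-cases text, backward relays feedback."""

    def forward(self, text: str) -> str:
        return text.upper()

    def backward(self, feedback: str) -> str:
        return f"Feedback on the input of Uppercase: {feedback}"


class ForwardOnly(Function):
    """Deliberately incomplete: no backward method."""

    def forward(self, text: str) -> str:
        return text


try:
    ForwardOnly()
    incomplete_rejected = False
except TypeError:
    # Abstract classes with missing methods cannot be instantiated.
    incomplete_rejected = True
```

Declaring both methods abstract means an operation that forgets its backward fails at construction time rather than in the middle of a backward pass.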

LLMCall Function: Currently, the most crucial operation in TEXTGRAD is the call to language models.

Forward mode. The forward mode is simple: We make a call to an LLM, through an API or through the local machine. When a call is made, all the input variables are registered as the predecessors of the response from the LLM. For instance, if we ask a question to an LLM using an instruction and a question variable, the response variable’s predecessors will be [instruction, question]. When doing the backward pass, the gradients on the response will be backpropagated to question and instruction.
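The predecessor bookkeeping described above can be sketched as follows. The names llm_call and echo_engine are hypothetical, the Variable class is a local stand-in, and the engine is an offline stub so that the example runs without any API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)
class Variable:
    value: str
    role_description: str
    predecessors: List["Variable"] = field(default_factory=list)


def llm_call(engine: Callable[[str, str], str],
             instruction: Variable, question: Variable) -> Variable:
    """Forward pass: query the engine and record the inputs as predecessors."""
    text = engine(instruction.value, question.value)
    return Variable(
        value=text,
        role_description="response from the language model",
        predecessors=[instruction, question],
    )


def echo_engine(system_prompt: str, user_message: str) -> str:
    # Offline stub; a real engine would call an API or a local model.
    return f"[{system_prompt}] {user_message}"


instruction = Variable("Answer concisely.", "system prompt to the language model")
question = Variable("What is 7 * 6?", "question to the language model")
response = llm_call(echo_engine, instruction, question)
```

After the call, response.predecessors is [instruction, question], which is the information a backward pass needs to know where feedback should be sent.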

Backward mode. To ensure the backward function runs smoothly and generally, we add the following glossary to the system prompt:

Using these tags, the LLM is made aware of the objective, the role and the value of the variable to give feedback to, and the full conversation in the forward mode.

This glossary is provided in the system prompt to the backward engine LLM:

Most of this setup aims to ensure that the user does not have to modify the gradient computation. All of our experiments in the diverse set of applications are done with the same backward mode.
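To make the idea of a fixed, tag-based feedback request concrete, here is a minimal sketch of how such a request could be assembled from the objective, the forward conversation, and the role and value of the target variable. The tag names and the function build_backward_prompt are placeholders chosen for this illustration; they do not reproduce the glossary or prompts used by TEXTGRAD.

```python
def build_backward_prompt(objective: str, conversation: str,
                          role: str, value: str) -> str:
    """Assemble a feedback request using illustrative placeholder tags."""
    return "\n".join([
        f"<GOAL>{objective}</GOAL>",
        f"<FORWARD_CONVERSATION>{conversation}</FORWARD_CONVERSATION>",
        f"<TARGET_ROLE>{role}</TARGET_ROLE>",
        f"<TARGET_VALUE>{value}</TARGET_VALUE>",
        "Give feedback on how to change the text inside <TARGET_VALUE> "
        "so that the goal is better met.",
    ])


request = build_backward_prompt(
    objective="The answer should be correct and state its units.",
    conversation="System: Answer concisely. User: How far is 3 km in metres? LLM: 3000",
    role="system prompt to the language model",
    value="Answer concisely.",
)
print(request)
```

Because the template is the same for every variable, only the four fields change between applications, which is the property the paragraph above relies on.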

For instance, an example prompt for the gradient computation looks like the following:

Addition Operation: In numerical optimization, we have the following:

In numerical derivatives, due to the linearity of addition, we have:

In particular, the backward function for the addition operation passes the gradients on the output of the addition operation to its inputs.
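For reference, the standard numerical result appealed to here can be written out as follows. The symbols are assumptions made for this illustration: x and y are the inputs, z is the output of the addition, and \mathcal{L} is the downstream loss.

```latex
z = x + y, \qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial z}\,\frac{\partial z}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial z}, \qquad
\frac{\partial \mathcal{L}}{\partial y}
  = \frac{\partial \mathcal{L}}{\partial z}\,\frac{\partial z}{\partial y}
  = \frac{\partial \mathcal{L}}{\partial z}.
```

Since \partial z / \partial x = \partial z / \partial y = 1, each input receives the gradient on the output unchanged.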

Similarly, in TEXTGRAD, we have the tg.sum operation, which lets the gradients (feedback) on the output variable pass to the input variables.

where we use + to indicate concatenation.

Use Case: One canonical use case for the addition operation is the batch optimization case. In particular, we implement minibatch gradient descent when performing prompt optimization (Section 3.3).
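A minimal sketch of this pass-through behavior in the batch setting is given below. It is not the tg.sum implementation: text_sum and text_sum_backward are hypothetical names, the Variable class is a local stand-in, and joining values with a newline is one assumed way of realizing the concatenation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Variable:
    value: str
    requires_grad: bool = True
    predecessors: List["Variable"] = field(default_factory=list)
    gradients: List[str] = field(default_factory=list)


def text_sum(variables: List[Variable]) -> Variable:
    """Forward: concatenate the input values into one output variable."""
    return Variable(
        value="\n".join(v.value for v in variables),
        predecessors=list(variables),
    )


def text_sum_backward(output: Variable) -> None:
    """Backward: copy every piece of feedback on the output to each input."""
    for v in output.predecessors:
        if v.requires_grad:
            v.gradients.extend(output.gradients)


# Per-example losses from a made-up minibatch of two questions.
losses = [
    Variable("Example 1: answer omitted the unit."),
    Variable("Example 2: answer was correct."),
]
batch_loss = text_sum(losses)
batch_loss.gradients.append("State units explicitly in every answer.")
text_sum_backward(batch_loss)
```

Each per-example loss ends up holding the same feedback that was placed on the batch loss, mirroring how the numerical gradient of a sum is passed unchanged to every term.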

A.4 Textual Gradient Descent Implementation

Similar to backward computations, we strive to preserve generality in the implementation of TGD. In particular, we use the same glossary of tags provided above to inject information into the optimization process.
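The shape of a single update step can be sketched as follows. This is an illustration under assumptions, not the TGD implementation: tgd_step and stub_engine are hypothetical names, the prompt wording is invented, and clearing the feedback after the update is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Variable:
    value: str
    role_description: str
    gradients: List[str] = field(default_factory=list)


def tgd_step(variable: Variable, engine: Callable[[str], str]) -> None:
    """One update: ask the engine for a new value given role, value, feedback."""
    if not variable.gradients:
        return  # no feedback, so nothing to update
    feedback = "\n".join(f"- {g}" for g in variable.gradients)
    prompt = (
        f"Role: {variable.role_description}\n"
        f"Current value: {variable.value}\n"
        f"Feedback:\n{feedback}\n"
        "Rewrite the value so that it addresses the feedback."
    )
    variable.value = engine(prompt)
    variable.gradients.clear()  # sketch choice: start the next step clean


def stub_engine(prompt: str) -> str:
    # Offline stand-in for an LLM; always returns the same revision.
    return "Answer concisely and always state units."


system_prompt = Variable(
    value="Answer concisely.",
    role_description="system prompt to the language model",
    gradients=["Answers omit units."],
)
tgd_step(system_prompt, stub_engine)
```

The update rule itself never inspects what the variable is; everything task-specific arrives through the role description and the accumulated feedback.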

The current system prompt to the optimizer call is the following:

Below is an example prompt to the optimizer:

Authors:

(1) Mert Yuksekgonul, Co-first author from Department of Computer Science, Stanford University;

(2) Federico Bianchi, Co-first author from Department of Computer Science, Stanford University;

(3) Joseph Boen, Co-first author from Department of Biomedical Data Science, Stanford University;

(4) Sheng Liu, Co-first author from Department of Biomedical Data Science, Stanford University;

(5) Zhi Huang, Co-first author from Department of Biomedical Data Science, Stanford University;

(6) Carlos Guestrin, Department of Computer Science, Stanford University and Chan Zuckerberg Biohub;

(7) James Zou, Department of Computer Science, Stanford University, Department of Biomedical Data Science, Stanford University, and Chan Zuckerberg Biohub.

This paper is available on arXiv under a CC BY 4.0 license.
