TextGrad Optimization Extensions: Momentum, Constraints, & Batching

17 Jun 2026

B Optimizer Extensions

Batch Optimization

In batch optimization, we use the tg.sum function described above. In particular, gradients propagating from multiple instances are concatenated, so that the optimizer sees all of the feedback to a variable, whichever instance it came from.

The syntax is as simple as the following:

Code Snippet 4: An example use for batch optimization for question answering.
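The aggregation idea itself can be illustrated without any LLM call. The following is a minimal sketch under stated assumptions, not the TextGrad API: `Feedback` and `sum_feedback` are hypothetical names, and the feedback strings are made-up placeholders. It only shows that "summing" here means concatenating per-instance feedback so that one optimizer prompt contains all of it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    """Hypothetical container for one instance's textual gradient."""
    source: str  # which batch instance produced the feedback
    text: str    # the natural-language criticism itself


def sum_feedback(feedbacks: List[Feedback]) -> str:
    """Concatenate per-instance feedback into a single block of text."""
    return "\n".join(f"[{fb.source}] {fb.text}" for fb in feedbacks)


# Placeholder feedback for a shared system prompt, one entry per question.
batch_feedback = [
    Feedback("question 1", "The prompt should ask for step-by-step reasoning."),
    Feedback("question 2", "The prompt should request a final numeric answer."),
]

combined = sum_feedback(batch_feedback)
print(combined)
```

Because the combined text keeps a label for each source, the optimizer can weigh feedback that recurs across instances against feedback that is specific to a single one.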

Constrained Optimization with Natural Language Constraints

In TEXTGRAD, it is possible to use constraints when optimizing variables. These constraints are all defined as natural language descriptions. For example, one can prompt the optimizer to update the variable while requiring that the updated response conclude with an answer:

In general, the constraint post-fix is appended to the optimizer’s prompt, where the constraints are written within <CONSTRAINTS>{constraint text}</CONSTRAINTS> tags.

In code, the user can simply pass in the constraints to the TGD optimizer:

Code Snippet 5: An example use for constraints when updating the solution to a problem.
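To make the post-fix mechanism concrete, here is a minimal, self-contained sketch of how constraints might be appended to an optimizer prompt. It is an illustration, not TextGrad's implementation: the function names, the base prompt, the constraint wording, and the "Constraint i:" numbering are all assumptions. Only the wrapping of the constraint text in CONSTRAINTS tags follows the description above.

```python
from typing import List


def constraint_postfix(constraints: List[str]) -> str:
    """Build a hypothetical post-fix that wraps the constraints in tags."""
    constraint_text = "\n".join(
        f"Constraint {i}: {c}" for i, c in enumerate(constraints, start=1)
    )
    return f"<CONSTRAINTS>{constraint_text}</CONSTRAINTS>"


def build_optimizer_prompt(base_prompt: str, constraints: List[str]) -> str:
    """Append the post-fix only when at least one constraint is given."""
    if not constraints:
        return base_prompt
    return base_prompt + "\n\n" + constraint_postfix(constraints)


# Placeholder prompt and constraint, chosen only for illustration.
prompt = build_optimizer_prompt(
    "Improve the variable using the feedback provided.",
    ["The last line of the response must state the final answer."],
)
print(prompt)
```

Keeping constraints in a separate, tagged post-fix leaves the base optimizer prompt unchanged when no constraints are supplied.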

Momentum

TEXTGRAD supports the use of momentum in Textual Gradient Descent. In standard SGD, momentum uses a linear combination of past gradients and the most recent one to define the gradient used to update a variable. Similarly, TEXTGRAD keeps track of past iterations of the variable and appends them as a postfix to the optimizer's prompt.

Code Snippet 6: How to enable momentum using 3 previous steps in the TextualGradientDescent optimizer.
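The bookkeeping behind a momentum window can be sketched in a few lines. This is a hedged illustration rather than the library's code: `ValueHistory`, its methods, and the postfix wording are hypothetical, and the "draft" strings are placeholders. It assumes that a window of 3 means only the three most recent values of the variable are retained and shown to the optimizer.

```python
from collections import deque


class ValueHistory:
    """Hypothetical tracker for the last `window` values of a variable."""

    def __init__(self, window: int) -> None:
        # A bounded deque silently drops the oldest value once it is full.
        self.past_values = deque(maxlen=window)

    def record(self, value: str) -> None:
        """Store the variable's value after an optimization step."""
        self.past_values.append(value)

    def postfix(self) -> str:
        """Render the retained values as text to append to a prompt."""
        if not self.past_values:
            return ""
        listed = "\n".join(
            f"{i}. {v}" for i, v in enumerate(self.past_values, start=1)
        )
        return "Past iterations of this variable:\n" + listed


history = ValueHistory(window=3)
for value in ["draft 1", "draft 2", "draft 3", "draft 4"]:
    history.record(value)

print(history.postfix())
```

After four recorded steps, only the last three drafts remain in the window, which bounds how much the optimizer prompt grows.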

In-Context Examples

In-context examples can be utilized to improve textual gradients and update variables effectively. These examples serve as references to illustrate the characteristics of optimized variables. When in-context examples are applied, TEXTGRAD adopts the following prompt to incorporate them:

By leveraging these examples, TEXTGRAD can better understand and implement the properties of optimized variables, enhancing the overall optimization process.
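As a rough illustration of how reference examples could be folded into an optimizer prompt, consider the sketch below. The function name, the section wording, and the example strings are assumptions made for this sketch and do not reproduce TextGrad's actual prompt.

```python
from typing import List


def with_in_context_examples(base_prompt: str, examples: List[str]) -> str:
    """Append numbered reference examples to a prompt (hypothetical format)."""
    if not examples:
        return base_prompt
    blocks = "\n\n".join(
        f"Example {i}:\n{ex}" for i, ex in enumerate(examples, start=1)
    )
    return f"{base_prompt}\n\nReference examples:\n\n{blocks}"


# Placeholder examples of what a well-optimized variable might look like.
example_prompt = with_in_context_examples(
    "Improve the variable using the feedback provided.",
    ["Answer concisely and show each step.", "State the final answer last."],
)
print(example_prompt)
```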

Authors:

(1) Mert Yuksekgonul, Co-first author from Department of Computer Science, Stanford University;

(2) Federico Bianchi, Co-first author from Department of Computer Science, Stanford University;

(3) Joseph Boen, Co-first author from Department of Biomedical Data Science, Stanford University;

(4) Sheng Liu, Co-first author from Department of Biomedical Data Science, Stanford University;

(5) Zhi Huang, Co-first author from Department of Biomedical Data Science, Stanford University;

(6) Carlos Guestrin, Department of Computer Science, Stanford University and Chan Zuckerberg Biohub;

(7) James Zou, Department of Computer Science, Stanford University, Department of Biomedical Data Science, Stanford University, and Chan Zuckerberg Biohub.

This paper is available on arXiv under the CC BY 4.0 license.
