TEXTGRAD: Optimizing AI systems by backpropagating text feedback
G Treatment Plan Optimization
G.1 Prompts
Radiotherapy treatment plans can be evaluated along several dimensions, so no single score captures plan quality. We therefore use an LLM to compute the “loss” by prompting it to assess plan quality against clinical protocols. Specifically, the LLM compares each protocol with the current plan and then produces the final assessment.

G.2 Inner-loop optimization for treatment planning
We employ a two-loop optimization approach [109], which includes (i) an inner loop for inverse planning and (ii) an outer loop for optimizing the hyperparameters of the inner loop. The inner loop focuses on traditional fluence map optimization, seeking to determine the optimal fluence map x by minimizing a cost function that combines multiple weighted objectives for various targets and organs at risk. This cost function is defined as:
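As a rough illustration of this kind of inner loop, the sketch below minimizes a weighted sum of per-structure quadratic dose penalties over a non-negative fluence map using projected gradient descent. It is an assumption-laden toy, not the authors' cost function or solver: the two-sided quadratic penalty, the dose-influence matrices `dose_matrices`, and the function names are all hypothetical.

```python
import numpy as np

def weighted_cost(x, dose_matrices, dose_targets, weights):
    """Assumed cost: sum over structures of w_s * mean((D_s x - d_s)^2)."""
    return sum(
        weights[s] * np.mean((D @ x - dose_targets[s]) ** 2)
        for s, D in dose_matrices.items()
    )

def optimize_fluence(dose_matrices, dose_targets, weights, n_iters=500):
    """Projected gradient descent on the assumed cost, keeping fluence x >= 0."""
    n_beamlets = next(iter(dose_matrices.values())).shape[1]
    x = np.zeros(n_beamlets)
    # Upper bound on the gradient's Lipschitz constant gives a safe step size.
    lipschitz = sum(
        2.0 * weights[s] / D.shape[0] * np.linalg.norm(D, 2) ** 2
        for s, D in dose_matrices.items()
    )
    step = 1.0 / lipschitz
    for _ in range(n_iters):
        grad = np.zeros(n_beamlets)
        for s, D in dose_matrices.items():
            residual = D @ x - dose_targets[s]
            grad += 2.0 * weights[s] / D.shape[0] * (D.T @ residual)
        # Gradient step followed by projection onto the non-negative orthant.
        x = np.maximum(x - step * grad, 0.0)
    return x
```

In this toy setting, raising the weight of one structure pulls the solution toward that structure's dose objective at the expense of the others, which is the trade-off the outer loop tunes.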

G.3 Additional Experimental Details
Dataset. The dataset used in this study comprised imaging and treatment plans for 5 prostate cancer patients who underwent intensity-modulated radiation therapy (IMRT). The data available for each patient include CT scans, delineated anatomical structures, and clinically approved treatment plans obtained via Eclipse®.
Method. As mentioned in Section 3.5, TEXTGRAD is used to optimize the hyperparameters (e.g., importance weights for the PTV and OARs) of the inner-loop numerical optimizer that generates the treatment plan. This optimization uses a variation of vanilla TEXTGRAD, namely “projected gradient descent with momentum updates”. In particular, three prostate cancer treatment plans optimized by clinicians, along with their corresponding hyperparameters, are provided as in-context examples that guide the hyperparameter updates. This procedure is analogous to projection, as the updated hyperparameters are “softly projected” onto a feasible set defined by the three in-context examples. Moreover, the historical hyperparameters and the textual gradients from past iterations are also included in the prompts for updating the hyperparameters, as an analogy to momentum. This additional context helps refine the optimization process. Optimization stops when the loss indicates that all protocols are met or, failing that, when the maximum number of iterations (set to 10) is reached.
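The control flow of this outer loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the TEXTGRAD API: `run_inner_loop`, `evaluate_plan`, and `propose_update` are hypothetical callables standing in for the numerical planner, the LLM-based protocol assessment, and the LLM-based hyperparameter update.

```python
from typing import Any, Callable, Dict, List, Tuple

Weights = Dict[str, float]
History = List[Tuple[Weights, str]]

def optimize_hyperparameters(
    init_weights: Weights,
    run_inner_loop: Callable[[Weights], Any],          # hypothetical: returns a plan
    evaluate_plan: Callable[[Any], Tuple[str, bool]],  # hypothetical: (loss text, all protocols met?)
    propose_update: Callable[[Weights, str, History, list], Weights],  # hypothetical LLM update
    clinical_examples: list,
    max_iters: int = 10,
) -> Tuple[Weights, History]:
    weights = dict(init_weights)
    history: History = []  # past weights and feedback, the "momentum" context
    for _ in range(max_iters):
        plan = run_inner_loop(weights)
        loss_text, all_met = evaluate_plan(plan)
        if all_met:
            break  # stop early once every protocol is satisfied
        # The in-context clinical examples act as the "soft projection".
        new_weights = propose_update(weights, loss_text, history, clinical_examples)
        history.append((weights, loss_text))
        weights = new_weights
    return weights, history
```

Passing the callables in explicitly keeps the sketch independent of any particular planner or LLM client; only the stopping rule and the history bookkeeping are taken from the description above.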
Initialization. The hyperparameters, i.e., the importance weights, are all initialized to 100 for every organ. The dose objectives are set to 70.20 for the PTV, 0.00 for the bladder and rectum, and 30.00 for the femoral heads and body, and are kept fixed during optimization.
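Written as a configuration, the initial state might look like the snippet below. The dictionary layout and key names are assumptions for illustration, as is the choice to give an importance weight to exactly the structures that have a dose objective; only the numeric values come from the text.

```python
# Hypothetical structure names; values are those stated in the text.
STRUCTURES = ("PTV", "bladder", "rectum", "femoral_heads", "body")

# Importance weights: the hyperparameters tuned by the outer loop.
initial_weights = {name: 100.0 for name in STRUCTURES}

# Dose objectives: fixed throughout optimization.
dose_objectives = {
    "PTV": 70.20,
    "bladder": 0.00,
    "rectum": 0.00,
    "femoral_heads": 30.00,
    "body": 30.00,
}
```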
G.4 Additional Results
Supplementary Tables 1 and 2 show additional results comparing TEXTGRAD-optimized plans with clinician-optimized plans.


Authors:
(1) Mert Yuksekgonul, Co-first author from Department of Computer Science, Stanford University;
(2) Federico Bianchi, Co-first author from Department of Computer Science, Stanford University;
(3) Joseph Boen, Co-first author from Department of Biomedical Data Science, Stanford University;
(4) Sheng Liu, Co-first author from Department of Biomedical Data Science, Stanford University;
(5) Zhi Huang, Co-first author from Department of Biomedical Data Science, Stanford University;
(6) Carlos Guestrin, Department of Computer Science, Stanford University and Chan Zuckerberg Biohub;
(7) James Zou, Department of Computer Science, Stanford University, Department of Biomedical Data Science, Stanford University, and Chan Zuckerberg Biohub.
This paper is
