SeQuant

For Symbolics and Numerics in Electronic Structure

by Bimal Gaudel


The Holy Grail: Predictive Accuracy

We want to predict

  • structure
  • spectra
  • reactivity

Many-body methods (CC, CI, PT, etc.)

  • systematically improvable approximations
  • expensive to apply for large systems

Want predictive accuracy on large systems: proteins, complexes, etc.

Compute and storage cost scaling (CC methods)
Method Compute Storage
CCSD \(N^6\) \(N^4\)
CCSD(T) \(N^7\) \(N^4\)
CCSDT \(N^8\) \(N^6\)
CCSDTQ \(N^{10}\) \(N^8\)

HF and CCSD. Image courtesy of Edward F. Valeev

HF and CCSD. Image courtesy of Edward F. Valeev

Theoretical Solutions Exist

We know how to break the scaling wall.

Pavošević et al. (J. Chem. Phys., 2017)

CCSD(T) \(\to\) DLPNO-CCSD(T)

\(\mathcal{O}(N^7)\) \(\to\) \(\mathcal{O}(N)\)

Standard vs Reduced-Complexity Coupled Cluster

\[e^{- \opr{T}}\opr{H}e^{\opr{T}}\ket{\Phi} = E\ket{\Phi}\]

\[\opr{T} = \sum_{μ} t_μ \opr{τ}_μ\]

Standard virtuals

\[\opr{τ} \equiv a_{i_1\dots i_k}^\alert{a_1\dots a_k} \equiv a_{a_1}^† \dots a_{a_k}^† a_{i_k}\dots a_{i_1}\]

\[\begin{aligned} R_{a_1 a_2}^{i_1 i_2} = g_{a_1 a_2}^{i_1 i_2} + f_{a_1}^{a_3} t_{a_3 a_2}^{i_1 i_2} + \ldots \end{aligned}\]

Cluster-specific-virtuals

\[\opr{τ} \equiv a_{i_1 \dots i_k}^\alert{{a_1^{i_1 \dots i_k}}\dots{a_k^{i_1 \dots i_k}}} \equiv a^†_{a_1^{i_1 \dots i_k }} \dots a^†_{a_k^{i_1 \dots i_k}} a_{i_k}\dots a_{i_1}\]

\[\begin{aligned} R_{a_1^{i_1 i_2} a_2^{i_1 i_2}}^{i_1 i_2} = g_{a_1^{i_1 i_2} a_2^{i_1 i_2}}^{i_1 i_2} + f_{a_1^{i_1 i_2}}^{a_3^{i_1 i_2}} t_{a_3^{i_1 i_2} a_2^{i_1 i_2}}^{i_1 i_2} + \ldots \end{aligned}\]

Standard Virtuals CC

\[e^\opr{T}\ket{0} =\]

Cluster Specific Virtuals (CSV) CC

\[e^\opr{T}\ket{0} =\]

Conventional vs PNO Formulation

MP2

PNO-MP2

\[\begin{aligned} R_{a_1 a_2}^{i_1 i_2} = &g_{a_1 a_2}^{i_1 i_2} + f_{a_1}^{a_3} t_{a_3 a_2}^{i_1 i_2} \\ &+ f_{a_2}^{a_3} t_{a_1 a_3}^{i_1 i_2} - f_{i_3}^{i_1} t_{a_1 a_2}^{i_3 i_2} - f_{i_3}^{i_2} t_{a_1 a_2}^{i_1 i_3} \end{aligned}\]

\[\begin{aligned} R_{a_1^{i_1 i_2} a_2^{i_1 i_2}}^{i_1 i_2} = &g_{a_1^{i_1 i_2} a_2^{i_1 i_2}}^{i_1 i_2} + f_{a_1^{i_1 i_2}}^{a_3^{i_1 i_2}} t_{a_3^{i_1 i_2} a_2^{i_1 i_2}}^{i_1 i_2} + f_{a_2^{i_1 i_2}}^{a_3^{i_1 i_2}} t_{a_1^{i_1 i_2} a_3^{i_1 i_2}}^{i_1 i_2} \\ &- f_{i_3}^{i_1} t_{a_1^{i_3 i_2} a_2^{i_3 i_2}}^{i_3 i_2} s_{a_1^{i_1 i_2}}^{a_1^{i_3 i_2}} s_{a_2^{i_1 i_2}}^{a_2^{i_3 i_2}} - f_{i_3}^{i_2} t_{a_1^{i_1 i_3} a_2^{i_1 i_3}}^{i_1 i_3} s_{a_1^{i_1 i_2}}^{a_1^{i_1 i_3}} s_{a_2^{i_1 i_2}}^{a_2^{i_1 i_3}} \end{aligned}\]

  • familiar algebra of dense/block-sparse tensors with independent dimensions
  • covariant algebra \(\to\) affine loop nests, familiar optimization problems
  • reduces to dense linear algebra kernels with high FLOP/MOP ratio
  • algebra of block-sparse/element-sparse tensors with dependent dimensions
  • non-covariant algebra \(\to\) non-affine loop nests
  • reduces to dense linear algebra kernels with low FLOP/MOP ratio

Old Problems Renewed in CSV-CC

The Problem of Tensor Network Canonicalization

Consider this tensor network

\[{\overline{g}}_{\alertPink{i_3} i_4}^{\alertPink{a_3} a_4}t_{a_1 \alertPink{a_3}}^{i_1 i_2}t_{a_2 a_4}^{\alertPink{i_3} i_4}\]

  • \(t\) tensors are identical (modulo index labeling)

Then we ask: is the following equivalent to the above?

\[{\overline{g}}_{i_4 \alertPink{i_3}}^{a_4 \alertPink{a_3}}t_{a_1 a_4}^{i_1 i_2}t_{a_2 \alertPink{a_3}}^{i_4 \alertPink{i_3}}\]

Turns out yes: just by relabeling the dummy indices \(i_3 \leftrightarrow i_4\) and \(a_3 \leftrightarrow a_4\)

Now consider a pair of tensors with dependent indices

\[\mathbf{T}_1 = \overline{g}_{i_2 a_1^\alertPink{i_1}}^{\alertTeal{a_2}^\alertOrange{i_3} a_3^{i_4}} \text{ and } \mathbf{T}_2 = \overline{g}_{i_2 a_1^{i_4}}^{a_3^\alertPink{i_1} \alertTeal{a_2}^\alertOrange{i_3}}\]

Turns out they are equivalent modulo a phase (\(- 1\)) and index relabeling (\(i_1 \leftrightarrow i_4\)).

\[\mathbf{T}_1 \equiv -1 \times \mathbf{T}_2\left.\right|_{i_1 \leftrightarrow i_4} = - \overline{g}_{i_2 a_1^{i_1}}^{a_3^\alertPink{i_4} \alertTeal{a_2}^\alertOrange{i_3}}\]

Not obvious as in the previous example.

Antisymmetry property of tensor \(\overline{g}\) was used.

Weak algorithms can lead to easy mistakes:

Different but might appear equivalent:

\[g_{i_2 a_1^{i_1}}^{\alertPink{i_3} \alertTeal{a_2^{i_4}}} \qquad g_{i_2 a_1^{i_1}}^{\alertTeal{a_2^{i_4}} \alertPink{i_3}}\]

Equivalent but might appear different:

\[g_{i_2 i_3}^{a_2^{ \alertPink{i_4} \alertTeal{i_5} } a_3^{ \alertPink{i_4} \alertTeal{i_5} }} s_{ \alertPink{i_4} }^{i_2} \qquad g_{i_2 i_3}^{a_2^{ \alertPink{i_4} \alertTeal{i_5} }a_3^{ \alertPink{i_4} \alertTeal{i_5} }} s_{ \alertTeal{i_5} }^{i_2}\]

SeQuant’s Solution: Color Graph Canonicalization

  • Map the tensors networks to color graphs.
  • If the graphs are isomorphic, the tensor networks are identical.

\[B_{a_1^{i_1 i_2}}^{i_1} [p_1] B_{a_2^{i_1 i_2}}^{i_2} [p_1] \tilde{a}_{i_1 i_2}^{a_1^{i_1 i_2} a_2^{i_1 i_2}}\]

  • tensor’s label
  • braket
  • particle columns
  • tensor symmetry
  • index
  • proto-indices (\(a_1^\alert{i_1 i_2}\))
  • Standard tensor network canonicalization algorithms, e.g. Butler-Portugal, could be used.
  • But they are known to be inefficient (\(\mathcal{O}(N !)\)), particularly when:
    • many identical tensors appear in the tensor network
    • many tensors with symmetry (e.g. particle symmetry) appear in the network

Color Graph Canonicalization

  • Demonstrating equivalence between \[\textbf{TN}_1 (= \tilde{a}_{i_2 i_1}^{a_2 a_1} t_{a_2}^{i_2} t_{a_1}^{i_1})\] and \[\textbf{TN}_2 (= t_{a_2}^{i_1} \tilde{a}_{i_2 i_1}^{a_2 a_1} t_{a_1}^{i_2}).\]
  • Visualized vertex ordering and regrouping for corresponding tensors.
  • Sign change originates from the net permutation of slots in \(\tilde{a}\) tensor (\(\textbf{TN}_2\)).
  • Normalization: renamed dummy indices & lexicographically sorted the tensors

Performance of Colored Graph Canonicalizer

Canonicalizing \(D_{i_1 i_2} D_{i_3 i_4} \dots D_{i_{N-1} i_N} U^{π[i_1 \dots i_N]}\)

Our graph-theoretic solution surpasses Butler-Portugal-based methods in speed

Old Problems Renewed in CSV-CC (again..)

How to implement?

  • Non CSV method development itself is pretty challenging.
    • Tedious and error-prone
  • CSV method devlopment is adds to the complexity
    • non-covariant tensor algebra
    • block-sparse “ragged” data structure

Non-affine Algebra

\[f_{a_1}^{a_3} t_{a_3 a_2}^{i_1 i_2}\]

\[f_{a_1^{i_1 i_2}}^{a_3^{i_1 i_2}} t_{a_3^{i_1 i_2} a_2^{i_1 i_2}}^{i_1 i_2}\]

\[R(a,b,i,j) \leftarrow \sum_c f(a,c) \times t(c,b,i,j)\]

# Bounds A, B, C are constants
for i in range(0, I):
  for j in range(0, J):
    for a in range(0, A):      # <--- Independent bound
      for b in range(0, B):    # <--- Independent bound
        for c in range(0, C):  # <--- Independent bound
           R[a,b,i,j] += F[a,c] * T[c,b,i,j]
  • tensors can be assumed to be hyper-rectangle (tensor-of-sub-tensors)
    • we will call them tensor-of-scalars (ToS)
for i in range(0, I):
  for j in range(0, J):
    # Bounds are functions of (i,j)
    curr_A = get_dim_a(i, j)
    curr_C = get_dim_c(i, j)
    
    for a in range(0, curr_A):     # <--- Dependent bound
      for b in range(0, curr_B):   # <--- Dependent bound
        for c in range(0, curr_C): # <--- Dependent bound
           # Indexing requires pointer chasing
           R[...][a][b] += F[a][c] * T[...][c][b]
  • tensors are ‘ragged’ (tensor-of-tensors)
  • we will call them tensor-of-tensors (ToT)

Implementation

  • A possibility: sparse tensor storage formats eg. compressed sparse row (CSR).
  • This style will silence erroneous implementations (misses the semantics). \[\sum_{c^\alert{i j}} f(a^\alert{i j},c^\alert{i j}) \times t(c^\alert{i j},b^\alert{i j},i,j) \not\equiv \underset{\text{CSR view}}{\sum_c f(a,c) \times t(c,b,i,j)}\]
  • We needed a tool that enables rapid proto-typing.

We asked: could we make use of an existing tensor framework?

Understanding a general product between Tensor-of-Scalars (ToS)

\[\sum_{k} A\left(i, j, k, l \right) \times B \left(j,k,l,m\right) \to C\left(m,i,j,l\right)\]

\[i j k l\alert{,}j k l m \to m i j l\]

Semantics

  • batched indices: \(j\) and \(l\).
  • free indices: \(i\) and \(m\).
  • contracted index: \(k\).

Foundation

The properties of the scalar ring.

  • scalar \(\alert{\times}\) scalar \(\to\) scalar
  • scalar \(\alert{+}\) scalar \(\to\) scalar

Implementation

We asked: could we make use of an existing tensor framework?

\[\sum_{c^\alert{i j}} f(a^\alert{i j},c^\alert{i j}) \times t(c^\alert{i j},b^\alert{i j},i,j)\]

ToT

\[i j\alertTeal{;} a^{i j} c^{i j}\alert{,} i j\alertTeal{;} b^{i j} c^{i j} \to i j\alertTeal{;} a^{i j} b^{i j}\]

If we view the inner tensors of a ToT as ‘generalized scalars’ \[i j\alert{,} i j \to i j\]

We have not defined the ring-like equivalence for these generalized scalars yet.

Implementation

Viewing the CSV-tensors with dependent indices as ToTs gives a powerful starting point to implement CSV-methods.

Built-in semantics

\[i j\alertTeal{;} a^{i j} c^{i j}\alert{,} i j\alertTeal{;} b^{i j} c^{i j} \to i j\alertTeal{;} a^{i j} b^{i j}\]

\(\{i, j\}\) determines which inner tensors can contract. Less error prone.

Familiar dense/block-sparse tensor algebra

\[i j\alert{,} i j \to i j\]

  • No need to worry about the raggedness.
  • Tensor framework’s features availed from the outer tensor’s perspective
    • block-sparsity
    • data distribution

Implementation

Ring-like operations support for ToT

Implemented in TiledArray.

Automated Development

One of the major goals was to automate the many-body method implementation.

Common steps of method development

  • Symbolic derivation
  • Optimization for evaluation
  • Code implementation
  • Every step is error-prone and tedious for manual implementation
  • Difficult to make it ‘performance portable’
    • can we easily port to different hardwares?
    • will it be usable in a decade with little friction?

… from then (\(> 30\) yr) to now

Long realized automation is crucial

This work

  • Motivated by the lack of support for CSV-methods.
  • Supports standard many-body method development too.

Automated Development

We have designed SeQuant after a modern three-stage compiler.

Three stages: Front end

  • general compiler fronted parses the code into an abstract syntax tree (AST)
  • AST encodes different operations/symantics between nodes
  • In SeQuant: the Expr objects from method derivation serve as the AST

 

\(g_{ij}^{a^{ij}b^{ij}}\left( 2t_{a^{ij}b^{ij}}^{ij} - t_{a^{ij}b^{ij}}^{ji} \right)\)

Automated Development

We have designed SeQuant after a modern three-stage compiler.

Three stages: Middle end

Intermediate Representation (IR)

  • Preparatory stage for the backend/evaluation.
  • Some semantics of tensor algebra might be lost as part of optimization.
    • Notice the physical layout of the two \(t\) tensors.

 

\(g_{ij}^{a^{ij}b^{ij}}\left( 2t_{a^{ij}b^{ij}}^{ij} - t_{a^{ij}b^{ij}}^{ji} \right)\)

Automated Development

We have designed SeQuant after a modern three-stage compiler.

Three stages: Middle end (Optimizations)

\(\text{O}, \text{V}\): dimension of indices \(i_k\) and \(a_k\)s. With \(\text{V} \gg \text{O}\)

\[(g_{\alert{i_3} i_4}^{a_3 a_4} t_{a_1 a_2}^{i_1 \alert{i_3}}) t_{a_3 a_4}^{i_2 i_4}\]

\[\text{O}^3\text{V}^4\]

\[(g_{i_3 i_4}^\alert{a_3 a_4} t_\alert{a_3 a_4}^{i_2 i_4}) t_{a_1 a_2}^{i_1 i_3}\]

\[\text{O}^3\text{V}^2\]

  • Robustly implemented for non-CSV methods using dynamic programming.
  • We have not explored much for CSV methods
    • Less severe effects from contraction ordering (at least for FLOPs).
    • A research problem of its own.

\[\frac{1}{16}\alert{(\bar{g}_{k l}^{c d} t_{c d}^{i j})} t_{a b}^{k l} + \frac{1}{8}\alert{(\bar{g}_{k l}^{c d} t_{c d}^{i j})} t_{a}^{k} t_{b}^{l}\]

  • Identify the repeating sub-expressions (highlighted)
  • Evaluate and store \(I_{k l}^{i j} \leftarrow \bar{g}_{k l}^{c d} t_{c d}^{i j}\)
    • this is done as part of the evaluation of the ‘first’ term
  • Reuse from cache \(I_{k l}^{i j}\)
    • this is done as part of the evaluation of the ‘second’ term

Automated Development

Three stages: Middle end (Canonicalizer for semantic enforcing)

The graph canonicalizer again is crucial at this stage.

  • Provides the canonical identity for CSE
  • Provides the canonical index labels
    • for unambiguous storage and retrieval from cache

\[\mathbf{T}_1 = \overline{g}_{i_2 a_1^\alertPink{i_1}}^{\alertTeal{a_2}^\alertOrange{i_3} a_3^{i_4}} \text{ and } \mathbf{T}_2 = \overline{g}_{i_2 a_1^{i_4}}^{a_3^\alertPink{i_1} \alertTeal{a_2}^\alertOrange{i_3}}\] \[\mathbf{T}_1 \equiv \mathbf{T}_2\] \[\left(i_4,i_3,i_1,i_2;a_1^{i_1},a_3^{i_4},a_2^{i_3}\right) \text{ and } \left(i_1,i_3,i_4,i_2;a_1^{i_1},a_3^{i_4},a_2^{i_3}\right) \text{ with a phase of } (-1)\]

Automated Development

Three stages: Back end (Interpreter)

The interpreter builds on top of the IR.

Handles

  • diverse data types resulting from the IR nodes
    • scalars, ToS, ToT
  • working with such heterogeneous result types is abstracted over a Result abstract class
  • leveraging this abstraction, the interpreter allows plug-and-play of different numerical backends
    • TiledArray
    • BTAS

Automated Development

Three stages: Back end (Interpreter)

The interpreter builds on top of the IR.

Results

SeQuant—a C++ library/framework for

  • Symbolic Derivation: wide range of theories: including PNO-CC

  • Automatic Evaluation: let code compilation not break our workflow

  • Performant and Scalable:

    • implements various optimizations for improved performance
    • supports auto evaluation on HPC nodes out of the box (using TiledArray tensor framework)
  • Extensible Architecture: more numerical tensor frameworks and quantum chemistry packages? welcome

    • e.g. Köhn group, University of Stuttgart

Results

Automation of standard many-body methods

SeQuant has accelerated the development by avoiding algebraic complexity of derivation, optimization, and evaluation in the following works.

Results

How about CSV-CC

  • We can automatically implement CSV-CC methods of arbitrary order
  • We can already reproduce the results of manually implemented PNO-CC

However

We are facing some challenges regarding memory bottlenecks.

Broader Impact

How does this change the field?

  1. Empowering Innovation: Theorists can now ask “What if?” without fearing years of coding.
    • Prototyping happens in days, not PhDs.
  2. Lowering the Barrier: Scientists can focus on theory, not software architecture.
  3. Reproducibility:
    • Automation eliminates human error.
    • The separation of symbolic logic from numerical execution guarantees verifiability.

Future Research Questions:

  • Properties & Response Theory: Can we automate the derivation of polarizabilities and magnetic properties?.
  • New Theoretical Classes: Extending the graph-based engine to Relativistic QM and Advanced Multireference theories.
  • Emerging Hardware: Targeting GPUs and accelerators by simply swapping the backend numerical library.

Moving closer to the ultimate goal:

A fully automated workflow where we design, test, and deploy the next generation of predictive theories with unprecedented speed.

Acknowledgments

  • Prof. Edward Valeev
  • Department of Chemistry
    • Joli, Esker, John
  • Collaborators
    • Robert, Saurabh, Saday
  • My research committee
    • T. Daniel Crawford
    • Diego Troya
    • Nicholas Mayhall
  • Current and past group members
  • People from all walks of my life
Clement, M. C., Zhang, J., Lewis, C. A., Yang, C., & Valeev, E. F. (2018). Optimized pair natural orbitals for the coupled cluster methods. J. Chem. Theory Comput. American Chemical Society.
Engels-Putzka, A., & Hanrath, M. (2011). A fully simultaneously optimizing genetic approach to the highly excited coupled-cluster factorization problem. J. Chem. Phys. American Institute of Physics.
Evangelista, F. A. (2022). Automatic derivation of many-body theories based on general Fermi vacua. J. Chem. Phys.
Gordon, M. S., Fedorov, D. G., Pruitt, S. R., & Slipchenko, L. V. (2012). Fragmentation methods: A route to accurate calculations on large systems. Chem. Rev.
Hirata, S. (2003). Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories. J. Phys. Chem. A.
Janssen, C. L., & Schaefer, H. F., III. (1991). The automated solution of second quantization equations with applications to the coupled cluster approach. Theor. Chim. Acta.
Jiang, A., Schaefer, H. F., & Turney, J. M. (2025). Linear-scaling quadruple excitations in local pair natural orbital coupled-cluster theory. J. Chem. Phys. AIP Publishing.
Kállay, M., & Surján, P. R. (2001). Higher excitations in coupled-cluster theory. J. Chem. Phys. American Institute of Physics.
Lai, P.-W., Zhang, H., Rajbhandari, S., Valeev, E., Kowalski, K., & Sadayappan, P. (2012). Effective utilization of tensor symmetry in operation optimization of tensor contraction expressions. Chem. Phys. Lett.
Lechner, M. H., Papadopoulos, A., Sivalingam, K., Auer, A. A., Koslowski, A., Becker, U., et al. (2024). Code generation in ORCA: Progress, efficiency and tight integration. Phys. Chem. Chem. Phys.
Liebenthal, M. D., Yuwono, S. H., Koulias, L. N., Li, R. R., Rubin, N. C., & DePrince, A. E. (2025). Automated Quantum Chemistry Code Generation with the p q Package. J. Phys. Chem. A.
Masteran, C., Gaudel, B., & Valeev, E. F. (2025). Toward a Balanced Description of Ground and Excited States with Transcorrelated F12 Methods. J. Chem. Theory Comput.
Masteran, C., Kumar, A., Teke, N., Gaudel, B., Yanai, T., & Valeev, E. F. (2023). Comment on Canonical transcorrelated theory with projected Slater-type geminals” [J. Chem. Phys. 136, 084107 (2012)]. J. Chem. Phys.
Neese, F., Hansen, A., & Liakos, D. G. (2009). Efficient and accurate approximations to the local coupled cluster singles doubles method using a truncated pair natural orbital basis. J. Chem. Phys.
Pavošević, F., Peng, C., Pinski, P., Riplinger, C., Neese, F., & Valeev, E. F. (2017). SparseMapsA systematic infrastructure for reduced scaling electronic structure methods. V. Linear scaling explicitly correlated coupled-cluster method with pair natural orbitals. J. Chem. Phys. AIP Publishing LLC.
Riplinger, C., Pinski, P., Becker, U., Valeev, E. F., & Neese, F. (2016). Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory. J. Chem. Phys.
Teke, N. K., Melekamburath, A., Gaudel, B., & Valeev, E. F. (2024). Best Iterative Coupled-Cluster Triples Model? More Evidence for 3CC. J. Phys. Chem. A.