Bimal Gaudel
C++ Systems Engineer · Numerical Libraries · Simulation & Scientific Computing
Open to work. Looking for C++ systems and runtime roles, numerical and simulation libraries, scientific software R&D, and HPC research-software / postdoc roles. Based in Blacksburg, VA. Open to relocation across the US, Canada, and Europe (priority in that order). Visa sponsorship required.
bimalgaudel@gmail.com
C++17/20/23
I build production C++ systems and numerical libraries for scientific computing — the runtimes, solvers, and library cores other engineers and researchers ship on top of. Not a theorist who codes; an engineer who understands the science.
Technical Expertise
- Modern C++ (17/20/23). Production-grade runtimes, template metaprogramming, and extensible APIs. Designed SeQuant’s backend interface so cleanly that an external contributor integrated a new numerical framework in under 30 minutes.
- Numerical & simulation libraries. Block-sparse distributed tensor framework (TiledArray) extended with tensor-of-tensor data structures for PNO-like local-correlation methods; DP-based contraction ordering using runtime index extents; graph-theoretic optimization (TNCO, CSE).
- HPC & distributed runtimes. MPI distributed-memory programming, memoization-cache design, and supercomputer-scale execution validated in MPQC production on NERSC Perlmutter.
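To make the DP-based contraction ordering above concrete, here is a minimal sketch of bottom-up dynamic programming over subsets of a tensor network. It is an illustration under stated assumptions, not SeQuant's actual code: the names `Mask`, `Step`, `volume`, and `order_contractions` are hypothetical, each index is assumed to appear in at most two tensors, and cost is modeled as the product of the extents of every index touched by a pairwise contraction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Mask = std::uint32_t;  // bit set over tensors, or over indices

// Product of the extents of every index whose bit is set in `indices`.
double volume(Mask indices, const std::vector<double>& extent) {
  double v = 1.0;
  for (std::size_t i = 0; i < extent.size(); ++i)
    if (indices & (Mask{1} << i)) v *= extent[i];
  return v;
}

struct Step {
  double cost = 0.0;  // cheapest multiply count found for this subset
  Mask left = 0;      // operand subsets of the last contraction
  Mask right = 0;     // (both stay 0 for a single tensor)
};

// tensor_indices[t] is the index bit set of tensor t; extent[i] is the
// runtime extent of index i. Returns a table indexed by tensor subset.
std::vector<Step> order_contractions(const std::vector<Mask>& tensor_indices,
                                     const std::vector<double>& extent) {
  const std::size_t n = tensor_indices.size();
  assert(n < 20 && extent.size() <= 32);  // keep the 2^n table small
  const Mask full = (Mask{1} << n) - 1;
  std::vector<Mask> open(static_cast<std::size_t>(full) + 1, 0);
  std::vector<Step> best(static_cast<std::size_t>(full) + 1);

  for (Mask s = 1; s <= full; ++s) {
    // Indices left uncontracted inside subset s: XOR cancels any index
    // shared by two tensors of the subset (at-most-twice assumption).
    for (std::size_t t = 0; t < n; ++t)
      if (s & (Mask{1} << t)) open[s] ^= tensor_indices[t];
    if ((s & (s - 1)) == 0) continue;  // a single tensor costs nothing

    best[s].cost = std::numeric_limits<double>::infinity();
    // Enumerate proper sub-masks; smaller subsets are already solved
    // because they are numerically smaller than s.
    for (Mask a = (s - 1) & s; a > 0; a = (a - 1) & s) {
      const Mask b = s ^ a;
      if (a < b) continue;  // visit each unordered split once
      const double cost = best[a].cost + best[b].cost +
                          volume(open[a] | open[b], extent);
      if (cost < best[s].cost) best[s] = {cost, a, b};
    }
  }
  return best;
}
```

For the chain A(i,j) B(j,k) C(k,l) with extents 10, 100, 5, and 50, the entry for the full network reports 7500 multiplications via (AB)C, against 75000 for A(BC). Because the extents are plain runtime values, the same expression can get a different ordering when the index spaces change.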
SeQuant — Lead Designer of a Complete Compiler Pipeline
I designed and implemented SeQuant from scratch over seven years — embedded DSL, graph-canonicalized IR, DP optimizer, tree-walk interpreter, and MPI runtime. The system automates the path from symbolic quantum-chemistry derivations to supercomputer-scale numerical execution, compressing months of researcher development work into days.
- Embedded DSL + canonicalized IR. SeQuant expressions serve directly as the AST in idiomatic C++; the IR is a full binary tree with canonical node identity from tensor-network graph canonicalization, enabling robust common-subexpression elimination across separate equations.
- Runtime DP optimizer + tree-walk interpreter. Bottom-up dynamic-programming contraction ordering using actual index-space extents; post-order traversal with canonical-identity-keyed memoization and pending-use cache eviction. Validated in MPQC production on NERSC Perlmutter.
- Extensible backend. Interface clean enough that an external contributor integrated a new numerical framework in under 30 minutes; BTAS and TiledArray backends ship with SeQuant. IR also lowers to C++/Python targeting downstream tensor compilers (TACO).
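The memoization with pending-use eviction mentioned above can be sketched as follows. This is a simplified stand-in, not SeQuant's implementation: scalars replace tensors, multiplication replaces contraction, and `Node`, `Cache`, `leaf`, `product`, and `evaluate` are hypothetical names. The `id` field plays the role of the canonical node identity, so equal ids are assumed to denote equal subexpressions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>

struct Node {
  std::size_t id = 0;                 // canonical identity (stand-in)
  double value = 0.0;                 // used by leaves only
  std::unique_ptr<Node> left, right;  // both null for a leaf
};

std::unique_ptr<Node> leaf(std::size_t id, double v) {
  auto n = std::make_unique<Node>();
  n->id = id;
  n->value = v;
  return n;
}

std::unique_ptr<Node> product(std::size_t id, std::unique_ptr<Node> l,
                              std::unique_ptr<Node> r) {
  assert(l && r);  // the tree is a full binary tree
  auto n = std::make_unique<Node>();
  n->id = id;
  n->left = std::move(l);
  n->right = std::move(r);
  return n;
}

class Cache {
 public:
  // Counting pass: one pending use per request the evaluator will make.
  // A repeated id is served from the cache, so its children are skipped.
  void count_uses(const Node& n) {
    if (++pending_[n.id] > 1) return;
    if (n.left) {
      count_uses(*n.left);
      count_uses(*n.right);
    }
  }
  // Cache hit: consume one pending use and evict after the last one.
  std::optional<double> take(std::size_t id) {
    auto it = values_.find(id);
    if (it == values_.end()) return std::nullopt;
    const double v = it->second;
    if (--pending_[id] == 0) values_.erase(it);
    return v;
  }
  // Cache miss: consume one pending use and keep the value only if
  // some later request still needs it.
  void store(std::size_t id, double v) {
    auto it = pending_.find(id);
    if (it != pending_.end() && it->second > 0 && --it->second > 0)
      values_.emplace(id, v);
  }
  std::size_t size() const { return values_.size(); }

 private:
  std::unordered_map<std::size_t, std::size_t> pending_;
  std::unordered_map<std::size_t, double> values_;
};

// Post-order evaluation; `products` counts the multiplications performed.
double evaluate(const Node& n, Cache& cache, std::size_t& products) {
  if (auto hit = cache.take(n.id)) return *hit;
  double v = n.value;
  if (n.left) {
    const double l = evaluate(*n.left, cache, products);
    const double r = evaluate(*n.right, cache, products);
    v = l * r;
    ++products;
  }
  cache.store(n.id, v);
  return v;
}
```

Run `count_uses` over every tree first, then evaluate the trees in the same order. An intermediate shared by two equations is computed once, held until its last consumer has read it, and dropped immediately afterward, which bounds peak memory by the live intermediates rather than by everything ever computed.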
The full architecture is described in Gaudel et al. (2026).
Dissertation defended December 2025.
TiledArray
Contributing developer on TiledArray, the massively-parallel block-sparse distributed tensor framework that backs SeQuant’s runtime. Extended its operation basis to support tensor-of-tensor data structures (outer-only, inner-only, and mixed contractions; TT×T and TT×TT products; full TT dot products), enabling block-wise compression and PNO-like local-correlation methods.
Experience
Research Assistant | Virginia Tech 2018 - 2025
Designed and implemented SeQuant’s compiler and runtime; contributed to TiledArray.
Education
PhD in Theoretical & Computational Chemistry | Virginia Tech 2018 - 2025
Dissertation: Automated implementation of advanced electronic structure methods
Master of Science in Physical Chemistry | Tribhuvan University 2014 - 2016
Publications
SeQuant (Gaudel et al., 2026)
A color-graph approach to canonicalizing tensor networks, used for symbolic transformation and runtime evaluation of many-body methods.
Applied research using SeQuant
- Theoretical exploration of new ansatze in explicitly correlated methods (Masteran et al., 2025).
- Geminal parameter tuning in explicitly correlated methods (Powell et al., 2025).
- Discovery of effective theories compared to complex counterparts (Teke et al., 2024).
- Identification and correction of errors in previously published works (Masteran et al., 2023).
Featured writing
Fast RTTI in C++ for a class hierarchy
Compile-time type-id generation for performant runtime type identification across a class hierarchy.
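As a rough illustration of the idea, and not necessarily the scheme used in the post, the C++17 sketch below derives a constant-expression id for each type from the address of a per-type variable and chains an `is_a` check up the hierarchy. All names here (`TypeId`, `type_tag`, `type_id`, `Extends`, and the `Expr` hierarchy) are hypothetical.

```cpp
#include <cassert>

using TypeId = const void*;

// One distinct object per type; its address serves as the type's id
// and is available at compile time.
template <typename T>
inline constexpr char type_tag = 0;

template <typename T>
constexpr TypeId type_id() {
  return &type_tag<T>;
}

struct Expr {
  virtual ~Expr() = default;

  // True if `id` names this class or one of its ancestors.
  virtual bool is_a(TypeId id) const { return id == type_id<Expr>(); }

  template <typename T>
  bool is() const {
    return is_a(type_id<T>());
  }

  // Checked downcast without dynamic_cast; T must derive from Expr.
  template <typename T>
  const T* as() const {
    return is<T>() ? static_cast<const T*>(this) : nullptr;
  }
};

// CRTP helper: a class declares its parent once and inherits a check
// that compares its own id, then defers to the parent.
template <typename Derived, typename Parent>
struct Extends : Parent {
  bool is_a(TypeId id) const override {
    return id == type_id<Derived>() || Parent::is_a(id);
  }
};

struct Tensor : Extends<Tensor, Expr> {};
struct Constant : Extends<Constant, Expr> {};
struct Variable : Extends<Variable, Constant> {};
```

With these definitions, a `Variable` viewed through a `const Expr&` answers true to `is<Variable>()`, `is<Constant>()`, and `is<Expr>()`, and false to `is<Tensor>()`. The check costs one virtual call plus a pointer comparison per level of the hierarchy.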
The access-by idiom in C++
A pattern for unit-testing private methods when no better seam exists.