Research Scientist @ MIT CSAIL
Emmanuel Lujan, Ph.D.
I am a Research Scientist at MIT CSAIL and the Julia Lab, working where AI meets scientific simulation and high-performance computing. I am interested in how generative and agentic AI can help navigate high-dimensional, underexplored algorithmic design spaces.
MIT CSAIL · 32 Vassar St, Cambridge, MA 02139
Current Projects
I co-lead a DARPA–MIT initiative on AI-guided algorithmic discovery for fast linear algebra. This research aims to generate novel, high-performance variants of existing algorithms that exploit matrix structure, including mixed precision and tiling, and to develop new strategies for algorithm and architecture selection.
smartsolveai.mit.edu →
As part of MIT's CHEFSI project, I work on addressing a major bottleneck in finite-element simulation: the solution of large linear systems. Our goal is to develop new high-performance solutions that integrate optimal solver selection, mixed-precision methods, and scalable distributed execution for exascale simulation.
chefsi.mit.edu →
I collaborate with Robert Metcalfe, Turing Award laureate, on integrating AI with physics-based simulation to reduce the levelized cost of energy (LCOE) in deep-borehole heat exchanger arrays.
geothermalarrays.mit.edu →
Research Output
🚧 This section is a work in progress.
Forthcoming
-
Parallel Agentic AI for Discovering Specialization and Task-Distribution Algorithms · Manuscript in preparation
Peer-Reviewed Journal & Conference Papers
-
Feb 2026 · Arrays of Networked Standard Geothermal Wells · Conference Paper · PROCEEDINGS, 51st Workshop on Geothermal Reservoir Engineering, Stanford University, 2026
Time to scale the harvesting of geothermal heat for conversion to grid electricity. And not just because geothermal is clean. No, we choose geothermal because it is firm, inexhaustible, safe, and competitively harvestable almost anywhere (cheap). How best to scale geothermal? We urge the development of geothermal arrays. With our Internet mindsets in place, let’s not continue toward ever bigger geothermal mainframes. Instead, let’s deploy geothermal arrays by networking smaller, standard, competitively-sourced wells. While accelerating progress continues at the well level, in this paper let’s move up to the “array” level. We engage in dialogue with a large language model (GPT) about how soon we can deliver energy with capacity factors (CFs) approaching 100% and with levelized costs of electricity (LCOE) at a new plateau, less than a cent per kilowatt-hour (<1¢/kWh).
-
Feb 2026 · A Full Three-Dimensional GPU-Accelerated Model for Deep Borehole Heat Exchangers (DBHEs) Enabling Simulation of Well Arrays · Conference Paper · PROCEEDINGS, 51st Workshop on Geothermal Reservoir Engineering, Stanford University, 2026
Deep borehole heat exchangers (DBHEs) present significant computational challenges due to their multi-scale geometry and long operational timescales. We present a GPU-accelerated three-dimensional model that makes well array simulations computationally tractable through an operator splitting strategy tailored to the problem’s physics. The method separates vertical diffusion (stabilized explicit Runge–Kutta–Chebyshev), horizontal diffusion (alternating direction implicit), and advection (semi-Lagrangian), achieving near-unconditional stability with high efficiency. We validate against three published models using different numerical approaches, showing excellent to good agreement. The vendor-agnostic Julia implementation enables full three-dimensional simulation of multi-well arrays on a single GPU, opening new possibilities for systematic design optimization and long-term performance assessment of geothermal well systems. The implementation is released as the open-source Julia package GeothermalWells.jl.
-
Sep 2025 · IEEE High Performance Extreme Computing Conference (HPEC), 2025
Algorithmic dispatch is essential for performance in linear-algebra–intensive systems. A persistent challenge lies in the treatment of structured matrices. Although such matrices are often described as “sparse,” the term structured is more precise, as it highlights exploitable properties, such as bandedness or triangularity, whose algorithmic advantages extend beyond sparsity alone. When the dispatch strategy leaves these structures unrecognized, valuable opportunities for optimization are lost. Recent advances in generative AI offer the promise of linking these silent structures to more effective algorithmic and architectural choices, supplying much of the missing connective tissue in computational linear algebra. This work introduces analytical criteria, grounded in time-complexity analysis, to determine when structure-aware dispatch delivers tangible gains. We examine the overheads of structure detection and data-format conversion, characterizing their impact on speedup and slowdown. We illustrate these concepts through a case study on LU factorization applied to banded matrices stored in a dense format, demonstrating results that align with theoretical bounds and reveal substantial gains in both performance and memory usage.
-
Sep 2025 · Data-Driven Dynamic Algorithm Dispatch with Large Language Models · Conference Paper · Outstanding Short Paper · IEEE High Performance Extreme Computing Conference (HPEC), 2025
We introduce a large language model (LLM)-driven approach for generating dynamic algorithmic dispatch heuristics in high-performance linear algebra. By combining prompt engineering with LLaMA 3 and a curated performance database, the model learns to synthesize selection heuristics that exploit structural patterns to identify fast algorithmic choices. A case study on LU factorization demonstrates the model’s ability to replicate expert-designed strategies. This work, developed as part of the DARPA–MIT SmartSolve project, highlights the promise of LLMs for algorithmic discovery and the development of more adaptive, fast linear algebra software.
-
Feb 2025 · Decision-Support and Modeling with Large Language Models for Geothermal Well Arrays · Conference Paper · PROCEEDINGS, 50th Workshop on Geothermal Reservoir Engineering, Stanford University, 2025
Geothermal well arrays, which organize multiple geothermal wells into carefully planned geometric configurations, provide opportunities to enhance energy production capacity and increase fault tolerance. The development and adoption of these emerging geothermal technologies could be accelerated through the recent advances in large language models (LLMs) and high-level high-performance languages. A challenge in LLM-based applications is the reliability of the generated outputs, as they can be prone to subjective biases and “hallucinations”. This study assesses the potential of cutting-edge LLMs (such as ChatGPT, Gemini, Claude, Grok, and domain-specific models like AskGDR) as expert assistants that can synthesize insightful interpretations of complex geothermal data, as well as improve feature capabilities of geothermal models and numerical software. We developed a novel approach, leveraging Google’s recently introduced AI assistant, NotebookLM, to accelerate the generation of unpublished quantitative geothermal benchmarks. In particular, we use these benchmarks and LLM-based interviews to analyze opportunities and limitations of two promising technologies: geothermal well arrays and closed-loop coaxial wells. This line of research could play a transformative role in the geothermal sector by enabling the next generation of decision-support applications.
-
Dec 2021 · 2021 Winter Simulation Conference (WSC), IEEE, 2021
Smart cities are witnessing exceptional growth in their connections, increasing the need for LPWA communications, i.e., low-bitrate, coverage enhancement, ultra-low power consumption, and massive terminal access. 5G Narrowband-IoT has emerged to satisfy these requirements. Notwithstanding, it presents limitations in extreme coverage scenarios, where devices can lose connectivity unnecessarily. The addition of Device-to-Device (D2D) communications, connecting out-of-coverage devices with a base station through a relay, is a solid approach for mitigating these issues. This study targets two typical scenarios, urban and suburban, measuring the impact of the duty cycle, path-loss, retransmissions and interference, regarding the expected delivery ratio, end-to-end delay, and the QoS. Our simulations show how the behavior of these quantities leads to a novel strategy to avoid disconnection.
-
Jan 2021 · Scientific Reports, Nature, 2021
Electroporation (EP), the increase of cell membrane permeability due to the application of electric pulses, is a universal phenomenon with a broad range of applications. In medicine, some of the foremost EP-based tumor treatments are electrochemotherapy (ECT), irreversible electroporation, and gene electrotransfer (GET). We present OpenEP, an open-source specific purpose simulator for EP-based tumor treatments, modeling among other variables, threshold, and electroporated tissue variations in time. Distributed under a free/libre user license, OpenEP allows the customization of tissue type; electrode geometry and material; pulse type, intensity, length, and frequency. OpenEP facilitates the prediction of an optimal EP-based protocol, such as ECT or GET, defined as the critical pulse dosage yielding maximum electroporated tissue with minimal damage. OpenEP displays a highly efficient shared memory implementation by taking advantage of parallel resources; this permits a rapid prediction of optimal EP-based treatment efficiency by pulse number tuning.
-
Dec 2019 · Extreme Coverage in 5G Narrowband IoT: A LUT-Based Strategy to Optimize Shared Channels · Journal Paper · IEEE Internet of Things Journal, 2019
One of the main challenges in Internet of Things (IoT) is providing communication support to an increasing number of connected devices. In recent years, the narrowband radio technology has emerged to address this situation: narrowband IoT (NB-IoT), which is now part of 5G. Supporting massive connectivity becomes particularly demanding in extreme coverage scenarios, such as underground or deep inside building sites. We propose a novel strategy for these situations focused on optimizing NB-IoT shared channels through the selection of link parameters: modulation and coding scheme, as well as the number of repetitions. Specifically, our strategy is based on a lookup table (LUT) scheme which is used for rapidly delivering the optimal link parameters given a target QoS. Results show that, especially under extreme conditions, only a few options for link parameters are available, favoring robustness against measurement uncertainties. Our strategy minimizes resource usage in all scenarios of the acknowledged mode and remarkably reduces losses in the unacknowledged mode.
-
Sep 2019 · Lite NB-IoT Simulator for Uplink Layer · Conference Paper · XVIII Workshop on Information Processing and Control (RPIC), Argentina, 2019
The Internet of Things (IoT) is a new paradigm that gives rise to Low Power Wide Area Networks (LP-WAN). Narrow Band Internet of Things (NB-IoT) is a particular network architecture built on the legacy of LTE that is most suitable for IoT applications. In this paper, we present our work towards the implementation of a protocol simulator for NB-IoT communications. A summary of the NB-IoT physical layer is presented, going deeper into the description of the uplink channel and ending with a description of the uplink scheduling process. As an application of our toolbox, we present NB-IoT uplink block error rate (BLER) curves traced via link layer simulations and derive the optimal link adaptation for an Additive White Gaussian Noise (AWGN) channel.
-
May 2019 · LibreGrowth: A Tumor Growth Code Based on Reaction-Diffusion Equations Using Shared Memory · Journal Paper · Computer Physics Communications, 2019
In recent years, in-silico experimentation within the field of oncological medicine has been intensively investigated with the aim of better understanding tumor dynamics and dose–response relationships in cancer treatments. Here we present LibreGrowth, a libre tumor growth code able to simulate the core growth and peripheral tumor cell infiltration, considering both a benign and a malignant stage. We implemented a reaction–diffusion based model, with spatially variable diffusion coefficient, into a three-dimensional domain, using C++ and OpenMP over a GNU/Linux system. LibreGrowth aims to provide a flexible implementation for depicting heterogeneous tissues and infiltration processes, and to shed light on current therapy optimization strategies.
-
2019 · Electrochimica Acta, 2019
In search of an optimal gene electrotransfer (GET) protocol, an electroporation-based (EP) tumor treatment with great potential as a non-viral gene-delivery system, the concept of the dose-response relationship is introduced. It is shown that a reliable dose parameter is the pulse dosage and reliable response parameters are the reversibly electroporated tissue area as well as the unwanted damaged tissue area and plasmid damage due to pH. An optimal dose-response relationship in a GET protocol is predicted as the critical pulse dosage yielding maximum reversibly electroporated tissue area with minimal tissue area damage induced by pH fronts. Moreover, since damage induced by pH changes is proportional to the Coulomb dosage, damage induced by pH fronts is negligible in typical EP-based tumor protocols such as electrochemotherapy (ECT) and irreversible electroporation (IRE) but not in GET, where the applied pulses are usually longer.
-
2019 · Revista Facultad de Ingeniería, Universidad de Antioquia, 2019
Energy management focuses on improving the efficient use of resources and increasing energy access in a path towards a more sustainable society. Cloud Computing for Smart Energy Management project (CC-SEM) is a research effort for building an integrated platform for smart monitoring, controlling, and planning energy consumption and generation in urban scenarios. CC-SEM includes the design of a low-cost IoT device capable of monitoring, operating, and controlling home appliances; an analysis of 5G Narrowband IoT as a suitable cellular technology for Smart Grid outage restoration; an analysis of domestic consumption patterns to help predict home consumption; and a forecasting and performance evaluation methodology for the generation of individual photovoltaic systems. CC-SEM presents a set of tools for controlling home devices, planning/simulating scenarios of energy generation, and advances in the communication infrastructure for transmitting the generated data.
-
Sep 2018 · Cloud Computing for Smart Energy Management (CC-SEM Project) · Conference Paper · Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES), 2018
This paper describes the Cloud Computing for Smart Energy Management (CC-SEM) project, a research effort focused on building an integrated platform for smart monitoring, controlling, and planning energy consumption and generation in urban scenarios. The project integrates cutting-edge technologies (Big Data analysis, computational intelligence, Internet of Things, High Performance Computing and Cloud Computing), specific hardware for energy monitoring/controlling built within the project, and explores their communication. The proposed platform considers the point of view of both citizens and administrators, providing a set of tools for controlling home devices (for end users), planning/simulating scenarios of energy generation (for energy companies and administrators), and shows some advances in communication infrastructure for transmitting the generated data.
-
May 2018 · Microenvironmental Influence on Microtumour Infiltration Patterns: 3D Mathematical Modelling Supported by In Vitro Studies · Journal Paper · Integrative Biology, Oxford University Press, 2018
Tumour infiltration extent and its spatial organization depend both on the tumour type and stage and on the bio-physicochemical characteristics of the microenvironment. This work presents an experimental/numerical combined method for the development of a three-dimensional mathematical model with the ability to reproduce the growth and infiltration patterns of a given avascular microtumour in response to different microenvironmental conditions. The model is based on a diffusion–convection reaction equation that considers logistic proliferation, volumetric growth, a rim of proliferative cells at the tumour surface, and invasion with diffusive and convective components. The in vitro model consists of multicellular tumour spheroids (MTSs) of an epithelial mammary tumour cell line (LM3) immersed in a collagen I gel matrix. It was experimentally determined that adipocyte conditioned media had the ability to change the MTS infiltration pattern from collective and laminar to an individual and atomized one. Numerical simulations adequately reproduced both kinds of infiltration patterns, which were determined by area quantification, analysis of fractal dimensions and lacunarity, and Bland–Altman analysis.
-
Nov 2017 · Modelado Matemático de un Patrón de Invasión Tumoral a través de Ecuaciones de Reacción-Difusión y Fractales DLA · Conference Paper · ENIEF, Mecánica Computacional, Bioengineering and Biomechanics, vol. 35, 2017
We present a mathematical model based on a reaction-diffusion-convection equation that describes a microtumor infiltration pattern resulting from the addition of adipocyte-conditioned medium to the tumor microenvironment of multicellular spheroids. The type of invasion is captured by incorporating a spatially variable diffusion that depends on a fractal matrix generated by the DLA (Diffusion Limited Aggregation) method. The values of the main model parameters are estimated from experimental data. The resulting simulations agree qualitatively and semi-quantitatively with the in vitro results, as shown by fractality analyses performed with the box-counting and lacunarity methods.
-
Jul 2016 · Integrative Biology, 2016
The objective of this study was to build a mathematical model able to describe the growth and the real invasion pattern of multicellular tumour spheroids immersed in a collagen matrix. The model may be used in a descriptive (case-specific) as well as in a predictive (population-dependent) way, depending on the type of the input parameters (a shape function obtained from a given experimental case or an aleatory shape function generated by data mining and Monte Carlo tools from the entire dataset, respectively). This kind of empirical-numerical interaction has wide application potential at the basic research and at the clinical level.
-
2016 · Electrolytic Ablation Dose Planning Methodology · Conference Paper · 1st World Congress on Electroporation and Pulsed Electric Fields in Biology, Medicine and Food & Environmental Technologies, Springer, 2016
Electrolytic ablation (EA), a medical treatment increasingly used in solid tumor ablation, consists in the passage of a low direct electric current through two or more electrodes inserted in the tissue, thus inducing pH fronts that destroy the tumor. The combined use of EA with a recently introduced one-probe two-electrode device (OPTED) results in a minimally invasive tissue ablation technique. Despite its success related to low cost and minimum side effects, EA has drawbacks such as the difficulty in determining the current and time needed to assure total tumor ablation while avoiding healthy tissue intrusion. Here we introduce a realistic dose planning methodology in terms of the coulomb dosage administered and the associated pH tracking, which predicts an optimal EA/OPTED treatment protocol for a given tumor size.
-
Oct 2015 · Optimal Dose-Response Relationship in Electrolytic Ablation of Tumors with a One-Probe-Two-Electrode Device · Journal Paper · Electrochimica Acta, vol. 186, pp. 494–503, Pergamon, 2015
Electrolytic ablation (EA) of tumors consists in the passage of a low constant electric current through two or more electrodes inserted in the tissue thus inducing pH fronts that produce tumor necrosis. Combined with a recently introduced one-probe two electrode device (OPTED) this procedure results in a minimally invasive treatment. In this work, a theoretical model is introduced describing the EA/OPTED as an electrolytic process and the underlying electrochemical reactions through the Nernst-Planck equations for ion transport. Model results show that the coulomb dosage is a reliable dose parameter and predicts an optimal dose-response relationship for a given tumor size subjected to an EA/OPTED, considering the optimum as the minimum coulomb dosage necessary to achieve total tumor destruction while minimizing healthy tissue damage. Moreover, it predicts a nonlinear relationship between coulomb dosage and necrotized tumor volume (NTV), dosage and NTV scaling as Q^1.4.
Preprints
-
Jul 2021 · arXiv:2107.09443, 2021
Physics-informed neural networks (PINNs) are an increasingly powerful way to solve partial differential equations, generate digital twins, and create neural surrogates of physical models. In this manuscript we detail the various methodologies of PINNs and showcase the various types of problems a PINN software can solve. We then detail the inner workings of NeuralPDE.jl and show how a formulation structured around numerical quadrature gives rise to new loss functions which allow for adaptivity towards bounded error tolerances. We showcase how NeuralPDE uses a purely symbolic formulation so that all of the underlying training code is generated from an abstract formulation, and show how to make use of GPUs and solve systems of PDEs. We end by focusing on a complex multiphysics example, the Doyle-Fuller-Newman (DFN) Model, and showcase how this PDE can be formulated and solved with NeuralPDE.
Scientific Software
-
Jul 2025 · Zenodo, 2025
SmartSolve.jl is a Julia-based toolbox for AI-guided algorithmic discovery, designed to accelerate computations by generating enhanced algorithmic and architectural selection strategies. Envisioned as a general-purpose tool for scientific computing, current efforts focus on challenges in computational linear algebra. The toolbox addresses the growing complexity of selecting efficient solvers, data formats, precision strategies, and hardware resources for structurally diverse matrices, where conventional approaches offer substantial opportunities for improvement. SmartSolve.jl constructs a performance database through systematic benchmarking and applies automated Pareto analysis to identify optimal trade-offs between accuracy and speed. This database underpins a data-driven model that synthesizes dispatch strategies tailored to high-performance linear algebra software.
-
2021 · GitHub · MIT-CESMIX · 2021–present
-
Jan 2021 · Scientific Reports, Nature, 2021
Research Talks & Posters
-
Apr 2026 · MIT CHEFSI TST Meeting, 2026
-
Apr 2026
-
Aug 2025 · Fast Density Functional Theory for Training Machine Learning Interatomic Potentials via Large-Scale Atomistic Sampling · Poster Talk · IAIFI Summer Workshop, Harvard University · Cambridge, MA, 2025
-
Jul 2025 · JuliaCon 2025, Lightning Talk
This talk presents the DARPA–MIT SmartSolve project, which automatically selects optimal algorithms and data structures for linear algebra operations. We benchmark diverse algorithms across matrix patterns (dense, sparse, and banded), measuring computation time, casting time, and accuracy. Pareto analysis identifies efficient algorithm–data structure combinations. We then explore how large language models (LLaMA 3) can assist in algorithm classification and recommendation. Selecting the optimal combination can achieve over a 50× speedup compared to default selections in major linear algebra libraries.
-
Mar 2024 · Women in Data Science (WiDS) Cambridge, 2024
Machine learning surrogate models address computational bottlenecks in atomistic simulations. We introduce dimension reduction techniques applied to linear regression and neural network interatomic potentials, using Julia packages including PotentialLearning.jl and InteratomicPotentials.jl. Our approach focuses on accelerating energy and force calculations for larger systems through Atomic Cluster Expansion descriptors for Hf and HfO₂ materials, enabling rapid approximations with linear scalability with the number of atoms while maintaining accuracy across diverse atomistic systems.
-
Jul 2023 · JuliaCon 2023, MIT · Cambridge, MA
Simplifying the composition of machine learning (ML) interatomic potentials is key to finding combinations of data, descriptors, and learning methods that exceed the accuracy and performance of the state of the art. The Julia programming language, and its burgeoning atomistic ecosystem, can facilitate the composition of neural networks and other ML models with cutting-edge interatomic potentials, through mechanisms such as multiple dispatch, differentiable programming, ML and GPU abstractions, as well as specialized scientific computing libraries. Here, the use of Julia to automate the composition of a novel neural potential based on the Atomic Cluster Expansion (ACE) is presented as part of the research activities of the Center for the Exascale Simulation of Materials in Extreme Environments (CESMIX). The proposed scheme aims to facilitate the execution of parallel fitting experiments that search for hyper-parameter values that significantly improve the accuracy in training and test metrics of energies and forces with respect to different Density Functional Theory (DFT) data sets.
-
Jul 2022 · JuliaCon 2022 · Minisymposium: JuliaMolSim – Computation with Atoms
-
Jul 2021 · JuliaCon 2021 (virtual)
MDP.jl, the Julia library of Molecular Dynamics (MD) potentials, is being developed to provide fast and accurate potentials for classical MD simulations on exascale supercomputers. Its goals include coupling empirical and machine learning (ML) potentials and quantifying uncertainties in trained ML potentials. This talk presents the latest developments in MDP.jl regarding descriptors and force field computation.
-
Sep 2019 · The Role of Damage in Reversible Electroporation Optimization: Theory and Experiments in a Vegetable Model · Talk · 3rd World Congress on Electroporation and Pulsed Electric Fields in Biology, Medicine, and Food and Environmental Technologies · Toulouse, France
-
Feb 2019 · Workshop Internacional: Planificación de Transporte y Ciudades Inteligentes · Talk · University of the Republic · Montevideo, Uruguay · Invited
-
Nov 2018 · Mathematical Model of Glioma Evolution and Treatment by Chemo and Radiotherapy · Talk · 9th Argentinian Congress of Bioinformatics and Computational Biology (9CAB2C) · Mar del Plata, Argentina
-
Sep 2017 · Towards an Optimal Dose-Response Relationship in Electroporation-Based Tumor Treatments · Talk · 2nd World Congress on Electroporation and Pulsed Electric Fields in Biology, Medicine and Food and Environmental Technologies · Norfolk, VA
-
Sep 2017 · The Concept of Electroporation Energy in Electroporation-Based Models · Talk · 2nd World Congress on Electroporation and Pulsed Electric Fields · Norfolk, VA
-
Jul 2017 · An MPI-Based Implementation of a Simplified Actuator Line Model · Poster · XIX Giambiagi Winter School, UBA · Buenos Aires, Argentina
-
Oct 2015 · In Silico Generation of Tumor Invasion Patterns · Poster · 6th Argentinian Conference on Bioinformatics and Computational Biology (A2B2C), 2015
-
Sep 2014 · Tissue Damage in Vaccination Protocols Based on Electroporation: pH Fronts and Tissue Natural Buffering · Poster · 14th International Conference on Progress in Vaccination Against Cancer (PIVAC-14), 2014
-
Aug 2012 · Feasibility Study of a Portable Kit for Chagas-Mazza Disease Diagnosis and Data Centralization · Poster · High-Performance Computing Latin America Symposium, Buenos Aires, Argentina, 2012
-
Aug 2012 · Feasibility Study of a Portable Kit for Chagas-Mazza Disease Diagnosis and Data Centralization · Talk · HPC-Day 2012, 41 JAIIO, University of La Plata · La Plata, Argentina · Invited
Some metadata, abstracts, and links on this page were AI-assisted and may contain inaccuracies.