2023 Mathematical Sciences Publications and Grants

*Indicates student co-author.

Aksoy, Asuman G. and Daniel Akech Thoing. “Equality of Degrees of Compactness: Schauder's Theorem and S-Numbers.” Communications of the Korean Mathematical Society, vol. 38, no. 4, 2023, pp. 1127-1130.

Abstract: We investigate an extension of Schauder's theorem by studying the relationship between various s-numbers of an operator T and its adjoint T∗. We have three main results. First, we present a new proof that the approximation numbers of T and T∗ are equal for compact operators. Second, for non-compact, bounded linear operators from X to Y, we obtain a relationship between certain s-numbers of T and T∗ under natural conditions on X and Y. Lastly, for non-compact operators that are compact with respect to certain approximation schemes, we prove results for comparing the degree of compactness of T with that of its adjoint T∗.


Aksoy, Asuman G., Francesca Arici, M. Eugenia Celorrio, and Pamela Gorkin. “Decomposable Blaschke products of degree 2^n.” Transactions of the American Mathematical Society, vol. 376, 2023, pp. 6341-6369.

Abstract: We study the decomposability of a finite Blaschke product B of degree 2^n into n degree-2 Blaschke products, examining the connections between Blaschke products, Poncelet’s theorem, and the monodromy group. We show that if the numerical range of the compression of the shift operator, W(S_B), with B a Blaschke product of degree n, is an ellipse, then B can be written as a composition of lower-degree Blaschke products that correspond to a factorization of the integer n. We also show that a Blaschke product of degree 2^n with an elliptical Blaschke curve has at most n distinct critical values, and we use this to examine the monodromy group associated with a regularized Blaschke product B. We prove that if B can be decomposed into n degree-2 Blaschke products, then the monodromy group associated with B is the wreath product of n cyclic groups of order 2. Lastly, we study the group of invariants of a Blaschke product B of order 2^n when B is a composition of n Blaschke products of order 2.


Aksoy, Asuman G. “Lethargic Approximation: Banach and Frechet Spaces Forming Bernstein Pairs.” Bulletin of Nihon University, Liberal Arts, no. 98, 2023, pp. 7-15.

Abstract: In this paper, we examine Bernstein's Lethargy Theorem in the context of Banach and Frechet spaces and define Bernstein pairs. We introduce conditions under which a pair of Banach spaces forms a Bernstein pair. Some open questions relating to Bernstein pairs for Frechet spaces are also presented.


Aksoy, Asuman. Review of “Gel’fand Widths of Sobolev Classes of Functions in the Average Setting,” by Yuqi Liu, Huan Li, and Xuehua Li. MathSciNet Mathematical Reviews, 2023, MR4541465.

Abstract: Smooth approximation of mappings with rank derivative at most one.

Abstract: Both reviews can be reached from the MathSciNet database (Mathematical Reviews) of the American Mathematical Society.

Cannon, Sarah, Ari Goldbloom-Helzner, Varun Gupta, JN Matthews, and Bhushan Suwal. “Voting Rights, Markov Chains, and Optimization by Short Bursts.” Methodology and Computing in Applied Probability, vol. 25, 2023, article no. 36.

Abstract: Finding outlying elements in probability distributions can be a hard problem. Taking a real example from Voting Rights Act enforcement, we consider the problem of maximizing the number of simultaneous majority-minority districts in a political districting plan. An unbiased random walk on districting plans is unlikely to find plans that approach this maximum. A common search approach is to use a biased random walk: preferentially select districting plans with more majority-minority districts. Here, we present a third option, called short bursts, in which an unbiased random walk is performed for a small number of steps (called the burst length), then re-started from the most extreme plan that was encountered in the last burst. We give empirical evidence that short-burst runs outperform biased random walks for the problem of maximizing the number of majority-minority districts, and that there are many values of burst length for which we see this improvement. Abstracting from our use case, we also consider short bursts where the underlying state space is a line with various probability distributions, and then explore some features of more complicated state spaces and how these impact the effectiveness of short bursts.
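The short-burst procedure described in this abstract can be sketched in a few lines of Python for the simplest case it mentions, a state space that is a line. This is an illustrative sketch only, not the authors' implementation: the function names, the reflecting boundary, the choice to count a burst's starting state as "encountered", and the rule of preferring the most recent state on ties are all assumptions made for the example.

```python
import random


def short_bursts(start, neighbors, score, burst_length, num_bursts, rng):
    """Run unbiased random walks in bursts of fixed length.

    Each burst restarts from the highest-scoring state seen in the
    previous burst. Assumption: the burst's starting state counts as
    encountered, so the score never decreases between bursts.
    """
    current = start
    for _ in range(num_bursts):
        state = current
        best = current
        for _ in range(burst_length):
            # Unbiased step: pick a neighbor uniformly at random.
            state = rng.choice(neighbors(state))
            # Assumption: ties go to the most recently visited state.
            if score(state) >= score(best):
                best = state
        current = best
    return current


def line_neighbors(x, upper=50):
    """Hypothetical toy state space: integers 0..upper with reflecting ends."""
    return [max(x - 1, 0), min(x + 1, upper)]


rng = random.Random(0)
final_state = short_bursts(
    start=0,
    neighbors=line_neighbors,
    score=lambda x: x,  # toy objective: move as far right as possible
    burst_length=10,
    num_bursts=100,
    rng=rng,
)
print(final_state)
```

In the districting application, a state would be a districting plan, a step would be one move of the underlying chain, and the score would be the number of majority-minority districts.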


Cannon, Sarah. “Irreducibility of Recombination Markov Chains in the Triangular Lattice.” SIAM Conference on Applied and Computational Discrete Algorithms, 2023, pp. 98-108.

Abstract: In the United States, regions (such as states or counties) are frequently divided into districts for the purpose of electing representatives. How the districts are drawn can have a profound effect on who's elected, and drawing the districts to give an advantage to a certain group is known as gerrymandering. It can be surprisingly difficult to detect when gerrymandering is occurring, but one algorithmic method is to compare a current districting plan to a large number of randomly sampled plans to see whether it is an outlier. Recombination Markov chains are often used to do this random sampling: randomly choose two districts, consider their union, and split this union up in a new way. This approach works well in practice and has been widely used, including in litigation, but the theory behind it remains underdeveloped. For example, it's not known if recombination Markov chains are irreducible, that is, if recombination moves suffice to move from any districting plan to any other.

Irreducibility of recombination Markov chains can be formulated as a graph problem: for a planar graph G, is the space of all partitions of G into κ connected subgraphs (κ districts) connected by recombination moves? While the answer is yes when districts can be as small as one vertex, this is not realistic in real-world settings where districts must have approximately balanced populations. Here we fix district sizes to be κ1 ± 1 vertices, κ2 ± 1 vertices,… for fixed κ1, κ2,…, a more realistic setting. We prove for arbitrarily large triangular regions in the triangular lattice, when there are three simply connected districts, recombination Markov chains are irreducible. This is the first proof of irreducibility under tight district size constraints for recombination Markov chains beyond small or trivial examples. The triangular lattice is the most natural setting in which to first consider such a question, as graphs representing states/regions are frequently triangulated. The proof uses a sweep-line argument, and there is hope it will generalize to more districts, triangulations satisfying mild additional conditions, and other redistricting Markov chains.


Cannon, Sarah. Review of All the Math You Missed (But Need to Know for Graduate School), by Thomas A. Garrity. Notices of the American Mathematical Society, June/July 2023, pp. 973-975.

Abstract: A review of the textbook 'All the Math You Missed (But Need to Know for Graduate School),' 2nd Edition, by Thomas Garrity. It highlights the importance of books such as this one for levelling the playing field and helping to ensure that all students, regardless of background, have the opportunity to succeed in mathematics graduate school.

Forst, Maxwell and Lenny Fukshansky. “On Zeros of Multilinear Polynomials.” Journal of Number Theory, vol. 245, 2023, pp. 169-186.

Abstract: We consider multivariable polynomials over a fixed number field, linear in some of the variables. For a system of such polynomials satisfying certain technical conditions we prove the existence of search bounds for simultaneous zeros with respect to height. For a single such polynomial, we prove the existence of search bounds with respect to height for zeros lying outside of a prescribed algebraic set. We also obtain search bounds in the case of homogeneous multilinear polynomials, which are related to a so-called “sparse” version of Siegel's lemma. Among the tools we develop are height inequalities that are of some independent interest.


Fukshansky, Lenny and Alexander Hsu. “Covering Point-Sets with Parallel Hyperplanes and Sparse Signal Recovery.” Discrete & Computational Geometry, vol. 69, 2023, pp. 919-930.

Abstract: We give a new deterministic construction of integer sensing matrices that can be used for the recovery of integer-valued signals in compressed sensing. This is a family of n × d integer matrices, d ≥ n, with bounded sup-norm and the property that no ℓ column vectors are linearly dependent, ℓ ≤ n. Further, if ℓ ≤ o(log n) then d/n → ∞ as n → ∞. Our construction comes from particular sets of difference vectors of point-sets in ℝ^n that cannot be covered by few parallel hyperplanes. We construct examples of such sets on the {0, ±1} grid and use them for the matrix construction. We also show a connection of our constructions to a simple version of Tarski’s plank problem.


Fukshansky, Lenny and Camilla Hollanti. “Euclidean lattices: theory and applications.” Communications in Mathematics, vol. 31, no. 2, 2023, pp. 251-263.

Abstract: In this editorial survey we introduce the special issue of the journal Communications in Mathematics on the topic in the title of the article. Our main goal is to briefly outline some of the main aspects of this important area at the intersection of theory and applications, providing the context for the articles showcased in this special issue.


Fukshansky, Lenny and David Kogan. “On Average Coherence of Cyclotomic Lattices.” Communications in Mathematics, vol. 31, no. 1, 2023, pp. 57-72.

Abstract: We introduce maximal and average coherence on lattices by analogy with these notions on frames in Euclidean spaces. Lattices with low coherence can be of interest in signal processing, whereas lattices with high orthogonality defect are of interest in sphere packing problems. As such, coherence and orthogonality defect are different measures of the extent to which a lattice fails to be orthogonal, and maximizing their quotient (normalized for the number of minimal vectors with respect to dimension) gives lattices with particularly good optimization properties. While orthogonality defect is a fairly classical and well-studied notion on various families of lattices, coherence is not. We investigate coherence properties of a nice family of algebraic lattices coming from rings of integers in cyclotomic number fields, proving a simple formula for their average coherence. We look at some examples of such lattices and compare their coherence properties to those of the standard root lattices.

Huber, Mark and Gizem Karaali. “Mathematics and Society.” Journal of Humanistic Mathematics, vol. 13, issue 2, 2023, pp. 1-3.

Abstract: This editorial for the July issue of the Journal of Humanistic Mathematics looks at the questions of how mathematics integrates with society.


Huber, Mark and Gizem Karaali. “Where Does Mathematics Come From? Really, Where?” Journal of Humanistic Mathematics, vol. 13, issue 1, 2023, pp. 1-3.

Abstract: This editorial for the January issue of the Journal of Humanistic Mathematics weaves a theme of how mathematics is created.


Huber, Mark. Probability: Lectures and Labs 3rd Edition (Data Science Adventures). Independent, 2023.

Abstract: This text covers a one-semester course in probability for students who have had linear algebra and calculus. Two-thirds of the text is devoted to a traditional lecture format, while one-third is given over to laboratory experiences using R that help students to develop their probabilistic intuition.

Hyok, Pak Il, Ri Jin Hyok, Ri Chol Ho, and Mike Izbicki. “Intra Cell Co-Channel Interference Mitigation in LTE Heterogeneous Network.” Proceedings of the 2023 15th International Conference on Computer Modeling and Simulation, June 2023, pp. 212-217.

Abstract: Indoor areas often have poor cellphone reception, especially when the building is made out of concrete. This reception can be improved by deploying small cellphone base stations called femtocells inside the building. The main challenge with deploying these femtocells is interference. Users located near many femtocells, or between the femtocell and the main base station, will see wireless traffic from each of these sources. Existing solutions to this problem divide the wireless spectrum in each cell so that interference is minimized. In this paper, we propose a simple new way of dividing the cell into three areas: the cell center area (CCA), cell middle area (CMA), and cell edge area (CEA). Each area has different frequency allocation policies that result in minimal interference between cells. We compare our method against the strict fractional frequency reuse, soft fractional frequency reuse, and fractional frequency reuse 3 schemes. We find better performance in both single and multi-cell networks.


Wang, Yujie and Mike Izbicki. “DocSplit: Simple Contrastive Pretraining for Large Document Embeddings.” Findings of the Association for Computational Linguistics: EMNLP 2023, 2023, pp. 14190-14196.

Abstract: Existing model pretraining methods only consider local information. For example, in the popular token masking strategy, the words closer to the masked token are more important for prediction than words far away. This results in pretrained models that generate high-quality sentence embeddings, but low-quality embeddings for large documents. We propose a new pretraining method called DocSplit which forces models to consider the entire global context of a large document. Our method uses a contrastive loss where the positive examples are randomly sampled sections of the input document, and negative examples are randomly sampled sections of unrelated documents. Like previous pretraining methods, DocSplit is fully unsupervised, easy to implement, and can be used to pretrain any model architecture. Our experiments show that DocSplit outperforms other pretraining methods for document classification, few shot learning, and information retrieval tasks.


External Grant: Izbicki, Mike. Google Cloud Credits, 2023, $5,000.

Abstract: Google awarded me these funds to use their cloud infrastructure for training machine learning models. The research will focus on better techniques for languages that do not have a large corpus of training data.

Kao, Chiu-Yen, Braxton Osting, and Édouard Oudet. “Computational Approaches for Extremal Geometric Eigenvalue Problems.” Handbook of Numerical Analysis, vol. 24, 2023, pp. 377-406.

Abstract: In an extremal eigenvalue problem, one considers a family of eigenvalue problems, each with discrete spectra, and extremizes a chosen eigenvalue over the family. In this chapter, we consider eigenvalue problems defined on Riemannian manifolds and extremize over the metric structure. For example, we consider the problem of maximizing the principal Laplace–Beltrami eigenvalue over a family of closed surfaces of fixed volume. Computational approaches to such extremal geometric eigenvalue problems present new computational challenges and require novel numerical tools, such as the parameterization of conformal classes and the development of accurate and efficient methods to solve eigenvalue problems on domains with nontrivial genus and boundary. We highlight recent progress on computational approaches for extremal geometric eigenvalue problems, including (i) maximizing Laplace–Beltrami eigenvalues on closed surfaces and (ii) maximizing Steklov eigenvalues on surfaces with boundary.


Kao, Chiu-Yen, Braxton Osting, and Édouard Oudet. “Harmonic Functions on Finitely Connected Tori.” SIAM Journal on Numerical Analysis, vol. 61, issue 6, 2023, pp. 2795-2812.

Abstract: In this paper, we prove a logarithmic conjugation theorem on finitely connected tori. The theorem states that a harmonic function can be written as the real part of a function whose derivative is analytic and a finite sum of terms involving the logarithm of the modulus of a modified Weierstrass sigma function. We implement the method using arbitrary precision and use the result to find approximate solutions to the Laplace problem and the Steklov eigenvalue problem. Using a posteriori estimation, we show that the solution of the Laplace problem on a torus with a few holes has error less than 10^(-100) using a few hundred degrees of freedom and the Steklov eigenvalues have similar error.


Kao, Chiu-Yen, Braxton Osting, and Jackson C. Turner. “Flat Tori with Large Laplacian Eigenvalues in Dimensions up to Eight.” SIAM Journal on Applied Algebra and Geometry, vol. 7, issue 1, 2023, pp. 172-193.

Abstract: We consider the optimization problem of maximizing the kth Laplacian eigenvalue, λ_k, over flat d-dimensional tori of fixed volume. For k=1, this problem is equivalent to the densest lattice sphere packing problem. For larger k, this is equivalent to the NP-hard problem of finding the d-dimensional (dual) lattice with the longest kth shortest lattice vector. As a result of extensive computations, for d≤8, we obtain a sequence of flat tori, T_{k,d}, each of volume one, such that the kth Laplacian eigenvalue of T_{k,d} is very large; for each (finite) k the kth eigenvalue exceeds the value in (the k→∞ asymptotic) Weyl’s law by a factor between 1.54 and 2.01, depending on the dimension. Stationarity conditions are derived and numerically verified for T_{k,d}, and we describe the degeneration of the tori as k→∞.

Owusu, Emmanuel, Nahrain M. Shasteen, G. Lynn Mitchell, Melissa D. Bailey, Chiu-Yen Kao, Andrew J. Toole, Kathryn Richdale, and Marjean T. Kulp. “Impact of Accommodative Insufficiency and Accommodative/Vergence Therapy on Ciliary Muscle Thickness in the Eye.” Ophthalmic and Physiological Optics, vol. 43, issue 5, 2023, pp. 947-953.

Abstract:

Purpose
Recent evidence suggests that the ciliary muscle apical fibres are most responsive to accommodative load; however, the structure of the ciliary muscle in individuals with accommodative insufficiency is unknown. This study examined ciliary muscle structure in individuals with accommodative insufficiency (AI). We also determined the response of the ciliary muscle to accommodative/vergence therapy and increasing accommodative demands to investigate the muscle's responsiveness to workload.

Methods
Subjects with AI were enrolled and matched by age and refractive error with subjects enrolled in another ciliary muscle study as controls. Anterior segment optical coherence tomography was used to measure the ciliary muscle thickness (CMT) at rest (0D), maximum thickness (CMTMAX) and over the area from 0.75 mm (CMT0.75) to 3 mm (CMT3) posterior to the scleral spur of the right eye. For those with AI, the ciliary muscle was also measured at increasing levels of accommodative demand (2D, 4D and 6D), both before and after accommodative/vergence therapy.

Results
Sixteen subjects with AI (mean age = 17.4 years, SD = 8.0) were matched with 48 controls (mean age = 17.8 years, SD = 8.2). On average, the controls had 52–72 μm thicker ciliary muscles in the apical region at 0D than those with AI (p = 0.03 for both CMTMAX and CMT0.75). Differences in thickness between the groups in other regions of the muscle were not statistically significant. After 8 weeks of accommodative/vergence therapy, the CMT increased by an average of 22–42 μm (p ≤ 0.04 for all), while AA increased by 7D (p < 0.001).

Conclusions
This study demonstrated significantly thinner apical ciliary muscle thickness in those with AI and that the ciliary muscle can thicken in response to increased workload. This may explain the mechanism for improvement in signs and symptoms with accommodative/vergence therapy.

Gilroy, Will and Sam Nelson. “Bilinear Enhancements of Quandle Invariants.” Journal of Knot Theory and Its Ramifications, vol. 32, no. 5, 2023, 2350039.

Abstract: We enhance the quandle counting invariants of oriented classical and virtual links using a construction similar to quandle modules but inspired by symplectic quandle operations rather than Alexander quandle operations. Given a finite quandle 𝑋 and a vector space 𝑉 over a field, sets of bilinear forms on 𝑉 indexed by pairs of elements of 𝑋 satisfying certain conditions yield new enhanced multiset- and polynomial-valued invariants of oriented classical and virtual links. We provide examples to illustrate the computation of the invariants and to show that the enhancement is proper.


Jeong, Suhyeon, Jieon Kim, and Sam Nelson. “Psybrackets, Pseudoknots, and Singular Knots.” Journal of Knot Theory and Its Ramifications, vol. 32, no. 1, 2023, 2350001.

Abstract: We introduce algebraic structures known as psybrackets and use them to define invariants of pseudoknots and singular knots and links. Psybrackets are Niebrzydowski tribrackets with additional structure inspired by the Reidemeister moves for pseudoknots and singular knots. Examples and computations are provided.


Joung, Yewon and Sam Nelson. “Bikei Module Invariants of Unoriented Surface-Links.” Journal of Knot Theory and Its Ramifications, vol. 32, no. 10, 2023, 2350065.

Abstract: We extend our previous work from arXiv:1903.06863 on biquandle module invariants of oriented surface-links to the case of unoriented surface-links using bikei modules. The resulting infinite family of enhanced invariants proves to be effective at distinguishing unoriented and especially non-orientable surface-links; in particular, we show that these invariants are more effective than the bikei homset cardinality invariant alone at distinguishing non-orientable surface-links. Moreover, as another application we note that our previous biquandle modules which do not satisfy the bikei module axioms are capable of distinguishing different choices of orientation for orientable surface-links as well as classical and virtual links.


Nelson, Sam and Fletcher Nickerson. “Polynomial Invariants of Tribrackets in Knot Theory.” Osaka Journal of Mathematics, vol. 20, 2023, pp. 323-332.

Abstract: We introduce a six-variable polynomial invariant of Niebrzydowski tribrackets analogous to quandle, rack and biquandle polynomials. Using the subtribrackets of a tribracket, we additionally define subtribracket polynomials and establish a sufficient condition for isomorphic subtribrackets to have the same polynomial regardless of their embedding in the ambient tribracket. As an application, we enhance the tribracket counting invariant of knots and links using subtribracket polynomials and provide examples to demonstrate that this enhancement is proper. 


Nelson, Sam and Migiwa Sakurai. “Biquandle Arrow Weight Enhancements.” International Journal of Mathematics, vol. 34, no. 8, 2023, 2350046.

Abstract: We introduce a new infinite family of enhancements of the biquandle homset invariant called biquandle arrow weights. These invariants assign weights in an abelian group to intersections of arrows in a Gauss diagram representing a classical or virtual knot depending on the biquandle colors associated to the arrows. We provide examples to show that the enhancements are nontrivial and proper, i.e., not determined by the homset cardinality.

Friedberg, Rina, Michael Baiocchi, Evan Rosenman, Mary Amuyunzu-Nyamongo, Gavin Nyairo, and Clea Sarnquist. “Mental Health and Gender-Based Violence: An Exploration of Depression, PTSD, and Anxiety Among Adolescents in Kenyan Informal Settlements Participating in an Empowerment Intervention.” PLoS One, vol. 18, no. 3, 2023, e0281800.

Abstract:

Objective:
This study examines the prevalence of depression, anxiety, and post-traumatic stress disorder (PTSD) among adolescents attending schools in several informal settlements of Nairobi, Kenya. Primary aims were estimating prevalence of these mental health conditions, understanding their relationship to gender-based violence (GBV), and assessing changes in response to an empowerment intervention.

Methods:
Mental health measures were added to the final data collection point of a two-year randomized controlled trial (RCT) evaluating an empowerment self-defense intervention. Statistical models evaluated how past sexual violence, access to money to pay for a needed hospital visit, alcohol use, and self-efficacy affect mental health outcomes, as well as how the intervention affected female students’ mental health.

Findings:
Population prevalence of mental health conditions for combined male and female adolescents was estimated as: PTSD 12.2% (95% confidence interval 10.5–15.4), depression 9.2% (95% confidence interval 6.6–10.1) and anxiety 17.6% (95% confidence interval 11.2–18.7). Female students who reported rape before and during the study period reported significantly higher incidence of all mental health outcomes than the study population. No significant differences in outcomes were found between female students in the intervention and standard-of-care (SOC) groups. Prior rape and low ability to pay for a needed hospital visit were associated with higher prevalence of mental health conditions. The female students whose log-PTSD scores were most lowered by the intervention (effects between -0.23 and -0.07) were characterized by high ability to pay for a hospital visit, low agreement with gender normative statements, larger homes, and lower academic self-efficacy.

Conclusion:
These data illustrate a need for research and interventions related to (1) mental health conditions among the young urban poor in low-income settings, and (2) sexual violence as a driver of poor mental health, leading to a myriad of negative long-term outcomes.


Kenny, Christopher T., Shiro Kuriwaki, Cory McCartan, Evan Rosenman, Tyler Simko, and Kosuke Imai. “Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System.” Harvard Data Science Review, Special Issue 2, 2023.

Abstract: In 'Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy,' boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increase the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices.

Rosenman, Evan, Cory McCartan, and Santiago Olivella. “Recalibration of Predicted Probabilities Using the ‘Logit Shift’: Why Does It Work, and When Can It Be Expected to Work Well?” Political Analysis, vol. 31, no. 4, 2023, pp. 651-661.

Abstract: The output of predictive models is routinely recalibrated by reconciling low-level predictions with known quantities defined at higher levels of aggregation. For example, models predicting vote probabilities at the individual level in U.S. elections can be adjusted so that their aggregation matches the observed vote totals in each county, thus producing better-calibrated predictions. In this research note, we provide theoretical grounding for one of the most commonly used recalibration strategies, known colloquially as the “logit shift.” Typically cast as a heuristic adjustment strategy (whereby a constant correction on the logit scale is found, such that aggregated predictions match target totals), we show that the logit shift offers a fast and accurate approximation to a principled, but computationally impractical adjustment strategy: computing the posterior prediction probabilities, conditional on the observed totals. After deriving analytical bounds on the quality of the approximation, we illustrate its accuracy using Monte Carlo simulations. We also discuss scenarios in which the logit shift is less effective at recalibrating predictions: when the target totals are defined only for highly heterogeneous populations, and when the original predictions correctly capture the mean of true individual probabilities, but fail to capture the shape of their distribution.
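The heuristic version of the logit shift described in this abstract (a single constant added on the logit scale so that the aggregated predictions match a target total) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the bisection solver, and the search interval of [-50, 50] are assumptions, and the input probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def logit_shift(probs, target_total, tol=1e-10):
    """Find a constant c such that sum(sigmoid(logit(p) + c)) matches target_total.

    The sum is strictly increasing in c, so bisection finds the unique root.
    Returns the shift and the recalibrated probabilities.
    """
    if not 0.0 < target_total < len(probs):
        raise ValueError("target_total must lie strictly between 0 and len(probs)")
    logits = [logit(p) for p in probs]
    lo, hi = -50.0, 50.0  # assumed search interval for the shift
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        total = sum(sigmoid(l + mid) for l in logits)
        if total < target_total:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return shift, [sigmoid(l + shift) for l in logits]


# Hypothetical example: three individual-level predictions whose sum (1.4)
# is reconciled with an observed total of 2.0.
shift, adjusted = logit_shift([0.2, 0.5, 0.7], target_total=2.0)
print(shift, adjusted, sum(adjusted))
```

Because the same constant is added to every logit, the ranking of individuals by predicted probability is unchanged; only the overall level moves to meet the target total.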


Rosenman, Evan, Guillaume Basse, Art B. Owen, and Mike Baiocchi. “Combining Observational and Experimental Datasets Using Shrinkage Estimators.” Biometrics, vol. 79, issue 4, 2023, pp. 2961-2973.

Abstract: We consider the problem of combining data from observational and experimental sources to draw causal conclusions. To derive combined estimators with desirable properties, we extend results from the Stein shrinkage literature. Our contributions are threefold. First, we propose a generic procedure for deriving shrinkage estimators in this setting, making use of a generalized unbiased risk estimate. Second, we develop two new estimators, prove finite sample conditions under which they have lower risk than an estimator using only experimental data, and show that each achieves a notion of asymptotic optimality. Third, we draw connections between our approach and results in sensitivity analysis, including proposing a method for evaluating the feasibility of our estimators.


Rosenman, Evan and Luke Miratrix. “Designing Experiments Toward Shrinkage Estimation.” Electronic Journal of Statistics, vol. 17, no. 2, 2023, pp. 3406-3442.

Abstract: How can increasingly available observational data be used to improve the design of randomized controlled trials (RCTs)? We seek to design a prospective RCT, with the intent of using an Empirical Bayes estimator to shrink the causal estimates from our trial toward causal estimates obtained from an observational study. We ask: how might we design the experiment to better complement the observational study in this setting?

We show that the risk of such shrinkage estimators can be computed efficiently via numerical integration. We then propose three algorithms for determining the best allocation of units to strata given the estimator’s planned use: Neyman allocation; a “naïve” design assuming no unmeasured confounding in the observational study; and a robust design accounting for the imperfect parameter estimates we would obtain from the observational study with unmeasured confounding. We propose guardrails on the designs, so that our experiment could be reasonably analyzed without shrinkage if desired.

We demonstrate the viability of these experimental designs through a simulation study involving a rare, binary outcome. Lastly, we deploy our methods on real data from the Women’s Health Initiative, a 1991 study estimating the health effects of hormone therapy on postmenopausal women. In particular, we determine how many units should be allocated to each treatment arm in each stratum of interest in order to maximally reduce estimation risk given the planned use of the shrinkage estimator. We find improved design provides further benefits over and above the benefit of the shrinkage estimator itself.


Rosenman, Evan, Rina Friedberg, and Mike Baiocchi. “Robust Designs for Prospective Randomized Trials Surveying Sensitive Topics.” American Journal of Epidemiology, vol. 192, issue 5, 2023, pp. 812-820.

Abstract: We consider the problem of designing a prospective randomized trial in which the outcome data will be self-reported and will involve sensitive topics. Our interest is in how a researcher can adequately power her study when some respondents misreport the binary outcome of interest. To correct the power calculations, we first obtain expressions for the bias and variance induced by misreporting. We model the problem by assuming each individual in our study is a member of one “reporting class”: a true-reporter, false-reporter, never-reporter, or always-reporter. We show that the joint distribution of reporting classes and “response classes” (characterizing individuals’ response to the treatment) will exactly define the error terms for our causal estimate. We propose a novel procedure for determining adequate sample sizes under the worst-case power corresponding to a given level of misreporting. Our problem is motivated by prior experience implementing a randomized controlled trial of a sexual-violence prevention program among adolescent girls in Kenya.


Rosenman, Evan, Santiago Olivella, and Kosuke Imai. “Race and Ethnicity Data for First, Middle, and Surnames.” Scientific Data, vol. 10, 2023, 299.

Abstract: We provide the largest compiled publicly available dictionaries of first, middle, and surnames for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six U.S. Southern States that collect self-reported racial data upon voter registration. Our data cover the racial make-up of a larger set of names than any comparable dataset, containing 136 thousand first names, 125 thousand middle names, and 338 thousand surnames. Individuals are categorized into five mutually exclusive racial and ethnic groups — White, Black, Hispanic, Asian, and Other — and racial/ethnic probabilities by name are provided for every name in each dictionary. We provide both probabilities of the form ℙ(race|name) and ℙ(name|race), and conditions under which they can be assumed to be representative of a given target population. These conditional probabilities can then be deployed for imputation in a data analytic task for which self-reported racial and ethnic data is not available.

Flapan, Erica, Alireza Mashaghi, and Helen Wong. “A Tile Model of Circuit Topology for Self-Entangled Biopolymers.” Nature: Scientific Reports, vol. 13, 2023, 8889.

Abstract: Building on the theory of circuit topology for intra-chain contacts in entangled proteins, we introduce tiles as a way to rigorously model local entanglements which are held in place by molecular forces. We develop operations that combine tiles so that entangled chains can be represented by algebraic expressions. Then we use our model to show that the only knot types that such entangled chains can have are and connected sums of these knots. This includes all protein knots that have thus far been identified.


External Grant: Wong, Helen, Erica Flapan, Eric Rawdon, and Joanna Sulkowska. “New Topological Models of Knot Formation in Proteins.” Structured Quartet Research Ensemble (SQuaRE), American Institute of Mathematics, 2023-2025.

Abstract: The goal of this project is to develop topological techniques to model the folding of proteins into knots, links, lassos, and other topologically complex structures.


External Grant: Wong, Helen. “RUI: Pure and Applied Knot Theory: Skeins, Hyperbolic Volumes, and Biopolymers.” National Science Foundation Standard Grant, NSF program MPS/DMS-Topology. 2023-2026, $281,433.

Abstract: Knot theory is the mathematical study of entanglement of loops up to continuous deformation. One can create a knot by taking an entangled string and connecting the endpoints, and two knots are equivalent if one can continuously deform one to the other, for example by bending, stretching, and passing strands inside and through others, but without cutting or breaking the string in any way. This project considers both theoretical problems and applications of mathematical knot theory. One group of problems studies a family of invariants of knots related to quantum field theory from physics. More specifically, the project seeks to understand how the quantum invariant of a knot detects geometric properties of the knot and the 3-dimensional spaces that can be associated with it. The mathematical techniques from this research have potential applications to mathematical physics and theoretical topological quantum computing. Another group of problems concerns applications to the study of knotted proteins and other biopolymers, some of which are known to be associated with various diseases. The project uses knot theory techniques to develop a model that can be used to quantify and to relate local topological complexity with biophysical processes. The model can also be used to potentially design synthetic biopolymers with special biophysical properties. The project includes a number of research problems suitable for collaboration with undergraduate students, as well as outreach and dissemination activities that seek to increase interest in mathematics more generally. The PI has successfully involved undergraduate students in similar research in the past and will continue to advise and encourage students to continue careers in mathematics and related areas.

The research is split into three parts: two seek to connect quantum topology with hyperbolic geometry, and one applies knot theory to molecular biology. One project concerns a version of the Volume Conjecture based on the theory of the Kauffman bracket skein algebra from quantum topology and its relationship to the Teichmuller space of a surface from hyperbolic geometry. A second project studies algebraic and geometric properties of a generalization of the Kauffman bracket algebra which is related to the decorated Teichmuller space of a surface with punctures. A third project involves a collaboration with a biophysicist to study local entanglements that are held tightly in place by molecular forces in biopolymers. The proposed knot-theoretic model would give a description of such local entanglements, allowing one to quantify and measure changes in the local topological complexity of biopolymers in experiments.