A Generalization of the Parameterized Expectations Algorithm

Introduction

This blog post is about my work on a generalization of the Parameterized Expectations Algorithm, available here.

The Parameterized Expectations Algorithm (PEA) is a classic computational approach to numerically “solve” economic models with rational expectations, i.e. finding an approximate solution. Usually, to solve economic models of this type, one has to find a policy function (e.g. how much to consume today, given a current level of capital) that satisfies a functional equation that holds in expectation (e.g. the Euler equation). The PEA flips this idea on its head: if we know the expectation term, we can recover the unknown policy function. For several reasons, approaching the problem from this angle often works better in practice.

In my view, the paper A Generalization of the Parameterized Expectations Algorithm can be read from two angles:

TL;DR 1 [Practical side]: This paper is a variance-reduced version of standard PEA that uses multiple innovation draws per state vector. The optimal number of innovation draws (given a fixed computational budget constraint) is calculated in closed form using only a small number of hyperparameters.

TL;DR 2 [Theoretical side]: This paper shows that there is a deep connection between modern neural network (NN) computational approaches (e.g. Deep Equilibrium Nets, the All-in-One Operator, the bc-MC Operator) and the old-school PEA. This connection can be summarized as: NN + bc-MC operator + backpropagation ≈ multi-innovation-draws PEA + OLS.

On the one hand, the result of TL;DR 2 is expected, because both approaches share many features (Monte Carlo integration, focus on the ergodic set for the state vector). On the other hand, this could be surprising because there are also substantial differences: NNs are highly non-linear and their parameters are tuned using gradient descent and backpropagation, while the PEA typically relies on a linear or log-linear model, where the parameters are calculated using linear or non-linear least squares.

I. Theory

Setting

In many cases, solving an economic model involves finding the parameter vector $\theta$ that minimizes a loss function of the form:

$$ L(\theta) = E_{s} \Big( \Big[ E_{\varepsilon}(f(s, \varepsilon | \theta )) \Big]^2 \Big) $$

where $s$ is a state vector, $\varepsilon$ a zero-mean random innovation vector and $f(.)$ a function that depends on the model. The expectation with respect to $\varepsilon$ is the usual expectation that shows up in Euler equations. The expectation with respect to the state vector $s$ is a bit more unusual, because we usually think about numerically solving an economic model using a fixed grid for $s$. Here we are saying: “I want to find $\theta$ such that the Euler equation holds, for all economically relevant values of the state vector $s$“, in the sense of minimizing the expected squared Euler residual over the ergodic distribution of states.

Focusing on the ergodic distribution is a nice feature because large parts of the state space are actually never visited or visited with very small probability. Here, we enforce accuracy where the economy actually spends time, rather than uniformly over an arbitrary grid.

With the NN computational approaches, $\theta$ is usually the parameter vector of a large NN (sometimes called “weights and biases”) that outputs the endogenous decisions of agents (e.g. the consumption functions of $A$ agents). The parameter vector $\theta$ is then found by gradient descent, using the update rule:

$$ \theta_{i+1} = \theta_{i} - \gamma \nabla_{\theta} L(\theta_{i}) $$

where the gradient $\nabla_{\theta} L(\theta_{i})$ can be efficiently calculated using automatic differentiation and backpropagation, as long as you use PyTorch, TensorFlow, or something similar.
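As a concrete (and heavily simplified) illustration of this update rule, here is a minimal PyTorch sketch, assuming a small NN policy and a placeholder residual function standing in for $f$; it is not the model solved later in this post, just the mechanics of loss, backpropagation, and one gradient step.

import torch
from torch import nn

# Hypothetical setup, for illustration only: a small NN maps the state s to a decision,
# and euler_residual() stands in for f(s, eps | theta).
policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

def euler_residual(s, eps):
    # Placeholder residual; in a real application this would encode the Euler equation
    return policy(s).squeeze(-1) - 0.5 * (s.sum(dim=-1) + eps)

optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)  # lr plays the role of gamma

s = torch.randn(128, 2)   # draws of the state vector
eps = torch.randn(128)    # one innovation draw per state

loss = (euler_residual(s, eps) ** 2).mean()  # Monte Carlo approximation of L(theta)
optimizer.zero_grad()
loss.backward()           # backpropagation computes the gradient of the loss
optimizer.step()          # theta_{i+1} = theta_i - gamma * grad L(theta_i)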

Single or double Monte Carlo?

The loss function $L(\theta)$ contains two nested expectations. The “outside” expectation $E_{s}$ is standard and we approximate it using Monte Carlo integration (i.e. drawing $M$ samples and calculating the average: $E_{s} \approx 1/M \sum_{i=1}^{M}$).

There is a subtle point for the “inside” (conditional on $s$) expectation $E_{\varepsilon}$. If the exogenous innovation vector $\varepsilon$ has discrete support (e.g. discrete Markov chain), $E_{\varepsilon}$ can be calculated exactly. For continuous distributions, one may use a deterministic quadrature scheme (e.g. Gauss-Hermite quadrature).
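For instance, with a scalar Gaussian innovation, a handful of Gauss-Hermite nodes is usually enough. A minimal numpy sketch (the function passed in is just a placeholder):

import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(5)  # 5-node Gauss-Hermite rule

def expectation_gauss_hermite(g, sigma):
    # E[g(eps)] for eps ~ N(0, sigma^2), using the change of variable eps = sqrt(2)*sigma*x
    return np.sum(weights * g(np.sqrt(2.0) * sigma * nodes)) / np.sqrt(np.pi)

# Sanity check on a case with a known answer: E[exp(eps)] = exp(sigma^2 / 2)
sigma = 0.1
print(expectation_gauss_hermite(np.exp, sigma), np.exp(0.5 * sigma**2))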

However, if $\varepsilon$ is continuous and high-dimensional, quadrature quickly becomes impractical, and one may instead use Monte Carlo integration to approximate the “inside” expectation operator ($E_{\varepsilon} \approx 1/N \sum_{i=1}^{N}$). The subtlety comes from the fact that taking the square of the empirical mean creates a bias, because of the variance formula $Var(X) = E(X^2) - E(X)^2 \Leftrightarrow E(X^2) = E(X)^2 + Var(X)$. A simple workaround consists in taking two independent copies of $\varepsilon$, denoted by $\varepsilon^1$ and $\varepsilon^2$, and using the fact that for independent variables, the expectation of the product is the product of the expectations: $E[X_1 \times X_2] = E[X_1] \times E[X_2] = E[X] \times E[X] = E[X]^2$. This is the trick behind the All-in-One Operator, which, applied to the loss function above, gives:

$$ L_{M}(\theta) = \frac{1}{M} \sum_{m=1}^{M} f(s_m, \varepsilon^{1}_m | \theta ) f(s_m, \varepsilon^{2}_m | \theta ) $$
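The bias, and the fix, are easy to check in a toy simulation: squaring the empirical mean of $N$ draws overestimates $E(X)^2$ by $Var(X)/N$, while the product of two independent copies is unbiased. A quick numpy illustration with made-up numbers:

import numpy as np

rng = np.random.default_rng(0)
# Many repetitions of N = 10 draws of X ~ N(2, 1), so E(X)^2 = 4 and Var(X)/N = 0.1
X = rng.normal(loc=2.0, scale=1.0, size=(100_000, 10))

biased   = (X.mean(axis=1) ** 2).mean()    # approx 4.1: E[(empirical mean)^2] = E(X)^2 + Var(X)/N
unbiased = (X[:, 0] * X[:, 1]).mean()      # approx 4.0: product of two independent copies

print(biased, unbiased)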

But nothing prevents you from using more independent copies of $\varepsilon$. For instance, using three copies, one has $ \frac{1}{3}E[X_1 \times X_2] + \frac{1}{3}E[X_1 \times X_3] + \frac{1}{3}E[X_2 \times X_3] = \frac{1}{3} E[X]^2 + \frac{1}{3} E[X]^2 + \frac{1}{3} E[X]^2 = E[X]^2$. One may even take $N$ independent copies of $\varepsilon$, in which case you get the bc-MC Operator:

$$ L_{M,N}(\theta) = \frac{2}{MN(N-1)} \sum_{m=1}^{M} \sum_{1 \leq i < j \leq N} f(s_m, \varepsilon^{i}_m | \theta ) f(s_m, \varepsilon^{j}_m | \theta ) $$

Using more independent copies tends to reduce the integration error, at the cost of adding more computing time.
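Here is a hedged sketch (not the paper's reference implementation) of $L_{M,N}$ in PyTorch; for each state $s_m$, the sum over pairs $i < j$ is obtained from the identity $\sum_{i<j} a_i a_j = \tfrac{1}{2}\big[(\sum_i a_i)^2 - \sum_i a_i^2\big]$, so no explicit double loop is needed.

import torch

def bc_mc_loss(f_values: torch.Tensor) -> torch.Tensor:
    # f_values has shape (M, N): f(s_m, eps^i_m | theta) for M states and N innovation draws per state
    M, N = f_values.shape
    sum_f  = f_values.sum(dim=1)
    sum_f2 = (f_values ** 2).sum(dim=1)
    pairwise = 0.5 * (sum_f ** 2 - sum_f2)           # sum over i < j of f_i * f_j, for each state
    return (2.0 / (M * N * (N - 1))) * pairwise.sum()

# Example: M = 4 states, N = 3 innovation draws per state
f_vals = torch.randn(4, 3, requires_grad=True)
loss = bc_mc_loss(f_vals)
loss.backward()   # the loss is differentiable, so it plugs into the update rule above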

What about the PEA?

The PEA is easier to present using an example. Consider for instance a representative household choosing consumption $c_t$ and next period’s capital $k_{t+1}$ to maximize expected utility:

$$ \max_{(c_t,k_{t+1})_{t=0}^{\infty}} \; E_0 \sum_{t=0}^\infty \beta^t \frac{c_t^{1-\sigma} - 1}{1-\sigma} $$

subject to the resource constraint: $$ k_{t+1} = z_t k_t^{\alpha} - c_t + (1-\delta)k_t $$

and the law of motion for TFP: $$ \log(z_{t+1}) = \rho \log(z_t) + \varepsilon_{t+1} $$

The associated FOC is: $$ c_t^{-\sigma} = \beta E_t \big( c_{t+1}^{-\sigma} \left[\alpha z_{t+1} k_{t+1}^{\alpha-1} + 1-\delta \right] \big) $$

which can also be rewritten as:

$$ \frac{c_{t}^{-\sigma} - c_{ss}^{-\sigma}}{c_{ss}^{-\sigma}} = E_t \big( \beta \big(\frac{c_{t+1}}{c_{ss}} \big)^{-\sigma} \left[\alpha z_{t+1} k_{t+1}^{\alpha-1} + 1-\delta \right] - 1 \big) $$

Now define $E_t[\phi_{t+1}]$ as the right-hand side of the previous equation. That is,

$$ \phi_{t+1} \equiv \beta \big(\frac{c_{t+1}}{c_{ss}} \big)^{-\sigma} \left[\alpha z_{t+1} k_{t+1}^{\alpha-1} + 1-\delta \right] - 1 $$

Note that if we know the value of the conditional expectation, we can recover the implied current consumption using:

$c_t = c_{ss} (1 + E_t[\phi_{t+1}])^{-1/\sigma}$

Knowing the current consumption, we can iterate the model forward and keep repeating this process. Yet, we do not know $E_t[\phi_{t+1}]$, so it appears that we are stuck.

One solution is to assume that the conditional expectation is a linear function of some selected state variables, so that the realized $\phi_{t+1}$ satisfies: $$ \phi_{t+1}(\theta) = \theta_0 + \theta_1 \log k_t + \theta_2 \log z_t + \theta_3 (\log k_t)^2 + \theta_4 (\log z_t)^2 + \theta_5 \log k_t \times \log z_t + \epsilon_{t+1} $$

or, more compactly: $$ \phi_{t+1}(\theta) = s_{t}^{T} \theta + \epsilon_{t+1} $$

where $\epsilon_{t+1}$ is an expectational (forecast) error, not to be confused with the structural innovation $\varepsilon_{t+1}$.

The PEA then proceeds in two steps:

  1. For a given vector $\theta^{(i)}$, simulate the economy forward and store the data
  2. Given the data from step 1, update $\theta^{(i)}$ using a convex combination $\theta^{(i+1)} = \gamma \theta^{(i)} + (1 - \gamma) \theta^{OLS}$, where $\theta^{OLS}$ is the ordinary least squares (OLS) estimate from regressing $\phi_{t+1}(\theta^{(i)})$ on the state vector $s_t$.

This process is repeated until the parameter is no longer meaningfully updated: $d(\theta^{(i+1)}, \theta^{(i)}) \leq \text{tol}$, where $\text{tol}$ is a small positive number, meaning that the current guess for the conditional expectation is consistent with the values it generates in simulation.
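Schematically, one run of the PEA fixed-point loop looks like the sketch below, where simulate_economy and the dampening parameter are placeholders (assumptions for illustration, not the code of Section II):

import numpy as np

def pea_fixed_point(theta, simulate_economy, gamma=0.5, tol=1e-6, max_iter=500):
    # simulate_economy(theta) is assumed to return (S, phi_next): the (T, k) matrix of
    # regressors s_t and the realized phi_{t+1} implied by the current guess theta.
    for _ in range(max_iter):
        S, phi_next = simulate_economy(theta)                      # step 1: simulate forward
        theta_ols, *_ = np.linalg.lstsq(S, phi_next, rcond=None)   # step 2: OLS on simulated data
        theta_new = gamma * theta + (1.0 - gamma) * theta_ols      # dampened update
        if np.max(np.abs(theta_new - theta)) <= tol:
            return theta_new
        theta = theta_new
    return theta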

For future reference, note that $\theta^{OLS}$ solves the normal equations:

$$ \frac{-2}{T} \sum_{t=1}^{T} s_t \Big( \phi_{t+1}(\theta^{(i)}) - s_t^{T} \theta^{OLS} \Big) = 0 $$

Back to the bc-MC operator

As said previously, when using NN, the gradient of the loss function can be easily and efficiently calculated using backpropagation, as long as one uses the appropriate software (e.g. Pytorch, Tensorflow). Let’s now consider what happens when we want to calculate the gradient $\nabla_{\theta} L_{M,N}(\theta)$ “manually”. But let’s do that under two “PEA-style” restrictions:

  1. $f(s_m, \varepsilon^{i}_m | \theta ) = g(s_m, \varepsilon^{i}_m | \theta) - h(s_m | \theta)$,
  2. $h(s_m | \theta) = s_m^T \theta$.

The first assumption is a separability assumption, which implies that when the loss function is equal to zero, $h(.)$ is the conditional expectation $ E_{ \varepsilon} [ g(s_m, \varepsilon^{i}_m | \theta) ]$. The second assumption asserts that this conditional expectation is a linear function of the state vector $s_m$. After some intermediate calculations (see the paper), one gets that the gradient is given by:

$$ \nabla_{\theta} L_{M,N}(\theta) = \frac{-2}{M} \sum_{m=1}^{M} s_m \Big( \Big[ \frac{1}{N} \sum_{i=1}^{N} g(s_m, \varepsilon^{i}_m | \theta) \Big] - s_m^{T} \theta \Big) + R_{M,N}(\theta) $$

Note that the first term is identical to OLS normal equations above, where the dependent variable is an average over $N$ realizations $g(s_m, .)$. If we draw a single innovation per realization of the state vector ($N=1$), we are back to standard PEA.

The second term $R_{M,N}(\theta)$ is a bit nasty. It captures the fact that we are aiming at a “moving target”: the conditional expectation is not a simple “fixed” dependent variable; it also depends on the parameter $\theta$ we are trying to estimate. Fortunately, it is equal to zero in expectation at stationary points of the loss ($\nabla_{\theta} L_{M,N}(\theta^{*}) = 0$). In practice, this means that we can focus on setting the first part of $\nabla_{\theta} L_{M,N}(\theta)$ to zero (the “averaged OLS problem”), which is nothing more than a simple OLS regression of $y_m = \frac{1}{N} \sum_{i=1}^{N} g(s_m, \varepsilon^{i}_m | \theta)$ on $s_m$. Hence, we are back to the PEA, with the possibility of calculating the conditional expectation more accurately by using more innovation draws.
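To make that last step concrete, here is a minimal numpy sketch of the averaged-OLS update; the arrays are placeholders, and setting $N = 1$ recovers the dependent variable of the standard PEA:

import numpy as np

def averaged_ols_update(S: np.ndarray, g_values: np.ndarray) -> np.ndarray:
    # S:        (M, k) matrix of state regressors s_m
    # g_values: (M, N) realizations g(s_m, eps^i_m | theta)
    y = g_values.mean(axis=1)                         # average over the N innovation draws
    theta_ols, *_ = np.linalg.lstsq(S, y, rcond=None)
    return theta_ols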

Variance reduction under a computational budget constraint

At this stage, it is not necessarily obvious that using more innovation draws per state is really needed. If more accuracy is needed when estimating $\theta$ at the OLS stage, why not simply simulate longer series in step 1 of the PEA? Or, said differently, let’s increase $M$ and call it a day. Spoiler alert: the reason is that there are asymmetries between drawing more state vectors and drawing more innovation vectors, which I discuss below.

First, note that drawing more state vectors is essentially a serial computation, because the state of the economy today depends on what happened yesterday. Instead, drawing more innovation vectors can be easily realized in parallel (given the current state, future realizations can be explored using a parallel for loop). This point is not even mentioned or used in the paper.

Second, there are asymmetric costs between simulating the economy and the OLS step. Simulating the economy forward with $M$ steps, while using $N$ independent innovation draws per state vector, has a computational complexity of $O(MN)$. The computational complexity of the OLS estimator with $M$ observations and $k$ explanatory variables is approximately $O(Mk^2)$. Hence, increasing the length of the simulation ($M$) comes with a penalty at the OLS stage, which is not the case for increasing $N$, because additional innovation draws are first averaged before being used at the OLS stage. This discussion implies the following timing model ($T$ in seconds) for the full simulation + OLS package:

$$ T = \alpha_0 + \alpha_{M} M + \alpha_{MN} M N $$

Third, in general the distribution of the innovation vector is known, while the ergodic distribution of the state vector is unknown and only sequentially approximated during the minimization process. This matters because with known distributions, we can use low-discrepancy sequences to approximate $E_{\varepsilon}$ instead of using simple random draws.

The above equation gives us a good model of how computing time changes with $M$ and $N$. The parameters $\alpha_{M}$ and $\alpha_{MN}$ can be estimated by varying $M$ and $N$, recording the computing time, and running an OLS regression.
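A hedged sketch of that timing regression, assuming a run_once(M, N) function that performs one simulation + OLS pass (a placeholder, not provided in this post):

import time
import numpy as np

def estimate_timing_model(run_once, Ms=(500, 1000, 2000), Ns=(1, 5, 10)):
    # Fit T = alpha_0 + alpha_M * M + alpha_MN * M * N by OLS on measured wall-clock times
    rows, times = [], []
    for M in Ms:
        for N in Ns:
            start = time.perf_counter()
            run_once(M, N)
            times.append(time.perf_counter() - start)
            rows.append([1.0, M, M * N])              # regressors: constant, M, M*N
    X, T_obs = np.array(rows), np.array(times)
    alpha_0, alpha_M, alpha_MN = np.linalg.lstsq(X, T_obs, rcond=None)[0]
    return alpha_0, alpha_M, alpha_MN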

We also need to understand how the choice of $M$ and $N$ impacts the accuracy of our numerical solution. In the paper, under the assumption that the state vectors are i.i.d., I show that the expected average squared forecast error of the OLS model is proportional to:

$$ \frac{1}{N^{\alpha}} \, \frac{k}{M - k - 1} $$

where $\alpha$ is between 1 (for standard random sampling) and 2 (low-discrepancy sequence in an ideal case).

To simplify things, let’s consider the random sampling case ($\alpha = 1$). Under that assumption, if one minimizes the expected average squared forecast error of the OLS model for a given computational budget $T$, one finds:

$$ M^{\star} = \sqrt{\frac{(T - \alpha_{0}) (k+1)}{\alpha_{M}}}$$ $$ N^{\star} = \frac{\alpha_{M}}{\alpha_{MN}} \big( \frac{M^{\star}}{k+1} - 1 \big)$$

where we have to round these numbers to the nearest positive integers. See the paper for the general case where $\alpha \neq 1$. The formula is more convoluted, but still manageable.
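A small helper that evaluates the two formulas above (random-sampling case, $\alpha = 1$) for a given time budget, with made-up timing parameters in the example call:

import numpy as np

def optimal_M_N(T_budget, alpha_0, alpha_M, alpha_MN, k):
    # Closed-form M* and N* for the alpha = 1 (random sampling) case, rounded to positive integers
    M_star = np.sqrt((T_budget - alpha_0) * (k + 1) / alpha_M)
    N_star = (alpha_M / alpha_MN) * (M_star / (k + 1) - 1.0)
    return max(1, round(M_star)), max(1, round(N_star))

# Example with hypothetical timing parameters (seconds)
print(optimal_M_N(T_budget=60.0, alpha_0=0.5, alpha_M=1e-3, alpha_MN=2e-4, k=10))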

The equation for $M^{\star}$ makes sense. If the cost of the OLS step decreases ($\alpha_{M}$ lower), the optimal length of the simulation ($M^{\star}$) increases. If the linear model for the conditional expectation has more explanatory variables ($k$), the optimal length of the simulation must increase.

The equation for $N^{\star}$ is also intuitive. If the cost of the simulation step decreases ($\alpha_{MN}$ lower), the optimal number of innovation draws per state ($N^{\star}$) increases.

Note that the special case “standard PEA” ($N^{\star}=1$) does occur when $\alpha_{M} \rightarrow 0$. But in general $N^{\star} \approx \frac{\alpha_{M}}{\alpha_{MN}} \frac{M^{\star}}{k+1} $ is often bigger than 1.

Extension to serially correlated state vectors

The i.i.d. assumption for the state vectors is a bit too restrictive. Often, data comes from serially correlated series. For instance, the logarithm of TFP is often assumed to be AR(1):

$$ \log(z_{t+1}) = \rho \log(z_{t}) + \varepsilon_{t+1} $$

with $ | \rho |< 1 $ and $\varepsilon$ an i.i.d. zero-mean Gaussian random variable. Under the assumption that the state vector $s_t$ has such an AR(1) structure, with persistence parameter $ | \rho |< 1 $, the new optimal value for $M$ is given by:

$$ M^{\star \star} = M^{\star}\frac{1 + \rho^2}{1 - \rho^2} $$

This point appears in the Online Appendix of the paper.

The serial correlation correction term $\frac{1 + \rho^2}{1 - \rho^2}$ increases the optimal value for $M$. This makes sense, because with serially correlated observations, each realization of $s_t$ is less information-rich (it is correlated with $s_{t-1}$). The larger the value for $|\rho|$, the bigger this correction term is.

Extension to log-linear models

Often, economic models are such that the logarithm of the conditional expectation is a linear function of the state vector. Or more generally, it is linear in a continuous transformation of the state vector denoted by $x_t$ (e.g. taking logs):

$E_{\varepsilon}\left[g(\boldsymbol{s}_t, \varepsilon) \mid \boldsymbol{s}_t \right] = \exp(\boldsymbol{x}_t' \boldsymbol{\theta})$.

Fortunately, we can reuse the above framework, as discussed in the Online Appendix.

At the OLS regression step, we can simply regress $\log(y_m)$ on $x_m$, where $y_m = \frac{1}{N} \sum_{i=1}^{N} g(s_m, \varepsilon^{i}_m | \theta)$, as before.


II. Numerical example: a model with a borrowing constraint

The PEA gracefully handles economic models with inequality constraints, so let’s study a standard consumption-savings problem, complicated by an occasionally binding (irreversible) investment constraint. The trick is to treat the associated Lagrange multiplier as a residual variable that can be adjusted ex post.

A central planner chooses consumption $c_t$ and next period’s capital $k_{t+1}$ to maximize expected lifetime utility

$$ \max_{(c_t, k_{t+1})_{t=0}^{\infty}} \; E_0 \sum_{t=0}^\infty \beta^t \frac{c_t^{1-\sigma}-1}{1-\sigma} $$

subject to:

1) a resource constraint: $$ k_{t+1} = z_t k_t^\alpha - c_t + (1-\delta)k_t, $$

2) a constraint on investment: $$ k_{t+1} \geq (1-\delta)k_t, $$

3) the law of motion for TFP: $$ \log(z_{t+1}) = \rho \log(z_t) + \varepsilon_{t+1}. $$

The associated FOCs are: $$ c_t^{-\sigma} - \mu_t = \beta E_t \big( c_{t+1}^{-\sigma} \left[\alpha z_{t+1} k_{t+1}^{\alpha-1} + 1-\delta \right] - \mu_{t+1} (1 - \delta) \big), $$

$$ \mu_t ( k_{t+1} - (1-\delta)k_t) = 0 $$

$$ \mu_t \geq 0 $$

Expectation function $E_t[\phi_{t+1}]$:

Let us consider: $$ \phi_{t+1} \equiv \beta \Big( c_{t+1}^{-\sigma} \left[\alpha z_{t+1} k_{t+1}^{\alpha-1} + 1-\delta \right] - \mu_{t+1} (1 - \delta) \Big)$$

When the investment constraint does not bind ($\mu_t = 0$), the above FOC equation can be written as:

$$ c_t^{-\sigma} = E_t[\phi_{t+1}] $$

So if we know $ E_t[\phi_{t+1}]$, we can find the implied consumption today. As in standard PEA, let’s assume a parametric expression for the conditional expectation. Let’s use the following log-log model:

$$ \log(E_t[\phi_{t+1}]) = \theta_0 + \theta_1 \log k_t + \theta_2 \log z_t + \theta_3 \left(\log k_t\right)^2 + \theta_4 \left(\log z_t\right)^2 + \theta_5 \log k_t \cdot \log z_t = \boldsymbol{s}_t' \boldsymbol{\theta} $$

Implied consumption

Under the assumption that the constraint does not bind ($\mu_t = 0$), consumption in the current period is given by:

$$ \tilde{c}_t = \left[ \exp(\boldsymbol{s}_t' \boldsymbol{\theta}) \right]^{-1/\sigma} $$

This consumption choice $\tilde{c}_t$ implies a savings choice: $\tilde{k}_{t+1} = z_t k_t^\alpha - \tilde{c}_t + (1-\delta)k_t$.

Two cases can occur:

  1. If $\tilde{k}_{t+1} \geq (1-\delta)k_t$, then the irreversible investment constraint does not bind:

$$ c_t = \tilde{c}_t $$ $$ k_{t+1} = \tilde{k}_{t+1} $$ $$ \mu_t = 0 $$

  2. If $\tilde{k}_{t+1} < (1-\delta)k_t$, then the irreversible investment constraint binds and we recover $c_t$ from the resource constraint: $$ c_t = z_t k_t^\alpha $$ $$ k_{t+1} = (1-\delta)k_t $$ $$ \mu_t > 0 $$

In case 2, the value of the Lagrange multiplier is then:

$$ \mu_t = c_t^{-\sigma} - \beta E_t \big( c_{t+1}^{-\sigma} \left[\alpha z_{t+1} k_{t+1}^{\alpha-1} + 1-\delta \right] - \mu_{t+1} (1 - \delta) \big) \approx c_t^{-\sigma} - \exp(\boldsymbol{s}_t' \boldsymbol{\theta}) $$

provided that $\exp(\boldsymbol{s}_t' \boldsymbol{\theta})$ correctly approximates the conditional expectation.
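A compact sketch of this decision rule for one period is given below; it uses the second-order basis of the log-log model above (the full code further down uses a richer Chebyshev basis), and it assumes a parameter object with attributes gamma (the CRRA coefficient $\sigma$), alpha and delta, as in the MyParams class defined later.

import numpy as np

def one_period_decision(k, z, theta, params):
    # Given the state (k_t, z_t) and theta, return (c_t, k_{t+1}, mu_t) following the two cases above
    lk, lz = np.log(k), np.log(z)
    s = np.array([1.0, lk, lz, lk**2, lz**2, lk * lz])
    cond_exp = np.exp(s @ theta)                          # approximation of E_t[phi_{t+1}]

    c_tilde = cond_exp ** (-1.0 / params.gamma)           # consumption if the constraint is slack
    k_tilde = z * k**params.alpha - c_tilde + (1.0 - params.delta) * k

    if k_tilde >= (1.0 - params.delta) * k:               # case 1: constraint does not bind
        return c_tilde, k_tilde, 0.0
    # case 2: constraint binds; consumption comes from the resource constraint
    c = z * k**params.alpha
    k_next = (1.0 - params.delta) * k
    mu = c ** (-params.gamma) - cond_exp
    return c, k_next, mu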

Below is some code in Python that solves this model using this approach.

Packages and Types

Let’s load the required packages and create a class to store parameters (some imports and attributes are unused; a clean-up is needed at some point).

%matplotlib inline
# Plotting
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as mcolors
import matplotlib.ticker as mticker
import matplotlib.ticker as mtick
from matplotlib.ticker import MaxNLocator
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns; sns.set()

# Standard library
import os
import re
import subprocess
import shutil
import time
import datetime
import random
import copy  # to create copies of NN
import itertools
from itertools import product
from typing import Tuple
from math import sqrt
import platform, socket, uuid, json, psutil, logging, cpuinfo

# Scientific computing
import numpy as np
import pandas as pd
import scipy.stats
from scipy.stats import norm, chi2
from scipy import sparse  # sparse Kronecker product
from scipy.interpolate import LinearNDInterpolator, griddata, RegularGridInterpolator
import chaospy  # quadrature rules
from tqdm import tqdm

# PyTorch
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, Dataset, TensorDataset
from torch.nn.utils import clip_grad_norm_

# Regressions
import statsmodels.formula.api as smf
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
lowess = sm.nonparametric.lowess

# QuantEcon, interpolation and jit compilation
import quantecon as qe
from interpolation import interp
from quantecon.optimize import brentq
from numba import njit

# Placeholder type used in some type hints
class Vector: pass
# Store economic and bc-MC-PEA parameters
class MyParams():
    """
    N: number of draws for innovation vector (per state vector)
    M: number of draws for the state vector
    ...
    """
    def __init__(self, N, M, lr, optimizer, nb_epochs, order_gauss,
                 beta, alpha, gamma, delta, std_tfp, rho_tfp,
                 regression_two_steps, feasible_GLS, effective_sample_size):
        # Economic model parameters
        self.beta = beta # Discount factor (patience)
        self.alpha = alpha # Capital share in production
        self.gamma = gamma # CRRA coefficient
        self.one_min_gamma = 1 - gamma #pre-compute
        # depreciation rate capital
        self.delta = delta
        # Standard deviation exo random variables
        self.std_tfp = std_tfp #0.01
        # Mean value exo random variables
        self.mean_tfp = 1.0
        # Persistence params
        self.rho_tfp = rho_tfp # Persistence log TFP values
        # Non stochastic steady state calculations
        self.kss = ((1 - self.beta * (1 - self.delta)) / (self.alpha * self.beta)) ** (1 / (self.alpha - 1))
        self.std_k = 0 #std. dev capital
        self.css = self.kss**self.alpha - self.delta*self.kss
        self.std_c = 0 #std. dev consumption
        self.zss = 1
        self.std_z = 0
        self.tol_c = 1e-3 #to prevent negative consumption
        # Dependent variable in level or log
        self.formula = ""
        # Options for OLS regression
        self.center_dep_var = False #True #False #demean
        self.normalize_dep_var = False #True #False #divide by std. dev
        self.nb_expl_vars = 10   # including constant
        if self.nb_expl_vars not in (4, 6, 10):
            raise ValueError( f"nb_expl_vars must be one of (4, 6, 10); got {self.nb_expl_vars}"  )
        self.basis = 2          # 1: monomial, 2: Chebyshev
        if self.basis not in (1, 2):
            raise ValueError( f"basis must be either 1 (monomial) or 2 (Chebyshev); got {self.basis}" )
        self.regression_two_steps = regression_two_steps #Use the two-step regression described in appendix (approximation to full GLS)
        self.feasible_GLS = feasible_GLS #
        if self.regression_two_steps and self.feasible_GLS:
            raise RuntimeError("Both regression_two_steps and feasible_GLS are True. Choose one of the options.")
        # Correction for serial correlation in dependent variables
        self.effective_sample_size = effective_sample_size
        self.nb_shocks = 1
        ## State: Distribution of wealth + TFP (no persistence depreciation shocks)
        self.dim_state = 2
        ## Input for neural net
        self.dim_input = 2
        self.dim_output = 1
        # Nb agents:
        self.nb_agents = 1 #One representative household
        # Functions
        ## Utility
        if self.gamma != 1:
            self.u = lambda c: (1/self.one_min_gamma)*(c**(self.one_min_gamma) - 1)
        else:
            self.u = lambda c: torch.log(c)
        self.u_prime =  lambda c: c**(-self.gamma)
        self.u_prime_inv =  lambda c: c**(-(1/self.gamma))
        # bc-MC hyperparameters
        self.N = N #number of iid shocks used for each value of the state vector
        self.M = M #number of iid realization of the state vector
        self.MN = int(M*N)
        # To keep constant the number of function evaluations
        self.T = int((M*N)/2) #number
        self.distribution_shocks = "Normal" #"Normal" #Lognormal
        # Learning parameters
        self.lr = lr
        self.momentum = 0.9 #momentum for SGD with momentum
        self.optimizer = optimizer #"Adam" #default: #Adam or SGD or SWA
        self.freq_gamma = 0.95 #When using a scheduler for the learning rate, lr_{t} = freq_gamma*lr_{t-1}
        self.use_scheduler = False #If true, use a scheduler for the learning rate
        self.nb_epochs = nb_epochs
        self.freq_scheduler = 1000
        # GAUSSIAN QUADRATURE
        ## INNOVATION VECTOR
        strr = "chaospy.Normal(0, self.std_tfp)"
        self.distrib = eval('chaospy.J({})'.format(strr))
        self.order_gauss = order_gauss
        nodes, weights = chaospy.generate_quadrature(self.order_gauss, self.distrib, rule = "gaussian", sparse=True) #dist(self.order_gauss, self.distrib, rule = "gaussian", sp=True)
        self.nodes = nodes
        self.nodes_flat =  self.nodes.flatten() #make 1d array
        self.weights = weights
        self.weights_torch = torch.tensor(weights)
        self.nodes_torch = torch.tensor(np.transpose(self.nodes)) #column=dim. Row=observation
        # Save the number of points for the Gaussian quadrature:
        self.N_gaussian = len(self.weights_torch)
        # Implied number of points for the current space (T=MN/2 <-> M = 2T/N)
        self.M_gaussian = int((2*self.T)/self.N_gaussian)
        self.MN_gaussian = self.N_gaussian*self.M_gaussian
        # Repeat nodes to match the number of function evaluations for the expectation
        self.nodes_torch_repeated = self.nodes_torch.repeat(self.M_gaussian, 1)


def show_params(params, limited=True):
    """
    Function to display parameter values
    """
    print("learning rate: {}".format(params.lr))
    print("nb epochs: {}".format(params.nb_epochs))
    print("M: {}".format(params.M))
    print("N: {}".format(params.N))
    print("MN: {}".format(params.MN))
    print("T: {}".format(params.T))
    print("optimizer_chosen: {}".format(params.optimizer))
    print("use_scheduler: {}".format(params.use_scheduler))
    print("center_dep_var: {}".format(params.center_dep_var))
    print("normalize_dep_var: {}".format(params.normalize_dep_var))
    print("nb_expl_vars: {}".format(params.nb_expl_vars))
    print("basis: {}".format(params.basis))
    print("regression_two_steps: {}".format(params.regression_two_steps))
    print("feasible_GLS: {}".format(params.feasible_GLS))
    print("effective_sample_size: {}".format(params.effective_sample_size))

# Plotting options
plot_scale = 0.75
plt.rcParams["figure.figsize"] = (plot_scale*16, plot_scale*9)
# Controlling fontsizes
SMALL_SIZE = 12
MEDIUM_SIZE = SMALL_SIZE + 2
BIGGER_SIZE =  SMALL_SIZE + 4
plt.rcParams['legend.fontsize'] = MEDIUM_SIZE
dpi_chosen=600 #control the quality of .png
linewidth_chosen = 2

# Current working directory
current_wd = os.getcwd()
output_extension = "example_1"
output_folder = output_extension + "/"
print(output_folder)

# Create folder if does not exist:
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

example_1/

Let’s initialize an instance of MyParams():

# Economic parameters
beta_chosen = 0.95 #discount factor
alpha_chosen = 0.3 #capital share in production
std_tfp_chosen = 0.14 #std. dev. log TFP. High value so that the constraint occasionally binds
gamma_chosen = 1.0 #CRRA parameter
rho_chosen = 0.8 #persistence log TFP
delta_chosen = 0.1 #depreciation rate

# bc-MC-PEA hyperparameters
M_chosen = 200 #nb draws state vector
N_chosen = 2 #nb draws innovation vector for each realization of the state vector
lr_chosen = 1e-5 #learning rate, useful when using gradient descent instead of OLS
nb_epochs_chosen = 2000
order_gauss_chosen = 5 #number of Gaussian nodes for integration w.r.t. the innovation vector
optimizer_chosen = "Adam"

# bc-MC-PEA Options
regression_two_steps_chosen = False
feasible_GLS_chosen = False
effective_sample_size_chosen = True

# Extra hyperparameters (not used here)
mu_chosen = 1e-5 
mu_threshold = 1e-10 
mu_decay_chosen = 1.0 
use_Lookahead_chosen = False #Lookahead optimizer

params = MyParams(N_chosen, M_chosen, lr_chosen, optimizer_chosen,
                  nb_epochs_chosen, order_gauss_chosen,
                  beta_chosen, alpha_chosen, gamma_chosen, delta_chosen,
                  std_tfp_chosen, rho_chosen,
                  regression_two_steps_chosen,
                  feasible_GLS_chosen, effective_sample_size_chosen)

show_params(params)
learning rate: 1e-05
nb epochs: 2000
M: 200
N: 2
MN: 400
T: 200
optimizer_chosen: Adam
use_scheduler: False
center_dep_var: False
normalize_dep_var: False
nb_expl_vars: 10
basis: 2
regression_two_steps: False
feasible_GLS: False
effective_sample_size: True

Initialization

We use the following update rule for finding $\theta^{*}$:

$\theta^{(i+1)} = \gamma \theta^{(i)} + (1 - \gamma) \theta^{OLS}$, with $\gamma \in (0,1)$.

We need a starting value $\theta^{(0)}$.

Here, let’s solve the model without the irreversible investment constraint using a first-order linearization, and use some simulated data to estimate $\theta^{(0)}$. This is easy to do thanks to Dynare, and it gives us a reasonable starting point.

Below is a somewhat hacky way to get Dynare within Python:

  • I write the .mod file of the model without the irreversible investment constraint to disk,
  • I call Octave and Dynare to solve and simulate the model at 1st order.

For this to work, you need to have Octave and Dynare installed on your machine (both are free and open source; see the section “Using Dynare with Octave” here).

fname = "neogrowth.mod"
dirpath = os.getcwd()  # Get the current working directory
fpath = os.path.join(dirpath, fname)

# Content of the .mod file.
file_content_1 = """
% neogrowth.mod
% Dynare file for the standard optimal growth model (first-order approximation)

var k c z;
varexo eps;

parameters beta gamma alpha delta rho sigma_eps;

% Parameter values
beta      = {beta};
gamma     = {gamma};      % CRRA coefficient
alpha     = {alpha};
delta     = {delta};
rho       = {rho_tfp};
sigma_eps = {std_tfp};

model;
% Euler equation (after substituting the marginal utility condition)
c^(-gamma) = beta * c(+1)^(-gamma) * ( alpha * z(+1) * k^(alpha-1) + 1 - delta );

% Resource constraint (law of motion for capital)
k = z * k(-1)^alpha - c + (1-delta)*k(-1);

% Technology process (AR(1) in logs)
log(z) = rho*log(z(-1)) + eps;

end;

initval;
% Initial guesses for the steady state
k   = ((alpha*beta)/(1 - beta*(1-delta)))^(1/(1-alpha));
c   = ((alpha*beta)/(1 - beta*(1-delta)))^(alpha/(1-alpha)) - delta*(((alpha*beta)/(1 - beta*(1-delta)))^(alpha/(1-alpha)));
z   = 1;
eps = 0;
end;

steady;
check;

shocks;
var eps = sigma_eps^2;
end;

% IRF
stoch_simul(order=1, irf=100);

% SIMULATED SERIES
stoch_simul(periods=50000);
""".format(beta = params.beta, alpha = params.alpha,  gamma = params.gamma,
           delta = params.delta, std_tfp = params.std_tfp, rho_tfp = params.rho_tfp)

file_content = file_content_1
# Write the content to the file
with open(fpath, "w") as file:
    file.write(file_content)

# Write to disk an Octave script to solve and simulate the model
file_content_1 = r"""
if exist('OCTAVE_VERSION', 'builtin')

    function T = cell2table(C, varargin)
        if nargin > 1
            opts = varargin{2};
            varname = opts{1};
        else
            varname = 'Var1';
        end
        T = struct();
        for i = 1:size(C,2)
            T.(varname) = C(:,i);
        end
    end

    function T = struct2table(S, varargin)
        T = S;
    end

    function T = array2table(A, varargin)
        if nargin > 1
            opts = varargin{2};
            varnames = opts{1};
        else
            nvars = size(A,2);
            varnames = cell(1, nvars);
            for i = 1:nvars
                varnames{i} = sprintf('Var%d', i);
            end
        end

        T = struct();
        for i = 1:size(A,2)
            if iscell(varnames)
                fieldname = varnames{i};
            else
                fieldname = varnames;
            end
            T.(fieldname) = A(:,i);
        end
    end

    function writetable(T, filename)
        fid = fopen(filename, 'w');
        fields = fieldnames(T);

        % Write header
        for i = 1:length(fields)
            fprintf(fid, '%s', fields{i});
            if i < length(fields)
                fprintf(fid, ',');
            else
                fprintf(fid, '\n');
            end
        end

        % Number of rows
        nRows = length(T.(fields{1}));

        % Write data
        for row = 1:nRows
            for col_idx = 1:length(fields)
                col = T.(fields{col_idx});
                value = col(row,:);
                if iscell(col)
                    fprintf(fid, '%s', col{row});
                elseif ischar(value)
                    fprintf(fid, '%s', strtrim(value));
                elseif isnumeric(value)
                    fprintf(fid, '%g', value);
                else
                    error('writetable: unsupported data type in column %d', col_idx);
                end

                if col_idx < length(fields)
                    fprintf(fid, ',');
                else
                    fprintf(fid, '\n');
                end
            end
        end

        fclose(fid);
    end



end

% Go to working directory
% cd '/content';

% Run Dynare
dynare neogrowth.mod noclearall

% Postprocess
k_pos = strmatch('k', M_.endo_names, 'exact');
c_pos = strmatch('c', M_.endo_names, 'exact');
z_pos = strmatch('z', M_.endo_names, 'exact');

var_positions = [k_pos; c_pos; z_pos];
Sim_series = oo_.endo_simul(var_positions, :)';
var_names = M_.endo_names_long(var_positions, :);

if ~exist('output/Linearization', 'dir')
    mkdir('output/Linearization');
end

% Save steady state values manually
var_list = cellstr(oo_.var_list);  % Variable names
ss_values = oo_.mean;              % Steady state values

fid = fopen('output/Linearization/SS_values.csv', 'w');
fprintf(fid, 'Variable,MeanValue\n');  % Header

for i = 1:length(var_list)
    fprintf(fid, '%s,%g\n', var_list{i}, ss_values(i));
end

fclose(fid);


% Save simulated series
%Sim_table = array2table(Sim_series, 'VariableNames', cellstr(var_names));
%writetable(Sim_table, 'output/Linearization/Sim_series.csv');

% Save Sim_series manually
csvwrite('output/Linearization/Sim_series_only_data.csv', Sim_series);

% Also write header separately
fid = fopen('output/Linearization/Sim_series.csv', 'w');
fprintf(fid, 'k,c,z\n'); % header with variable names

data = Sim_series; % already transposed correctly (observations x variables)

for i = 1:size(data,1)
    fprintf(fid, '%g,%g,%g\n', data(i,1), data(i,2), data(i,3));
end

fclose(fid);

% State-space matrices
A = oo_.dr.ghx(oo_.dr.inv_order_var(M_.state_var'), :);
B = oo_.dr.ghu(oo_.dr.inv_order_var(M_.state_var'), :);
control = (1:size(oo_.dr.ghu,1))';
state = M_.state_var';
for j = 1:length(state)
    control(control == state(j)) = [];
end
C = oo_.dr.ghx(oo_.dr.inv_order_var(control), :);
D = oo_.dr.ghu(oo_.dr.inv_order_var(control), :);

S_variables_names = M_.endo_names(state,:);
X_variables_names = M_.endo_names(control,:);
shocks_names = M_.exo_names;

% Save variable names
writetable(cell2table(cellstr(S_variables_names)), 'output/Linearization/state_variables_names.csv');
writetable(cell2table(cellstr(X_variables_names)), 'output/Linearization/control_variables_names.csv');
writetable(cell2table(cellstr(shocks_names)), 'output/Linearization/exo_variables_names.csv');

% Save matrices A, B, C, D
csvwrite('output/Linearization/A.csv', A);
csvwrite('output/Linearization/B.csv', B);
csvwrite('output/Linearization/C.csv', C);
csvwrite('output/Linearization/D.csv', D);

% Simulations with shocks
len_T = 10;
e1 = zeros(len_T, 1);
e1(1) = 1.0;
horizon = length(e1) + 1;
shocks = zeros(M_.exo_nbr, horizon);
shocks(strcmp(cellstr(shocks_names), 'eps'), 2:horizon) = e1;
Ssim = zeros(length(state), horizon);
Xsim = zeros(length(control), horizon);

for j = 2:horizon
    Ssim(:,j) = A * Ssim(:,j-1) + B * shocks(:,j);
    Xsim(:,j) = C * Ssim(:,j-1) + D * shocks(:,j);
end

csvwrite('output/Linearization/Ssim.csv', Ssim');
csvwrite('output/Linearization/Xsim.csv', Xsim');
"""

fname = "solve_neogrowth_octave.m"
dirpath = os.getcwd()  # Get the current working directory
fpath = os.path.join(dirpath, fname)

file_content = file_content_1
# Write the content to the file
with open(fpath, "w") as file:
    file.write(file_content)

    
print("Calculating linearized model using Octave Dynare")
# Call Octave & Dynare
!octave solve_neogrowth_octave.m

Calculating linearized model using Octave Dynare
Starting Dynare (version 6.0).
Calling Dynare with arguments: noclearall
Starting preprocessing of the model file ...
Found 3 equation(s).
Evaluating expressions...
Computing static model derivatives (order 1).
Normalizing the static model...
Finding the optimal block decomposition of the static model...
2 block(s) found:
  1 recursive block(s) and 1 simultaneous block(s).
  the largest simultaneous block has 2 equation(s)
                                 and 2 feedback variable(s).
Computing dynamic model derivatives (order 2).
Normalizing the dynamic model...
Finding the optimal block decomposition of the dynamic model...
2 block(s) found:
  1 recursive block(s) and 1 simultaneous block(s).
  the largest simultaneous block has 2 equation(s)
                                 and 2 feedback variable(s).
Preprocessing completed.
Preprocessing time: 0h00m00s.

STEADY-STATE RESULTS:

k        2.62575
c        1.07333
z        1

EIGENVALUES:
         Modulus             Real        Imaginary
             0.8              0.8                0
           0.838            0.838                0
           1.256            1.256                0
       6.646e+17        6.646e+17                0

There are 2 eigenvalue(s) larger than 1 in modulus for 2 forward-looking variable(s)
The rank condition is verified.


MODEL SUMMARY

  Number of variables:         3
  Number of stochastic shocks: 1
  Number of state variables:   2
  Number of jumpers:           2
  Number of static variables:  0


MATRIX OF COVARIANCE OF EXOGENOUS SHOCKS
Variables         eps
eps          0.019600

POLICY AND TRANSITION FUNCTIONS
                                   k               c               z
Constant                    2.625746        1.073331        1.000000
k(-1)                       0.838003        0.214628               0
z(-1)                       0.686992        0.381732        0.800000
eps                         0.858741        0.477165        1.000000


THEORETICAL MOMENTS
VARIABLE         MEAN  STD. DEV.   VARIANCE
k              2.6257     0.8267     0.6834
c              1.0733     0.2591     0.0671
z              1.0000     0.2333     0.0544



MATRIX OF CORRELATIONS
Variables         k       c       z
k            1.0000  0.9876  0.7354
c            0.9876  1.0000  0.8327
z            0.7354  0.8327  1.0000



COEFFICIENTS OF AUTOCORRELATION
Order          1       2       3       4       5
k         0.9806  0.9358  0.8755  0.8067  0.7344
c         0.9626  0.9064  0.8393  0.7671  0.6939
z         0.8000  0.6400  0.5120  0.4096  0.3277

MODEL SUMMARY

  Number of variables:         3
  Number of stochastic shocks: 1
  Number of state variables:   2
  Number of jumpers:           2
  Number of static variables:  0


MATRIX OF COVARIANCE OF EXOGENOUS SHOCKS
Variables         eps
eps          0.019600

POLICY AND TRANSITION FUNCTIONS
                                   k               c               z
Constant                    2.627763        1.071313        1.000000
(correction)                0.002018       -0.002018               0
k(-1)                       0.838003        0.214628               0
z(-1)                       0.686992        0.381732        0.800000
eps                         0.858741        0.477165        1.000000
k(-1),k(-1)                -0.006461       -0.013885               0
z(-1),k(-1)                 0.087580        0.034525               0
z(-1),z(-1)                -0.040418       -0.066454       -0.080000
eps,eps                     0.473559        0.194394        0.500000
k(-1),eps                   0.109475        0.043157               0
z(-1),eps                   0.757695        0.311030        0.800000


MOMENTS OF SIMULATED VARIABLES
VARIABLE              MEAN       STD. DEV.        VARIANCE        SKEWNESS        KURTOSIS
k                 2.803683        0.859478        0.738702        1.011294        2.092793
c                 1.110625        0.255746        0.065406        0.671535        0.805383
z                 1.019190        0.233502        0.054523        0.706720        0.814589


CORRELATION OF SIMULATED VARIABLES
VARIABLE         k       c       z
k           1.0000  0.9824  0.7169
c           0.9824  1.0000  0.8282
z           0.7169  0.8282  1.0000


AUTOCORRELATION OF SIMULATED VARIABLES
VARIABLE         1       2       3       4       5
k           0.9788  0.9302  0.8654  0.7927  0.7174
c           0.9579  0.8965  0.8250  0.7499  0.6752
z           0.7827  0.6114  0.4776  0.3748  0.2973
Total computing time : 0h00m03s

Now, let’s load the simulated series and estimate our first $\theta^{(0)}$:

#Load simulated data and fit linear model
SS_values = pd.read_csv("output/Linearization/SS_values.csv")
print(SS_values)

Sim_series = pd.read_csv("output/Linearization/Sim_series.csv")
print(Sim_series)

# Load the data
Sim_series = pd.read_csv("output/Linearization/Sim_series.csv")

## Simulate realization conditional expectation
Sim_series['c_plus_1'] = Sim_series['c'].shift(-1)
Sim_series['z_plus_1'] = Sim_series['z'].shift(-1)
Sim_series['k_plus_1'] = Sim_series['k'].shift(-1)
# Realized value inside the conditional expectation (phi_{t+1})
Sim_series['cond_exp'] = params.beta  * (Sim_series['c_plus_1']**(-params.gamma)) * ( params.alpha * Sim_series['z_plus_1'] * Sim_series['k_plus_1']**(params.alpha - 1) + 1 - params.delta )
# Cash on hand net of consumption (equals next-period capital)
Sim_series['cash'] = Sim_series['z'] * Sim_series['k']**params.alpha + (1 - params.delta)*Sim_series['k'] - Sim_series['c']
# Log tfp
Sim_series['a'] = np.log(Sim_series['z'])
params.std_z = np.std(Sim_series['z'])

# Centered vars
Sim_series['c_demeaned'] = Sim_series['c'] - params.css
Sim_series['a_demeaned'] = Sim_series['a'] # mean is 0
Sim_series['z_demeaned'] = Sim_series['z'] - params.zss
Sim_series['k_demeaned'] = Sim_series['k'] - params.kss

# Normalize vars:
params.std_c = np.std(Sim_series['c'])
params.std_a = np.std(Sim_series['a'])
params.std_z = np.std(Sim_series['z'])
params.std_k = np.std(Sim_series['k'])

Sim_series['c_normalized'] = Sim_series['c_demeaned']/params.std_c
Sim_series['a_normalized'] = Sim_series['a_demeaned']/params.std_a
Sim_series['z_normalized'] = Sim_series['z_demeaned']/params.std_z
Sim_series['k_normalized'] = Sim_series['k_demeaned']/params.std_k

fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle('Log variables')
ax1.hist(np.log(Sim_series['c']))
ax1.set_title("Log c")
ax2.hist(Sim_series['a'])
ax2.set_title("a (log z)")
ax3.hist(np.log(Sim_series['k']))
ax3.set_title("Log k")
plt.show()

fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4)
fig.suptitle('Normalized variables')
ax1.hist(Sim_series['c_normalized'])
ax1.set_title("c normalized")
ax2.hist(Sim_series['a_normalized'])
ax2.set_title("a normalized")
ax3.hist(Sim_series['k_normalized'])
ax3.set_title("k normalized")
ax4.hist(Sim_series['z_normalized'])
ax4.set_title("z normalized")
plt.show()

# Create a dataframe:
df_Dynare = pd.DataFrame({'k': Sim_series['k'], 'z': Sim_series['z'], 'a': Sim_series['a'],
                          'k_demeaned': Sim_series['k_demeaned'], 'a_demeaned': Sim_series['a_demeaned'], 'z_demeaned': Sim_series['z_demeaned'],
                          'k_normalized': Sim_series['k_normalized'], 'a_normalized': Sim_series['a_normalized'], 'z_normalized': Sim_series['z_normalized'],
                          'cond_exp':  Sim_series['cond_exp'],
                          'log_cond_exp':  np.log(Sim_series['cond_exp'])})

print(df_Dynare.head())

# If required, transform explanatory variables (demean, normalize)
if params.center_dep_var and params.normalize_dep_var:
    x1_chosen = "k_normalized"
    x2_chosen = "a_normalized"
elif params.center_dep_var and not params.normalize_dep_var:
    x1_chosen = "k_demeaned"
    x2_chosen = "a_demeaned"
else:
    x1_chosen = "np.log(k)"
    x2_chosen = "a"

# OLS formula to use, which depends on the number of explanatory variables and on the basis function used. 
if params.nb_expl_vars == 4:
    # only main + interaction
    formula_OLS = (
        f"log_cond_exp ~ "
        f"{x1_chosen} + {x2_chosen} + {x1_chosen}*{x2_chosen}"
    )

elif params.nb_expl_vars == 6:
    if params.basis == 1:
        # monomial up to degree 2
        formula_OLS = (
            f"log_cond_exp ~ "
            f"{x1_chosen} + {x2_chosen} + {x1_chosen}*{x2_chosen} + "
            f"I({x1_chosen}**2) + I({x2_chosen}**2)"
        )
    else:
        # Chebyshev up to order 2
        formula_OLS = (
            f"log_cond_exp ~ "
            f"{x1_chosen} + {x2_chosen} + {x1_chosen}*{x2_chosen} + "
            f"I({x1_chosen}**2 - 1) + I({x2_chosen}**2 - 1)"
        )

elif params.nb_expl_vars == 10:
    if params.basis == 1:
        # monomial up to degree 3
        formula_OLS = (
            f"log_cond_exp ~ "
            f"{x1_chosen} + {x2_chosen} + {x1_chosen}*{x2_chosen} + "
            f"I({x1_chosen}**2) + I({x2_chosen}**2) + "
            f"I({x1_chosen}**3) + I({x1_chosen}**2*{x2_chosen}) + "
            f"I({x1_chosen}*{x2_chosen}**2) + I({x2_chosen}**3)"
        )
    else:
        # Chebyshev up to order 3
        formula_OLS = (
            f"log_cond_exp ~ "
            f"{x1_chosen} + {x2_chosen} + {x1_chosen}*{x2_chosen} + "
            f"I({x1_chosen}**2 - 1) + I({x2_chosen}**2 - 1) + "
            f"I(4*{x1_chosen}**3 - 3*{x1_chosen}) + "
            f"I(4*{x2_chosen}**3 - 3*{x2_chosen}) + "
            f"I(({x1_chosen}**2 - 1)*{x2_chosen}) + "
            f"I({x1_chosen}*({x2_chosen}**2 - 1))"
        )

else:
    raise ValueError(
        f"nb_expl_vars must be one of 4,6,10; got {params.nb_expl_vars}"
    )

# estimate model using OLS
model = smf.ols(formula=formula_OLS, data=df_Dynare).fit()

print(model.summary())
plt.hist(model.resid)
plt.show()

coeff_vector = model.params
print(coeff_vector)

coeff_array_0 = model.params.values
print(coeff_array_0)

  Variable  MeanValue
0        k    2.80368
1        c    1.11063
2        z    1.01919
             k         c         z
0      2.49443  0.995207  0.843225
1      2.49373  1.029550  0.971857
2      2.45133  1.005590  0.921800
3      2.48612  1.036100  1.005670
4      2.58609  1.082710  1.089010
...        ...       ...       ...
49995  2.63105  1.044140  0.895067
49996  2.56844  1.036660  0.925521
49997  2.73110  1.141380  1.176080
49998  2.76882  1.122660  1.060340
49999  2.81345  1.137810  1.075140

[50000 rows x 3 columns]

[Figure: histograms of log c, a (= log z), and log k]

[Figure: histograms of normalized c, a, k, and z]

         k         z         a  k_demeaned  a_demeaned  z_demeaned  \
0  2.49443  0.843225 -0.170521   -0.131316   -0.170521   -0.156775   
1  2.49373  0.971857 -0.028547   -0.132016   -0.028547   -0.028143   
2  2.45133  0.921800 -0.081427   -0.174416   -0.081427   -0.078200   
3  2.48612  1.005670  0.005654   -0.139626    0.005654    0.005670   
4  2.58609  1.089010  0.085269   -0.039656    0.085269    0.089010   

   k_normalized  a_normalized  z_normalized  cond_exp  log_cond_exp  
0     -0.152851     -0.756166     -0.671641  0.972367     -0.028022  
1     -0.153666     -0.126588     -0.120568  0.989717     -0.010337  
2     -0.203019     -0.361083     -0.335017  0.971439     -0.028977  
3     -0.162524      0.025072      0.024291  0.937091     -0.064975  
4     -0.046159      0.378120      0.381329  0.896667     -0.109071  
                            OLS Regression Results                            
==============================================================================
Dep. Variable:           log_cond_exp   R-squared:                       0.959
Model:                            OLS   Adj. R-squared:                  0.959
Method:                 Least Squares   F-statistic:                 1.315e+05
Date:                Wed, 14 Jan 2026   Prob (F-statistic):               0.00
Time:                        11:39:46   Log-Likelihood:                 82307.
No. Observations:               49999   AIC:                        -1.646e+05
Df Residuals:                   49989   BIC:                        -1.645e+05
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=========================================================================================================
                                            coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------------------
Intercept                                 0.3344      0.075      4.482      0.000       0.188       0.481
np.log(k)                                -0.4777      0.071     -6.747      0.000      -0.616      -0.339
a                                        -0.3446      0.076     -4.545      0.000      -0.493      -0.196
np.log(k):a                               0.1172      0.065      1.805      0.071      -0.010       0.244
I(np.log(k) ** 2 - 1)                    -0.1504      0.029     -5.183      0.000      -0.207      -0.094
I(a ** 2 - 1)                             0.0180      0.044      0.406      0.685      -0.069       0.105
I(4 * np.log(k) ** 3 - 3 * np.log(k))     0.0122      0.002      5.189      0.000       0.008       0.017
I(4 * a ** 3 - 3 * a)                     0.0118      0.006      1.999      0.046       0.000       0.023
I((np.log(k) ** 2 - 1) * a)              -0.0273      0.031     -0.873      0.383      -0.089       0.034
I(np.log(k) * (a ** 2 - 1))              -0.0580      0.043     -1.362      0.173      -0.142       0.026
==============================================================================
Omnibus:                      209.185   Durbin-Watson:                   1.991
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              212.146
Skew:                          -0.155   Prob(JB):                     8.57e-47
Kurtosis:                       3.072   Cond. No.                     3.12e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.12e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

[Figure: histogram of the OLS residuals]

Intercept                                0.334417
np.log(k)                               -0.477687
a                                       -0.344563
np.log(k):a                              0.117191
I(np.log(k) ** 2 - 1)                   -0.150438
I(a ** 2 - 1)                            0.018025
I(4 * np.log(k) ** 3 - 3 * np.log(k))    0.012195
I(4 * a ** 3 - 3 * a)                    0.011829
I((np.log(k) ** 2 - 1) * a)             -0.027310
I(np.log(k) * (a ** 2 - 1))             -0.058041
dtype: float64
[ 0.3344173  -0.47768687 -0.34456269  0.11719071 -0.15043791  0.01802458
  0.01219452  0.01182905 -0.02730956 -0.05804076]

Estimation of the computational cost model $ T = \alpha_0 + \alpha_{M} M + \alpha_{MN} M N $

In order to use the formulas for $M^{\star}$ and $N^{\star}$, we also need the values for $\alpha_M$ and $\alpha_{MN}$.

One approach is to actually try different values for $(M,N)$, store the computing time, and then use OLS to estimate $\alpha_{M}$ and $\alpha_{MN}$. This is what I do for the paper (see here).

Here, let’s use a shortcut. Let’s pretend we already know the ratio $\frac{\alpha_M}{\alpha_{MN}}$ and that we are using a given value for $M$ (e.g. $M = 1000$). The above formulas, using the serial correlation correction term, tell us that the optimal corresponding choice for $N$ is given by:

$$ N^{\star} = \frac{\alpha_{M}}{\alpha_{MN}} \big(\frac{1 - \rho^2}{1 + \rho^2} \frac{M}{k+1} - 1 \big)$$

In my runs, I find that $\frac{\alpha_{M}}{\alpha_{MN}} \approx 4.175$. This ratio obviously depends on the model you consider and on the actual software implementation. Let’s use the rounded value $\frac{\alpha_{M}}{\alpha_{MN}} \approx 4$. If you do not know this ratio and do not want to estimate it, you may even use a rough guess (e.g. $\frac{\alpha_{M}}{\alpha_{MN}} = 1$).

Below is a function that returns such an $N^{\star}$.

def optimal_N(alpha_ratio: float, rho: float, M: int, k: int) -> int:
    # Correct for serial correlation in state vector
    rho_sq = rho ** 2
    serial_correction = (1 - rho_sq) / (1 + rho_sq)
    
    N_star = alpha_ratio * (serial_correction * ( M / (k + 1) ) - 1)

    # Warning if implied value is negative
    if N_star < 0:
        min_M = (k + 1) * (1 + rho_sq) / (1 - rho_sq)
        print(f"Warning: Implied N* = {N_star:.2f} is negative. "
              f"M = {M} is too small given rho = {rho}. "
              f"Consider using M > {min_M:.0f}.")
    
    return max(1, round(N_star))

Below, let’s use several values for $M$. We then calculate $N^{*}$ based on the above formula.

alpha_ratio = 4.0 

list_N        = []
list_M        = []
list_optimal  = []

list_target_M = list(np.array([500, 1e3, 1e4, 1e5, 1e6, 2*1e6]).astype(int))
list_N_temp = [1]

# Create all combinations of the two lists
df_MN = pd.DataFrame(list(product(list_target_M, list_N_temp)), columns=['M', 'N'])
df_MN["is_optimal"] = 0


# Now add the theoretical optimum
for M_vals in list_target_M:
    # Skip large values for M
    if M_vals > 1e4:
        continue
    # Compute the theoretical optimum N* for this value of M
    N_opt = optimal_N(alpha_ratio, params.rho_tfp, M_vals, params.nb_expl_vars)
    M_opt = M_vals
    
    print(f"M*: {M_opt}")
    print(f"N*: {N_opt}")

    # Append the optimal point
    list_N.append(int(np.maximum(1, N_opt)))
    list_M.append(int(M_opt))
    list_optimal.append(1)

# Build a DataFrame with the optimal (M, N) points
df_MN_opt = pd.DataFrame({
    "M":           list_M,
    "N":           list_N,
    "is_optimal":  list_optimal,
})

df_MN = pd.concat([df_MN_opt, df_MN])
df_MN.head(10)
M*: 500
N*: 36
M*: 1000
N*: 76
M*: 10000
N*: 794
         M    N  is_optimal
0      500   36           1
1     1000   76           1
2    10000  794           1
0      500    1           0
1     1000    1           0
2    10000    1           0
3   100000    1           0
4  1000000    1           0
5  2000000    1           0

jit-compiled functions

Below are jit-compiled functions to run the PEA efficiently.

@njit
def fill_X_row(X, i, x1, x2, nb_expl_vars, basis):
    """
    Fill row i of the design matrix X in place, using either
      - basis=1 : monomial basis
      - basis=2 : Chebyshev basis
    up to nb_expl_vars terms (4, 6 or 10).

    After calling this, X[i,:] will be set.
    """
    if nb_expl_vars == 4:
        # simple cross basis
        X[i, :] = np.array([1, x1, x2, x1*x2])
        return

    if basis == 1:
        # —— Monomial basis ——
        if nb_expl_vars == 6:
            # total degree ≤2
            X[i, :] = np.array([
                1,
                x1,
                x2,
                x1 * x2,
                x1**2,
                x2**2
            ])
        elif nb_expl_vars == 10:
            # total degree ≤3
            X[i, :] = np.array([
                1,
                x1,
                x2,
                x1 * x2,
                x1**2,
                x2**2,
                x1**3,
                x1**2 * x2,
                x1 * x2**2,
                x2**3
            ])
        else:
            raise ValueError(
                f"Monomial basis with nb_expl_vars={nb_expl_vars} not supported"
            )

    elif basis == 2:
        # —— Chebyshev basis ——
        if nb_expl_vars == 6:
            # order ≤2 Chebyshev: T_i(x)T_j(y), i+j≤2
            X[i, :] = np.array([
                1,                   # T0(x)T0(y)
                x1,                  # T1(x)T0(y)
                x2,                  # T0(x)T1(y)
                x1 * x2,             # T1(x)T1(y)
                (x1**2 - 1),         # T2(x)T0(y)
                (x2**2 - 1)          # T0(x)T2(y)
            ])
        elif nb_expl_vars == 10:
            # order ≤3 Chebyshev: T_i(x)T_j(y), i+j≤3
            X[i, :] = np.array([
                1,                     # T0(x)T0(y)
                x1,                    # T1(x)T0(y)
                x2,                    # T0(x)T1(y)
                x1 * x2,               # T1(x)T1(y)
                (x1**2 - 1),           # T2(x)T0(y)
                (x2**2 - 1),           # T0(x)T2(y)
                (4*x1**3 - 3*x1),      # T3(x)T0(y)
                (4*x2**3 - 3*x2),      # T0(x)T3(y)
                (x1**2 - 1) * x2,      # T2(x)T1(y)
                x1 * (x2**2 - 1)       # T1(x)T2(y)
            ])
        else:
            raise ValueError(
                f"Chebyshev basis with nb_expl_vars={nb_expl_vars} not supported"
            )

    else:
        raise ValueError(f"Unknown basis={basis}; must be 1 or 2")


@njit
def clamp(arr, low, high):
    """
    Element-wise clamp of `arr` between `low` and `high`.
    Equivalent to np.clip(arr, low, high). np.clip has trouble with jit compilation.
    """
    return np.minimum(np.maximum(arr, low), high)
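
As a small aside (my addition, purely illustrative), the jit-compiled clamp can be checked against np.clip from regular Python:

# Sanity check (illustrative): clamp should match np.clip element-wise
test_arr = np.array([-2.0, 0.5, 3.0])
assert np.allclose(clamp(test_arr, 0.0, 1.0), np.clip(test_arr, 0.0, 1.0))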


@njit
def fill_X_next(Xv, x1, x2, nb_expl_vars, basis):
    """
    In‐place fill of Xv (shape = (nb_expl_vars,)) for a single point (x1,x2),
    using monomial basis (basis=1) or Chebyshev basis (basis=2),
    with nb_expl_vars in {4,6,10}.
    """
    # 4‐term cross basis
    if nb_expl_vars == 4:
        Xv[0] = 1.0
        Xv[1] = x1
        Xv[2] = x2
        Xv[3] = x1 * x2
        return

    if basis == 1:
        # —— Monomial ——
        if nb_expl_vars == 6:
            # [1, x1, x2, x1*x2, x1^2, x2^2]
            Xv[0] = 1.0
            Xv[1] = x1
            Xv[2] = x2
            Xv[3] = x1 * x2
            Xv[4] = x1 * x1
            Xv[5] = x2 * x2
            return
        else:
            # nb_expl_vars == 10
            # [1, x1, x2, x1*x2, x1^2, x2^2, x1^3, x1^2*x2, x1*x2^2, x2^3]
            Xv[0] = 1.0
            Xv[1] = x1
            Xv[2] = x2
            Xv[3] = x1 * x2
            Xv[4] = x1 * x1
            Xv[5] = x2 * x2
            Xv[6] = x1 * x1 * x1
            Xv[7] = x1 * x1 * x2
            Xv[8] = x1 * x2 * x2
            Xv[9] = x2 * x2 * x2
            return

    else:
        # —— Chebyshev ——
        if nb_expl_vars == 6:
            # [1, x1, x2, x1*x2, x1^2-1, x2^2-1]
            Xv[0] = 1.0
            Xv[1] = x1
            Xv[2] = x2
            Xv[3] = x1 * x2
            Xv[4] = x1 * x1 - 1.0
            Xv[5] = x2 * x2 - 1.0
            return
        else:
            # nb_expl_vars == 10
            # [1, x1, x2, x1*x2, x1^2-1, x2^2-1,
            #  4x1^3-3x1, 4x2^3-3x2, (x1^2-1)*x2, x1*(x2^2-1)]
            Xv[0] = 1.0
            Xv[1] = x1
            Xv[2] = x2
            Xv[3] = x1 * x2
            Xv[4] = x1 * x1 - 1.0
            Xv[5] = x2 * x2 - 1.0
            Xv[6] = 4.0*x1*x1*x1 - 3.0*x1
            Xv[7] = 4.0*x2*x2*x2 - 3.0*x2
            Xv[8] = (x1*x1 - 1.0) * x2
            Xv[9] = x1 * (x2*x2 - 1.0)
            return


@njit
def simulate_path_N_inplace(
    M: int,
    init: int,
    kss: float,
    e: np.ndarray,
    E: np.ndarray,
    b0: np.ndarray,
    beta: float,
    gamma: float,
    alpha: float,
    delta: float,
    rho_tfp: float,
    N: int,
    nb_expl_vars: int,
    tol_c: float,
    center_dep_var: bool,
    normalize_dep_var: bool,
    basis: int,
    a: np.ndarray,
    k: np.ndarray,
    mu: np.ndarray,
    y: np.ndarray,
    production: np.ndarray,
    inv: np.ndarray,
    cash: np.ndarray,
    c: np.ndarray,
    X: np.ndarray,
    X_next: np.ndarray,
    y_temp: np.ndarray
) -> None:
    """
    Simulates one path of the model using a Parameterized Expectations Algorithm (PEA)
    with in-place mutation of pre-allocated arrays. JIT-compiled with numba for maximum performance.

    All outputs are written directly into the provided arrays; no value is returned.
    """
    slong = M + init

    # To ensure consumption is at least tol_c
    E_max = tol_c**(-gamma)

    # Set value for constant vector
    X[:, 0] = 1.0

    # Initialize state
    k[0] = kss
    a[0] = 0.0
    for i in range(1, slong):
        a[i] = rho_tfp * a[i-1] + e[i]

    for i in range(0, slong):
        # Compute regressors x1, x2
        x1 = np.log(k[i])
        x2 = a[i]

        # Fill in X[i] for OLS regression y = X*b
        fill_X_row(X, i, x1, x2, nb_expl_vars, basis)

        # Cash-in-hand and consumption decision
        production[i] = np.exp(a[i]) * k[i] ** alpha
        cash[i] = production[i] + (1 - delta) * k[i]

        # Consumption, if current constraint on investment does not bind:
        ## Dot product
        scalar = np.dot(X[i, :], b0)

        #E_t_tilde = clamp(np.exp(scalar), production[i]**(-gamma), E_max)
        E_t_tilde = np.exp(scalar)
        c[i] = E_t_tilde ** (-1 / gamma)

        # Update guess, after calculating investment
        inv[i] = production[i] - c[i]
        if inv[i] > 0:
            k[i+1] = cash[i] - c[i]
            mu[i] = 0.0
        else:
            k[i+1] = (1 - delta) * k[i]
            c[i] = production[i]
            mu[i] = c[i]**( - gamma ) - E_t_tilde

        # Only compute the MC expectation for i >= init (no need during the burn-in phase)
        if i >= init:
            # Prepare next-period terms (precalculation of terms that do not depend on j)
            term1 = alpha * k[i+1]**(alpha - 1.0)
            term2 = k[i+1] ** alpha
            a_tilde = rho_tfp * a[i]

            x1_next = np.log(k[i+1])

            # Monte Carlo expectation
            for j in range(N):
                a_next = a_tilde + E[i, j]
                # production next period:
                production_next = np.exp(a_next) * term2

                x2_next = a_next

                fill_X_next(X_next, x1_next, x2_next, nb_expl_vars, basis)
                scalar_next = np.dot(X_next, b0)

                # Consumption next period, assuming constraint does not bind:
                #E_next_tilde = clamp(np.exp(scalar_next), production_next ** (-gamma), E_max)
                E_next_tilde = np.exp(scalar_next)
                c_next = E_next_tilde ** (-1 / gamma)

                # Update guess, after calculating investment
                inv_next = production_next - c_next
                if inv_next > 0:
                    # non-binding
                    mu_next = 0
                else:
                    # binding
                    c_next = production_next
                    mu_next = c_next**( - gamma ) - E_next_tilde

                ## g2(s,e)
                y_temp[j] =  ( c_next**( -gamma ) ) * ( np.exp(a_next) * term1 + 1.0 - delta ) + mu_next * (1.0 - delta)

            # Final aggregation
            s = 0.0
            for j in range(N):
                s += y_temp[j]

            y[i] = np.log( beta * (s / N) )


@njit
def evaluate_IE_and_EEE_Gauss_path_inplace(
    slong: int,
    kss: float,
    e: np.ndarray,
    b0: np.ndarray,
    beta: float,
    gamma: float,
    alpha: float,
    delta: float,
    rho_tfp: float,
    number_nodes: int,
    quadrature_nodes: np.ndarray,
    quadrature_weights: np.ndarray,
    nb_expl_vars: int,
    tol_c: float,
    center_dep_var: bool,
    normalize_dep_var: bool,
    basis: int,
    # Preallocated arrays for in-place mutation:
    a: np.ndarray,            # shape (slong,)
    k: np.ndarray,            # shape (slong+1,)
    mu: np.ndarray,           # shape (slong+1,)
    production: np.ndarray,   # shape (slong,)
    inv: np.ndarray,          # shape (slong,)
    IE: np.ndarray,           # shape (slong,)
    EEE: np.ndarray,          # shape (slong,)
    cash: np.ndarray,         # shape (slong,)
    c: np.ndarray,            # shape (slong,)
    X: np.ndarray,            # shape (slong, nb_expl_vars)
    X_next: np.ndarray,       # shape (nb_expl_vars,)
    linear_model: np.ndarray, # shape (slong,)
    y_temp1: np.ndarray,      # shape (number_nodes,)
) -> None:

    """
    In-place computation of MSIE (IE) and Euler equation errors (EEE)
    along a single simulated path using Gauss–Hermite quadrature.

    All output arrays (a, k, mu, production, inv, IE, EEE, cash, c, X,
    X_next, linear_model, y_temp1) must be pre-allocated to the correct shape.
    """
    # Clear values
    a[:] = 0.0
    k[:] = 0.0
    mu[:] = 0.0
    production[:] = 0.0
    inv[:] = 0.0
    IE[:] = 0.0
    EEE[:] = 0.0
    cash[:] = 0.0
    c[:] = 0.0
    X[:,:] = 0.0
    X_next[:] = 0.0
    linear_model[:] = 0.0
    y_temp1[:] = 0.0

    # To ensure consumption is at least tol_c
    E_max = tol_c**(-gamma)

    # Set initial state
    k[0] = kss
    a[0] = 0.0

    # AR(1) TFP path
    for i in range(1, slong):
        a[i] = rho_tfp * a[i - 1] + e[i]



    # Main loop
    for i in range(slong):
        # 1) build current regressors x1,x2
        x1 = np.log(k[i])
        x2 = a[i]

        # 2) fill X[i]
        fill_X_row(X, i, x1, x2, nb_expl_vars, basis)

        # 3) cash, linear prediction, consumption, and k forward
        production[i] = np.exp(a[i]) * k[i] ** alpha
        cash[i] = production[i] + (1 - delta) * k[i]

        # inline dot(X[i], b0)
        scalar = np.dot(X[i, :], b0)

        linear_model[i] = scalar
        #E_t_tilde = clamp(np.exp(linear_model[i]), production[i]**(-gamma), E_max)
        E_t_tilde = np.exp(linear_model[i])
        c[i] = E_t_tilde ** (-1 / gamma)

        # Update guess, after calculating investment
        inv[i] = production[i] - c[i]
        if inv[i] > 0:
            k[i+1] = cash[i] - c[i]
            mu[i] = 0.0
        else:
            k[i+1] = (1 - delta) * k[i]
            c[i] = production[i]
            mu[i] = c[i]**( - gamma ) - E_t_tilde

        # Precompute next‐period quantities
        x1_next = np.log(k[i+1])

        term1 = alpha * (k[i+1] ** (alpha - 1.0))
        term2 = k[i+1] ** alpha
        a_tilde = rho_tfp * a[i]

        # 4) Quadrature loop
        for j in range(number_nodes):
            a_next = a_tilde + quadrature_nodes[j]

            # production next period:
            production_next = np.exp(a_next) * term2

            x2_next = a_next

            # inline dot(X_next, b0)

            # Consumption next period, assuming constraint does not bind:
            fill_X_next(X_next, x1_next, x2_next, nb_expl_vars, basis)
            scalar_next = np.dot(X_next, b0)

            E_next_tilde = np.exp(scalar_next)
            c_next = E_next_tilde ** (-1 / gamma)

            # Update guess, after calculating investment
            inv_next = production_next - c_next
            if inv_next > 0:
                # non-binding
                mu_next = 0
            else:
                # binding
                c_next = production_next
                mu_next = c_next**( - gamma ) - E_next_tilde

            # Quadrature expectation terms:
            ## RHS of the Euler equation
            y_temp1[j] = beta * ( ( c_next ** ( - gamma ) ) * ( np.exp(a_next) * term1 + 1.0 - delta ) + mu_next * (1.0 - delta) )

        # 5) Weighted averages & fill IE, EEE
        s1 = 0.0
        for j in range(number_nodes):
            s1 += quadrature_weights[j] * y_temp1[j]

        #Integration error: log(E_t) - x' beta. E_t is calculated using Gauss-Hermite quadrature.
        IE[i]  = np.log(s1) - linear_model[i]
        #Euler equation error: 1 - (1/c_t)*(E_t()^{-1/gamma})
        EEE[i] = 1.0 - (1.0 / c[i]) * ( ( s1  + mu[i] ) **(-1.0 / gamma))
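
The evaluation function above takes quadrature nodes and weights as inputs (params.nodes_flat and params.weights, defined earlier in this post). For completeness, here is a minimal sketch of how Gauss–Hermite nodes and weights for a $N(0, \sigma^2)$ innovation could be constructed; the actual params object may build them differently.

def gauss_hermite_normal(n_nodes: int, std: float):
    """
    Gauss-Hermite nodes/weights to approximate E[f(X)] for X ~ N(0, std^2):
    E[f(X)] ~ sum_j weights[j] * f(nodes[j]).
    Illustrative sketch; not necessarily how params.nodes_flat is built.
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)  # physicists' rule
    nodes = np.sqrt(2.0) * std * x                   # change of variables
    weights = w / np.sqrt(np.pi)                     # weights sum to one
    return nodes, weights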



Time & Accuracy tradeoff

In the next block of code, I run the standard PEA ($N=1$) and the bc-MC-PEA ($N=N^{\star}$) for different choices of $M$, storing the total computing time as well as accuracy measurements. For each choice of $(M, N)$, I repeat the experiment several times and average the results, because:

1) PEA and bc-MC-PEA are stochastic methods;

2) timing involves measurement errors.

#### ----------------------------------
tol = 1e-8 # tolerance on the parameter vector
gam = 1.0  # smoothing parameter between two iterations
max_iter = 50 # max number of iterations
redraw_shocks_every = 1000 # redraw new realizations of the innovations (state and innovation vectors)
init = 100 # burn-in (first observations dropped from the OLS estimation)

slong_test = 50000
init_test = 100 # burn-in for the test simulation (when measuring accuracy)
nb_tot_reps = 5 # number of repetitions, to smooth out randomness and timing measurement noise
# Preallocate arrays for test
a_test            = np.zeros(slong_test)
k_test            = np.zeros(slong_test+1)
mu_test           = np.zeros(slong_test+1)
production_test   = np.zeros(slong_test)
inv_test          = np.zeros(slong_test)
IE           = np.zeros(slong_test)
EEE          = np.zeros(slong_test)
cash_test         = np.zeros(slong_test)
c_test            = np.zeros(slong_test)
X_test            = np.zeros((slong_test, params.nb_expl_vars))
X_next_test       = np.zeros(params.nb_expl_vars)
linear_model_test = np.zeros(slong_test)
y_temp1_test      = np.zeros(len(params.nodes_flat))

# to store results
results = []
np.random.seed(42)

# Loop over (M,N)
for idx, row in df_MN.iterrows():
    M = row['M']
    N = row['N']
    is_optimal = row['is_optimal']
    # Repeat experiment several times
    for nb_rep in range(nb_tot_reps):
        #np.random.seed(nb_rep)
        # Innovations for the out-sample test
        e_test = params.std_tfp * np.random.randn(slong_test) #New shocks

        slong = init + M

        b0_current = coeff_array_0.copy()  # initial guess 

        # Innovations for the state vector (for M large, there is no need
        # to redraw them many times)
        e = params.std_tfp * np.random.randn(slong)
        # extra draws for each state: shape (slong, N)
        E = params.std_tfp * np.random.randn(slong, N)

        # Preallocate arrays
        a = np.zeros(slong)
        k      = np.zeros(slong + 1)    # capital path (slong+1 because we update k[i+1])
        mu     = np.zeros(slong + 1)    # Lagrange multiplier on investment constraint
        y_out      = np.zeros(slong)    # simulated y
        production = np.zeros(slong)    # production
        inv = np.zeros(slong)           # investment
        cash     = np.zeros(slong)      # cash in hand
        c = np.zeros(slong)             # consumption
        X_data      = np.zeros((slong, params.nb_expl_vars))   # regressor matrix (nb_expl_vars columns)
        X_next = np.zeros(params.nb_expl_vars) # regressor vector, next period
        y_temp = np.zeros(N)            # temporary array for the N Monte Carlo draws

        # Warm-up call to trigger JIT compilation
        simulate_path_N_inplace(M, init, params.kss, e, E, b0_current,
                                params.beta, params.gamma, params.alpha, params.delta, params.rho_tfp, N,
                                params.nb_expl_vars, params.tol_c, params.center_dep_var, params.normalize_dep_var, params.basis,
                                a, k, mu, y_out, production, inv, cash, c, X_data, X_next, y_temp)

        iter_num = 1
        crit = 1.0
        # Run a fixed number of iterations (or use while crit > tol)
        start_time = time.perf_counter()
        while iter_num < max_iter:
            if iter_num % redraw_shocks_every == 0:
                e[:], E[:,:] = generate_random_arrays(slong, N, params.std_tfp)

            # Simulation:
            simulate_path_N_inplace(M, init, params.kss, e, E, b0_current,
                                params.beta, params.gamma, params.alpha, params.delta, params.rho_tfp, N,
                                params.nb_expl_vars, params.tol_c, params.center_dep_var, params.normalize_dep_var, params.basis,
                                a, k, mu, y_out, production, inv, cash, c, X_data, X_next, y_temp)

            # Remove burnin and last period
            X_reg = X_data[init:-1, :]
            y_reg = y_out[init:-1]
            # OLS
            bt, _, _, _ = np.linalg.lstsq(X_reg, y_reg, rcond=None)
            # Parameter update
            b_new = gam * bt + (1 - gam) * b0_current
            crit = np.max(np.abs(b_new - b0_current))
            b0_current = b_new.copy()
            #print("Iteration:", iter_num, "Conv. crit.:", crit)
            iter_num += 1
        end_time = time.perf_counter()
        elapsed = end_time - start_time

        # print(f"Iter {nb_rep}. M = {M}, N = {N}, elapsed time: {elapsed} seconds")
        # Compute residuals and In-sample MSE.
        Res = y_reg - np.dot(X_reg, b0_current)
        MSE = np.mean(Res ** 2)

        # Mean squared integration error and EEE
        evaluate_IE_and_EEE_Gauss_path_inplace(slong_test, params.kss,
                                         e_test, b0_current, params.beta, params.gamma, params.alpha,
                                         params.delta, params.rho_tfp, len(params.nodes_flat),
                                         params.nodes_flat, params.weights, params.nb_expl_vars,
                                         params.tol_c, params.center_dep_var, params.normalize_dep_var, params.basis,
                                         a_test, k_test, mu_test, production_test, inv_test,
                                         IE, EEE, cash_test, c_test, X_test, X_next_test, linear_model_test,
                                         y_temp1_test)


        MSIE = np.mean(IE[init_test:-1]**2)
        A_EEE = np.mean(np.abs(EEE[init_test:-1]))

        # Store the results in a dictionary
        results.append({
            "repetition": nb_rep,
            "k": params.nb_expl_vars,
            "M": M,
            "N": N,
            "Time": elapsed,
            "MSE": MSE,
            "MSIE": MSIE,
            "A_EEE": A_EEE,
            "is_optimal": is_optimal
        })
        
    print("Final iteration M:", M, "Iterations:", iter_num, "OLS MSE:", MSE, "MISE:", MSIE, "Average EEE:", A_EEE)
    print("Final b0:", b0_current)

# Create a Pandas DataFrame from the results
df_results_2 = pd.DataFrame(results)
df_results_2.to_csv(output_folder + "df_results_2.csv")
print(df_results_2.head())
Final iteration M: 500 Iterations: 50 OLS MSE: 6.39689620736686e-05 MSIE: 6.979887652063562e-06 Average EEE: 0.0015655538850610894
Final b0: [ 0.16247757 -0.24985331 -0.76061818  0.25599638 -0.07506965 -0.16968142
  0.00688403 -0.02032964 -0.12008197  0.17564347]
Final iteration M: 1000 Iterations: 50 OLS MSE: 3.796981261039479e-05 MSIE: 3.19778695810395e-06 Average EEE: 0.001112133816018006
Final b0: [-0.09056384 -0.01856654 -1.01221979  0.43950113 -0.13488526 -0.35173479
  0.01100738 -0.04441094 -0.19709003  0.33839319]
Final iteration M: 10000 Iterations: 50 OLS MSE: 6.542653166420021e-06 MSIE: 2.9566529603752238e-06 Average EEE: 0.001055960821718564
Final b0: [-0.04979799 -0.05504875 -0.97966892  0.40987097 -0.11831992 -0.32878382
  0.00979378 -0.04387424 -0.18413251  0.31862007]
Final iteration M: 500 Iterations: 50 OLS MSE: 0.0022142019493280383 MSIE: 0.00012135434094234673 Average EEE: 0.00668642546122861
Final b0: [ 0.82229268 -0.90739086 -0.53642172 -0.18030242  0.33838932 -0.09953621
 -0.02229608 -0.10234907  0.09603063  0.09430634]
Final iteration M: 1000 Iterations: 50 OLS MSE: 0.0025993825788082916 MSIE: 4.058952136372577e-05 Average EEE: 0.003625722738373235
Final b0: [ 0.02385481 -0.16811925 -0.63398239  0.03385158 -0.07134272 -0.34193687
  0.0082259  -0.04745522 -0.02632164  0.29541885]
Final iteration M: 10000 Iterations: 50 OLS MSE: 0.0024508247787179416 MSIE: 1.1136421709113198e-05 Average EEE: 0.0016576982242947937
Final b0: [ 0.2617814  -0.36112256 -0.66835186  0.16369622 -0.03247485 -0.11747232
  0.00252398 -0.01751672 -0.05608633  0.1099069 ]
Final iteration M: 100000 Iterations: 50 OLS MSE: 0.002393585822859449 MSIE: 5.4760625776998754e-06 Average EEE: 0.0013227092630956365
Final b0: [-0.01766254 -0.10006983 -0.93743997  0.39022456 -0.11801585 -0.29676254
  0.00994274 -0.03508161 -0.16897343  0.27758845]
Final iteration M: 1000000 Iterations: 50 OLS MSE: 0.00238513101035928 MSIE: 4.479321593683236e-06 Average EEE: 0.001291546782524435
Final b0: [-0.03491625 -0.07837509 -0.95125539  0.39424473 -0.11779865 -0.30922358
  0.00942587 -0.03916162 -0.17295513  0.29344339]
Final iteration M: 2000000 Iterations: 50 OLS MSE: 0.002384620757014956 MSIE: 5.775545032534041e-06 Average EEE: 0.0013630658243780791
Final b0: [-0.02917404 -0.08457067 -0.9441379   0.39027564 -0.11913959 -0.30199118
  0.00948944 -0.03804026 -0.17036995  0.28607765]
   repetition   k    M   N      Time       MSE      MSIE     A_EEE  is_optimal
0           0  10  500  36  0.050169  0.000073  0.000015  0.002113           1
1           1  10  500  36  0.047353  0.000073  0.000005  0.001379           1
2           2  10  500  36  0.046689  0.000075  0.000009  0.001008           1
3           3  10  500  36  0.044587  0.000071  0.000009  0.001108           1
4           4  10  500  36  0.045598  0.000064  0.000007  0.001566           1

As illustrated below, the bc-MC-PEA outperforms the standard PEA, even when using a rough estimate for the ratio $\frac{\alpha_{M}}{\alpha_{MN}}$.

def compute_pareto_front(df, time_col='Time', mse_col='MSIE'):
    """
    Compute the Pareto frontier: keep only the points that improve on the
    lowest error found so far as computing time increases.
    """
    # Sort by Time ascending
    df_sorted = df.sort_values(by=time_col, ascending=True)

    # List to store the Pareto front
    pareto_points = []

    # Keep track of the lowest MSE encountered so far
    best_mse_so_far = float('inf')

    for idx, row in df_sorted.iterrows():
        mse_val = row[mse_col]
        if mse_val < best_mse_so_far:
            pareto_points.append(row)
            best_mse_so_far = mse_val

    return pd.DataFrame(pareto_points)
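
To see what this helper does, here is a toy example with made-up (Time, MSIE) pairs (illustrative only): moving from the fastest to the slowest run, only the points that improve the lowest error seen so far are kept.

# Toy, made-up data for illustration only
toy = pd.DataFrame({"Time": [0.1, 0.2, 0.3, 0.4],
                    "MSIE": [1e-4, 5e-5, 8e-5, 2e-5]})
print(compute_pareto_front(toy))  # keeps the rows with Time 0.1, 0.2 and 0.4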

df_results_2["log_N"]    = np.log(df_results_2["N"])
df_results_2["log_M"]    = np.log(df_results_2["M"])
df_results_2["log_MN"]   = np.log(df_results_2["M"] * df_results_2["N"])

df_results_2["MN_label"] = df_results_2["M"].astype("str") + "-" + df_results_2["N"].astype("str")
df_results_average_2 = df_results_2.groupby("MN_label").mean().reset_index()
# Save to disk
df_results_average_2.to_csv(output_folder + "df_results_average_2.csv")

# Focus on standard PEA (N = 1).
df_one = df_results_average_2[df_results_average_2['N'] == 1]
df_one = df_one.sort_values('Time')

# Now highlight the optimal points
optimal_df = df_results_average_2[df_results_average_2['is_optimal'] == 1]

for (col, long_name) in zip(['MSIE', 'A_EEE'], ['MSIE', 'Average EEE']):
    # Compute Pareto frontier
    pareto_df = compute_pareto_front(df_results_average_2, mse_col=col)
    # Log or linear plots
    for nb in [0]:
        plt.figure(figsize=(8,6))
        sns.scatterplot(data=df_results_average_2, x='Time', y=col, alpha=0.7)
        plt.plot(pareto_df['Time'], pareto_df[col], 'r-o', label='Pareto Frontier')
         
        plt.scatter(optimal_df['Time'], optimal_df[col],
                    marker='*', s=250, c='red',
                    edgecolors='black', linewidths=1,
                    label='Optimal (M, N)')
        
        plt.plot(df_one['Time'], df_one[col], 'b', label='Frontier N = 1')
        plt.scatter(df_one['Time'], df_one[col],
                    marker='+', s=250, c='b', linewidths=1,
                    label='N = 1')
        
        plt.xlabel('Time (seconds)')
        plt.ylabel(long_name)
        if nb == 0:
            plt.yscale('log')
            plt.xscale('log')
        plt.title(f'Time vs. {long_name}')
        plt.legend(loc= "upper right")
        plt.grid(True)
        plt.savefig(output_folder + f"Time_vs_MSIE_{nb}.pdf", dpi=300)
        plt.show()

[Figure: Time vs. MSIE (log-log scale), with the Pareto frontier, the optimal (M, N) points, and the N = 1 frontier]

[Figure: Time vs. Average EEE (log-log scale), same layout]

III. Conclusion

To reiterate what was said in the introduction, this paper can be read from two perspectives. On the practical side, the bc-MC-PEA is a recipe to obtain more accurate numerical solutions to economic models (at a constant computational budget) if one is already using the PEA. The method does not require heavy coding and is easy to implement, especially if one is willing to use a rough guess for the ratio $\frac{\alpha_{M}}{\alpha_{MN}}$.

On the theoretical side, I find it interesting that the PEA and its generalization can be seen as originating from a more general optimal estimator (the bc-MC operator). In my view, this creates a bridge between modern computational economics, which often uses complex neural networks and backpropagation to train them, and classic computational economics, for instance PEA + OLS.

References

To cite this work:

@article{PASCAL2026112790,
  title   = {A generalization of the Parameterized Expectations Algorithm},
  journal = {Economics Letters},
  volume  = {259},
  pages   = {112790},
  year    = {2026},
  issn    = {0165-1765},
  doi     = {10.1016/j.econlet.2025.112790},
  url     = {https://www.sciencedirect.com/science/article/pii/S0165176525006275},
  author  = {Julien Pascal}
}
