The Simulated Method of Moments: a Parallel Implementation

In my previous post, I discussed how the simulated method of moments can be used to estimate parameters without using the likelihood function. This method is useful because many "real-life" applications result in intractable likelihood functions. In this post, I use the same toy example (estimating the mean of a multivariate normal random variable) and show how to use the parallel computing capabilities of Julia and MomentOpt to speed up the estimation.
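
As a quick refresher, the SMM estimator picks the parameter vector that minimizes a weighted distance between empirical and simulated moments. With moments indexed by $k$, empirical moments $m_k^{data}$, simulated moments $m_k^{sim}(\theta)$, and weights $w_k$, the criterion implemented by the objective function later in this post is

$$\hat{\theta} = \arg\min_{\theta} \; \frac{1}{K} \sum_{k=1}^{K} \left( \frac{m_k^{sim}(\theta) - m_k^{data}}{w_k} \right)^2$$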

Adding workers

In this example, the goal is to estimate the mean of a 4-dimensional normal random variable with unit variance, without using any information on the likelihood. If you start Julia with several processes, MomentOpt will notice and execute the code in parallel. The first step is to add "workers" to Julia. A rule of thumb is to use as many workers as you have processors on your system (4 in my case).
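
If you want to apply this rule of thumb programmatically, you can query the machine directly; a minimal sketch (the constant is Sys.CPU_CORES on Julia 0.6, used in this post, and was renamed Sys.CPU_THREADS in Julia 1.0):

# Use as many workers as there are cores on the machine:
maxNumberWorkers = Sys.CPU_CORES   # Julia >= 1.0: Sys.CPU_THREADS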

# Note: on Julia >= 0.7, run `using Distributed` first (nprocs and addprocs moved there)
# Current number of workers
#--------------------------
currentWorkers = nprocs()
println("Initial number of workers = $(currentWorkers)")
# I want to have 4 workers running
#--------------------------------
maxNumberWorkers = 4
while nprocs() < maxNumberWorkers
    addprocs(1)
end
# check the number of workers:
#----------------------------
currentWorkers = nprocs()
println("Number of workers = $(currentWorkers)")
# Output:
#--------
# Initial number of workers = 1
# Number of workers = 4

@everywhere

When running Julia with several workers, you have to add the macro @everywhere when loading packages and defining functions, so that they are available on every worker and not just on the master process. More details on parallel computing with Julia can be found here.
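
As a minimal illustration (a sketch, independent of MomentOpt), a function must be defined on every worker before the workers can execute it in parallel:

# On Julia >= 0.7 the parallel primitives live in the Distributed stdlib;
# on Julia 0.6 (used in this post) they are available in Base.
@everywhere sq(x) = x^2   # define sq on the master AND on all workers
pmap(sq, 1:8)             # the workers split the range and evaluate sq in parallel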

#---------------------------------------------------------------------------------------------------------
# Julien Pascal
# https://julienpascal.github.io/
# last edit: 06/06/2018
#
# Julia script that shows how the simulated method of moments can be used in
# a simple setting: estimation of the mean of a Normal r.v.
# This version was built to be executed with several processors
# For instance, start julia with: julia -p 4
#
# I use the package MomentOpt: https://github.com/floswald/MomentOpt.jl
#
# Code heavily based on the file https://github.com/floswald/MomentOpt.jl/blob/master/src/mopt/Examples.jl
#----------------------------------------------------------------------------------------------------------

@everywhere using MomentOpt
@everywhere using GLM
@everywhere using DataStructures
@everywhere using DataFrames
@everywhere using Plots
#plotlyjs()
@everywhere pyplot()

#------------------------------------------------
# Options
#-------------------------------------------------
# Boolean: do you want to save the plots to disk?
savePlots = true

#------------------------
# initialize the problem:
#------------------------
# Specify the initial values for the parameters, and their support
# (format: "name" => [initial value, lower bound, upper bound]):
#------------------------------------------------------------------
pb = OrderedDict("p1" => [0.2,-3,3] , "p2" => [-0.2,-2,2], "p3" => [0.1,0,10], "p4" => [-0.1,-10,0])
# Specify moments to be matched + subjective weights:
#----------------------------------------------------
trueValues = OrderedDict("mu1" => [-1.0] , "mu2" => [1.0], "mu3" => [5.0], "mu4" => [-4.0])
moms = DataFrame(name=["mu1","mu2","mu3", "mu4"],value=[-1.0,1.0, 5.0, -4.0], weight=ones(4))




# objfunc_normal(ev::Eval)
#
# GMM objective function to be minimized.
# It returns a weighted distance between empirical and simulated moments
#
@everywhere function objfunc_normal(ev::Eval; verbose = false)

    start(ev)


    # when running in parallel, display worker's id:
    #-----------------------------------------------
    if verbose && nprocs() > 1
        println(myid())
    end

    # extract parameters from ev:
    #----------------------------
    mu  = collect(values(ev.params))

    # compute simulated moments
    #--------------------------
    # Monte-Carlo:
    #-------------
    ns = 10000 # number of i.i.d. draws from N([mu], sigma)
    # initialize a multivariate normal N([mu], sigma)
    # mu is a four-dimensional vector
    # sigma is the vector of variances (the covariance matrix is the identity)
    sigma = [1.0; 1.0; 1.0; 1.0]
    # draw ns observations from N([mu], sigma):
    randMultiNormal = MomentOpt.MvNormal(mu,MomentOpt.PDiagMat(sigma))
    # calculate the mean of the simulated data (columns are draws)
    simM            = mean(rand(randMultiNormal,ns),2)  # Julia >= 1.0: mean(...; dims=2)
    # store simulated moments in a dictionary
    simMoments = Dict(:mu1 => simM[1], :mu2 => simM[2], :mu3 => simM[3], :mu4 => simM[4])

    # Calculate the weighted distance between empirical moments
    # and simulated ones:
    #-----------------------------------------------------------
    v = Dict{Symbol,Float64}()
    for (k, mom) in dataMomentd(ev)
        # If weight for moment k exists:
        #-------------------------------
        if haskey(MomentOpt.dataMomentWd(ev), k)
            # divide by weight associated to moment k:
            #----------------------------------------
            v[k] = ((simMoments[k] .- mom) ./ MomentOpt.dataMomentW(ev,k)) .^2
        else
            v[k] = ((simMoments[k] .- mom) ) .^2
        end
    end

    # Set value of the objective function:
    #------------------------------------
    setValue(ev, mean(collect(values(v))))

    # also return the moments
    #-----------------------
    setMoment(ev, simMoments)

    # flag for success:
    #-------------------
    ev.status = 1

    # finish and return
    finish(ev)

    return ev
end



# Initialize an empty MProb() object:
#------------------------------------
mprob = MProb()

# Add structural parameters to MProb():
# specify starting values and support
#--------------------------------------
addSampledParam!(mprob,pb)

# Add moments to be matched to MProb():
#--------------------------------------
addMoment!(mprob,moms)

# Attach an objective function to MProb():
#----------------------------------------
addEvalFunc!(mprob, objfunc_normal)


# estimation options:
#--------------------
# number of iterations for each chain
niter = 1000
# number of chains
# nchains = nprocs()
nchains = 4

opts = Dict("N"=>nchains,
        "maxiter"=>niter,
        "maxtemp"=> 5,
        "coverage"=>0.025,
        "sigma_update_steps"=>10,
        "sigma_adjust_by"=>0.01,
        "smpl_iters"=>1000,
        "parallel"=>true,
        "maxdists"=>[0.05 for i in 1:nchains],
        "mixprob"=>0.3,
        "acc_tuner"=>12.0,
        "animate"=>false)



#---------------------------------------
# Let's set-up and run the optimization
#---------------------------------------
# set-up BGP algorithm:
MA = MAlgoBGP(mprob,opts)

# run the estimation:
@time MomentOpt.runMOpt!(MA)

# show a summary of the optimization:
@show MomentOpt.summary(MA)

Inference

When using the BGP algorithm, inference should be done using the first chain only. The other chains are used to explore the state space and to escape potential local minima, but they are not meant to be used for inference. I discard the first tenth of the observations to remove the influence of the starting values. Visual inspection of the first chain suggests that the stationary part of the Markov chain has been reached by this stage. I then report the quasi-posterior mean and median for each parameter. As reported below, the estimates are quite close to the true values.

# Plot histograms for the first chain, the one with which inference should be done.
# Other chains are used to explore the space and avoid local minima
#-------------------------------------------------------------------------------
p1 = histogram(MA.chains[1])
display(p1)

if savePlots == true
    savefig(p1, joinpath(pwd(),"histogram_chain1.svg"))
end

# Plot the realization of the first chain
#----------------------------------------
p2 = plot(MA.chains[1])
if savePlots == true
    savefig(p2, joinpath(pwd(),"history_chain_1.svg"))
end
display(p2)


# Realization of the first chain:
#-------------------------------
dat_chain1 = MomentOpt.history(MA.chains[1])

# discard the first 10th of the observations ("burn-in" phase):
#--------------------------------------------------------------
dat_chain1 = dat_chain1[round(Int, niter/10):niter, :]

# keep only accepted draws:
#--------------------------
dat_chain1 = dat_chain1[dat_chain1[:accepted] .== true, :]

# list with the names of the moments:
#------------------------------------
parameters = [Symbol(String("mu$(i)")) for i=1:4]
# list with the corresponding estimated parameters:
#--------------------------------------------------
estimatedParameters = [Symbol(String("p$(i)")) for i=1:4]


# Quasi Posterior mean and quasi posterior median for each parameter:
#-------------------------------------------------------------------
for (estimatedParameter, param) in zip(estimatedParameters, parameters)

  println("Quasi posterior mean for $(String(estimatedParameter)) = $(mean(dat_chain1[estimatedParameter]))")
  println("Quasi posterior median for $(String(estimatedParameter)) = $(median(dat_chain1[estimatedParameter]))")
  println("True value = $(trueValues[String(param)][])")

end

# Output:
#--------
# Quasi posterior mean for p1 = -0.9160461484604642
# Quasi posterior median for p1 = -0.9589739759449558
# True value = -1.0
# Quasi posterior mean for p2 = 0.9888798123473025
# Quasi posterior median for p2 = 1.0675028518780796
# True value = 1.0
# Quasi posterior mean for p3 = 4.922658319685393
# Quasi posterior median for p3 = 4.989662707150382
# True value = 5.0
# Quasi posterior mean for p4 = -3.898597557236236
# Quasi posterior median for p4 = -3.968649064061086
# True value = -4.0
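
The accepted draws of the first chain can also be used to build quasi-posterior credible intervals; a minimal sketch, using the dat_chain1 and estimatedParameters objects constructed above:

# 90% quasi-posterior credible interval for each estimated parameter (sketch):
# (on Julia >= 1.0, quantile lives in the Statistics stdlib)
for p in estimatedParameters
    lower = quantile(dat_chain1[p], 0.05)
    upper = quantile(dat_chain1[p], 0.95)
    println("90% quasi-posterior interval for $(String(p)) = [$(lower), $(upper)]")
end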

Figure 1: History of chain 1

Figure 2: Histograms for chain 1

Safety checks

In this toy example, we know that the conditions for global identification are met. However, in more complicated applications, global identification may be hard to prove analytically. A common practice is to check that the objective function is "well-behaved" in a neighborhood of the solution using slices: each parameter is varied over its support while the others are held at the solution. The graph below shows that there is no flat region in the neighborhood of the solution, suggesting (at least) local identification of the parameters.
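
To make the idea concrete, here is a minimal hand-rolled slice; a sketch using a stand-in quadratic objective obj (hypothetical; the actual objective in this post is objfunc_normal):

# Stand-in quadratic objective, minimized at the true values (hypothetical):
obj(theta) = sum((theta .- [-1.0, 1.0, 5.0, -4.0]).^2)

thetaHat = [-1.0, 1.0, 5.0, -4.0]  # candidate solution
grid     = linspace(-3, 3, 20)     # support of p1 (Julia >= 1.0: range(-3, stop=3, length=20))
# vary p1 over its support, holding p2..p4 at the solution:
slice_p1 = [obj(vcat(p1, thetaHat[2:end])) for p1 in grid]
# a flat slice_p1 around thetaHat[1] would signal weak identification of p1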

# plot slices of objective function
# grid with 20 points
#-----------------------------------
s = doSlices(mprob,20)

# plot slices of the objective function:
#---------------------------------------
p = MomentOpt.plot(s,:value)
display(p)

if savePlots == true
    Plots.savefig(p, joinpath(pwd(),"slices_Normal.svg"))
end



# Produce more detailed plots, one for each moment:
#-----------------------------------------------------------
for symbol in parameters

  p = MomentOpt.plot(s,symbol)
  display(p)

  if savePlots == true
      Plots.savefig(p, joinpath(pwd(),"slices_Normal_$(String(symbol)).svg"))
  end


end

Figure 3: Slices of the objective function

Parallel versus Serial

Here is a benchmark of running the code above serially versus in parallel (starting Julia with 4 workers):

  • Serial: 639.661831 seconds (12.52 M allocations: 1.972 GiB, 97.50% gc time)
  • Parallel: 372.454707 seconds (279.32 M allocations: 14.843 GiB, 2.19% gc time)

Computing time is roughly halved by the parallel version, at the cost of substantially more allocations.
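
To reproduce the comparison, the same script can be run with and without workers; a sketch, where smm_normal.jl is a hypothetical filename for the code in this post and the serial run assumes the "parallel" option in opts is set to false:

# Serial: start Julia with a single process ("parallel" => false in opts)
#   julia smm_normal.jl
# Parallel: start Julia with 4 workers ("parallel" => true in opts)
#   julia -p 4 smm_normal.jl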

Notebook

A Jupyter notebook containing the code in this post (with some slight modifications) can be downloaded here.