{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab inline\n", "%config InlineBackend.figure_format = 'retina'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import astropy.cosmology as cosmo\n", "from astropy.cosmology import Planck15\n", "import astropy.units as u\n", "import h5py\n", "import seaborn as sns\n", "\n", "sns.set_context('notebook')\n", "sns.set_palette('colorblind')\n", "sns.set_style('ticks')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preliminaries / Definitions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some useful references are [Loredo (2004)](https://ui.adsabs.harvard.edu/abs/2004AIPC..735..195L/abstract); [Mandel, Farr, & Gair (2018)](https://ui.adsabs.harvard.edu/abs/2019MNRAS.486.1086M/abstract); [Tiwari (2018)](https://ui.adsabs.harvard.edu/abs/2018CQGra..35n5009T/abstract); and [Farr (2019)](https://ui.adsabs.harvard.edu/abs/2019RNAAS...3...66F/abstract). The development here mostly follows Farr (2019).\n", "\n", "We assume that the events whose population we are trying to model occur as a Poisson point process. Each event has some (latent---that is not perfectly known / observed) parameters $\\theta$, and events occur with an intensity (or rate) in parameter space given by\n", "$$\n", "\\frac{\\mathrm{d} N}{\\mathrm{d} \\theta}(\\lambda),\n", "$$\n", "where $\\lambda$ are parameters that determine the intensity. It is these \"population-level\" parameters that we are trying to determine by analyzing our catalog.\n", "\n", "We do not have access to the parameters of an event directly, but only through some observational data, $d$, which include some stochastic noise such that we can only recover the parameters probabilistically through a likelihood function that describes the distribution of data at fixed parameters:\n", "$$\n", "p\\left( d \\mid \\theta \\right).\n", "$$\n", "\n", "Further, we assume that a catalog of events is produced by a process that *censors* some of the data---that is, there is some \"pipeline\" that examines the data and decides whether it should be included in the catalog. We assume this process is a *deterministic* function of the data (not the parameters $\\theta$---which are unknown), so there is some function $f$ and a threshold value $f_\\mathrm{th}$ such that the catalog is composed of exactly those data where $f(d) > f_\\mathrm{th}$. 
([Mandel, Farr, & Gair (2018)](https://ui.adsabs.harvard.edu/abs/2019MNRAS.486.1086M/abstract) discuss modeling in the case where some of these assumptions are relaxed, particularly when the censoring process is probabilistic rather than deterministic, or when the censoring depends on both data and parameters; an example of this latter situation was discussed in [Farr & Mandel (2018)](https://ui.adsabs.harvard.edu/abs/2018Sci...361.6506F/abstract) where physical processes enforced that some combinations of parameter values will not enter the catalog.)\n", "\n", "Under these assumptions, the appropriate likelihood function for a catalog of events with index $i = 1, \\ldots, N$ is ([Loredo (2004)](https://ui.adsabs.harvard.edu/abs/2004AIPC..735..195L/abstract); [Mandel, Farr, & Gair (2018)](https://ui.adsabs.harvard.edu/abs/2019MNRAS.486.1086M/abstract))\n", "$$\n", "p\\left( \\left\\{ d_i, \\theta_i \\mid i = 1, \\ldots, N \\right\\} \\mid \\lambda \\right) = \\exp\\left( -\\Lambda(\\lambda) \\right) \\prod_{i=1}^N p\\left( d_i \\mid \\theta_i \\right) \\frac{\\mathrm{d} N}{\\mathrm{d} \\theta}(\\lambda),\n", "$$\n", "where\n", "$$\n", "\\Lambda(\\lambda) = \\int_{\\left\\{ d \\mid f(d) > f_\\mathrm{th} \\right\\}} \\mathrm{d} d \\, \\mathrm{d} \\theta \\, p\\left( d \\mid \\theta \\right) \\frac{\\mathrm{d} N}{\\mathrm{d} \\theta}(\\lambda),\n", "$$\n", "is the expected number of objects in the catalog when the population is described by the population parameters $\\lambda$ (the actual number of objects in the catalog when the population parameters are $\\lambda$ will be Poisson distributed with mean $\\Lambda$). Note that $\\Lambda$ is a double integral over both *data* and *parameters* drawn respectively from the likelihood function and the population *subject to the constrant that the data would have passed the censoring threshold for the catalog*. The selection function enters a population analysis entirely through $\\Lambda$.\n", "\n", "LIGO/Virgo/Kagra release a data product that permits importance-weighted Monte-Carlo estimation of $\\Lambda$. The data product is a suite of random binary black hole waveforms that have been injected into the LIGO/Virgo/Kagra data stream and detected by the same LIGO/Virgo/Kagra pipelines that have been used to produce the catalog. LVK have drawn BBH merger params from a fiducial distribution, $p_\\mathrm{draw}\\left( \\theta \\right)$ (more details below), used these to generate data from the *empirical* likelihood function, $p\\left( d \\mid \\theta \\right)$, filtered through the *actual* pipeline $f(d)$, and recorded the pipeline output statistic for each injection (labelled \"IFAR\" for Inverse False Alarm Rate; you can set your own \"effective\" threshold on this number, or use the LVK threshold of $\\mathrm{IFAR} > 1 \\, \\mathrm{yr}$ from its own analysis, [Abbott, et al. (2020)](https://dcc.ligo.org/LIGO-P2000077/public)) which is directly comparable to the same statistic reported for the events in the O3a catalog [(Abbott, et al. 2020)](https://dcc.ligo.org/LIGO-P2000061/public). 
LVK also report the *properly normalized* value of $p_\mathrm{draw}\left( \theta \right)$ for each of these injections (again, see below).\n", "\n", "Integrals of the form\n", "$$\n", "I[g] = \int_{f(d) > f_\mathrm{th}} \mathrm{d} d \, \mathrm{d} \theta \, p\left( d \mid \theta \right) p_\mathrm{draw} \left( \theta \right) g\left( \theta \right)\n", "$$\n", "can be estimated via the Monte-Carlo method using these LVK data products because\n", "$$\n", "I[g] \simeq \frac{1}{N_\mathrm{draw}} \sum_{f_i > f_\mathrm{th}} g\left( \theta_i \right),\n", "$$\n", "where $N_\mathrm{draw}$ is the total number of draws LVK have made from their fiducial distribution $p_\mathrm{draw}\left( \theta \right)$, and $f_i$ and $\theta_i$ are the IFAR and parameters for the $i$th draw. The sum is taken over all synthetic injections that are above the threshold $f_\mathrm{th}$. \n", "\n", "$\Lambda(\lambda)$ is just such an integral, with \n", "$$\n", "g_\Lambda\left(\theta\right) \equiv \frac{1}{p_\mathrm{draw}\left( \theta \right)} \frac{\mathrm{d} N}{\mathrm{d} \theta},\n", "$$\n", "so that \n", "$$\n", "\Lambda\left( \lambda \right) \simeq \frac{1}{N_\mathrm{draw}} \sum_{f_i > f_\mathrm{th}} \frac{1}{p_\mathrm{draw}\left( \theta_i \right)} \frac{\mathrm{d} N}{\mathrm{d} \theta_i}\left(\lambda \right).\n", "$$\n", "The estimate is effectively an importance-weighted Monte-Carlo estimate of $\Lambda$.\n", "\n", "The LVK $p_\mathrm{draw}$ has been chosen to be approximately representative of the BBH population observed in O3a. [Farr (2019)](https://ui.adsabs.harvard.edu/abs/2019RNAAS...3...66F/abstract) gives formulas for estimating the uncertainty in this estimate of $\Lambda$, and gives a threshold on the effective number of independent samples above which the computation is sufficiently accurate to avoid bias in population inference. We reproduce the formulas here (Eqs. (8) and (9) of that paper):\n", "$$\n", "\mu \equiv \frac{1}{N_\mathrm{draw}} \sum_{f_i > f_\mathrm{th}} \frac{1}{p_\mathrm{draw}\left( \theta_i \right)} \frac{\mathrm{d} N}{\mathrm{d} \theta_i}\left(\lambda \right)\n", "$$\n", "and\n", "$$\n", "\sigma^2 \equiv \frac{\mu^2}{N_\mathrm{eff}} \equiv \frac{1}{N_\mathrm{draw}^2} \sum_{f_i > f_\mathrm{th}} \left[\frac{1}{p_\mathrm{draw}\left( \theta_i \right)} \frac{\mathrm{d} N}{\mathrm{d} \theta_i}\left(\lambda \right)\right]^2 - \frac{\mu^2}{N_\mathrm{draw}};\n", "$$\n", "under repeated samplings, and for a sufficiently large number of draws in each sampling, estimates of $\Lambda$ will be approximately Normally distributed with mean $\mu$ and variance $\sigma^2$. The $N_\mathrm{eff}$ parameter measures the \"effective sample size\" of the injection suite for each particular choice of population parameters $\lambda$ (this will generally be larger for populations similar to the distribution from which the injections have been drawn because the variance of the terms in the sum above will be smaller).\n",
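"\n", "Concretely, a minimal sketch of these estimators in plain NumPy (in linear space for readability; the analysis below works in log space), assuming as inputs the weights $w_i = \frac{1}{p_\mathrm{draw}\left( \theta_i \right)} \frac{\mathrm{d} N}{\mathrm{d} \theta_i}\left(\lambda \right)$ evaluated at the found injections, together with the total draw count:\n", "```python\n", "import numpy as np\n", "\n", "def Lambda_estimate(w, Ndraw):\n", "    # w: dN/dtheta(lambda) / p_draw(theta) at each *found* injection\n", "    # Ndraw: total number of draws, including injections never found\n", "    mu = np.sum(w) / Ndraw\n", "    sigma2 = np.sum(w**2) / Ndraw**2 - mu**2 / Ndraw\n", "    Neff = mu**2 / sigma2\n", "    return mu, np.sqrt(sigma2), Neff\n", "```\n",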
The $N_\\mathrm{eff}$ parameter measures the \"effective sample size\" of the injection suite for each particular choice of population parameters $\\lambda$ (this will generally be larger for populations similar to the distribution from which the injections have been drawn because the variance of the terms in the sum above will be smaller).\n", "\n", "The particular draw distribution chosen for O3a is \n", "$$\n", "p\\left( m_1 \\right) \\propto m_1^{-2.35},\n", "$$\n", "$$\n", "p\\left( m_2 \\mid m_1 \\right) \\propto m_2^2,\n", "$$\n", "$$\n", "p\\left( z \\right) \\propto \\left(1 + z \\right)^2 \\times \\left[ \\frac{\\mathrm{d} V_C}{\\mathrm{d} z} \\frac{1}{1+z} \\right],\n", "$$\n", "and\n", "$$\n", "p\\left( s_{1,z} \\right) = p\\left( s_{2,z} \\right) \\propto \\mathrm{const},\n", "$$\n", "for masses $2 \\, M_\\odot \\leq m_2 \\leq m_1 \\leq 100 \\, M_\\odot$, redshift $0 \\leq z \\leq 2.3$, and orbit-aligned dimensionless spin parameters $-1 \\leq s_{1,2,z} \\leq 1$ (the injections have zero spin components perpindicular to the orbital plane; they are \"aligned spin\"). *Once again: the properly-normalized density, \n", "$$\n", "p_\\mathrm{draw}\\left( m_1, m_2, z, s_{1,z}, s_{2,z} \\right) = p\\left( m_1 \\right) p\\left( m_2 \\mid m_1 \\right) p\\left( z \\right) p\\left( s_{1,z} \\right) p\\left( s_{2,z} \\right)\n", "$$ \n", "is reported along with the injection parameters and pipeline IFAR values, so the necessary values to compute the importance-weighted estimate of $\\Lambda$---except for the intensity $\\mathrm{d}N/\\mathrm{d}\\theta$, which must be supplied by the consumer in any case---are all present in the released LVK data product.* \n", "\n", "The LVK data product also gives the total amount of \"detector time\" during which injections were produced; injections are distributed uniformly in time in the detector frame, so an additional multiplicative factor of $T_\\mathrm{inj}$ should appear in the estimate of $\\Lambda$. The Poisson intensity carries an additional---often implicit---factor of \"per observer time;\" so the draw density should also have a factor of \"per observer time,\" which is $1/T$ if the injections are uniform in observer time. The result is to multiply the importance-weighted estimate by $T$. This factor is not included in the LVK $p_\\mathrm{draw}$ values, but is released in the data product; see below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Sensitivity Data Product" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we load the samples from the sensitivity data product and make some basic plots with them. 
First, the structure of the HDF5 file:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['injections']\n", "['distance', 'gps_time_int', 'ifar_gstlal', 'ifar_pycbc_bbh', 'ifar_pycbc_full', 'mass1_source', 'mass2_source', 'optimal_snr_h', 'optimal_snr_l', 'redshift', 'sampling_pdf', 'spin1z', 'spin2z']\n", "[('N_exp/R(z=0)', 979.0426119706958), ('analysis_time_s', 15843600), ('max_redshift', 2.3), ('n_accepted', 156878), ('n_rejected', 76825621), ('surveyed_VT_Gpc_yr', 167.60465680916), ('total_generated', 76982499)]\n" ] } ], "source": [ "with h5py.File('o3a_bbhpop_inj_info.hdf', 'r') as f:\n", " print(list(f.keys()))\n", " print(list(f['injections'].keys()))\n", " print(list(f.attrs.items()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The defined attributes are:\n", "\n", "* `N_exp/R(z=0)` gives the normalizing constant for the redshift part of the distribution, equivalent to the number of mergers expected out to the maximum injected redshift during the analysis time, per unit local merger rate, for the merger intensity (all quantities in the source frame)\n", "$$\n", "\frac{\mathrm{d} N}{\mathrm{d} m_1 \mathrm{d} m_2 \mathrm{d} V_c \mathrm{d} t} = R\left( z = 0 \right) p\left( m_2 \mid m_1 \right) p\left( m_1 \right) \left( 1 + z \right)^2,\n", "$$\n", " where the mass distribution integrates to 1 over the allowable range of masses and $R$ is the merger rate at $z = 0$.\n", "* `analysis_time_s` gives the total (detector frame) searched time.\n", "* `max_redshift` gives the maximum redshift over which injections have been performed.\n", "* `n_accepted` gives the number of injections accepted after a first very conservative SNR cut (the injection campaign proceeded in two stages for computational efficiency: the first cut \"hopeless\" injections based on SNR and the second pushed the remaining injections through the detection pipeline).\n", "* `n_rejected` gives the number of injections judged \"hopeless\" in the first cut and therefore never pushed through the pipeline.\n", "* `surveyed_VT_Gpc_yr` gives the total spacetime volume (in $\mathrm{Gpc}^3 \, \mathrm{yr}$) \"surveyed\" by the injections passing the first cut.\n", "* `total_generated` is the total number of injections drawn from the reference distribution (= `n_accepted` + `n_rejected`).\n", "\n", "The datasets in the `injections` group are \n", "\n", "* `distance` gives the luminosity distance in Mpc for each non-hopeless injection.\n", "* `gps_time_int` gives the integer seconds of the GPS time for the injection.\n", "* `ifar_gstlal` gives the inverse false alarm rate (in years) for that injection's recovery in the `gstlal` pipeline.\n", "* `ifar_pycbc_bbh` gives the IFAR (years) in the BBH-specific `PyCBC` pipeline.\n", "* `ifar_pycbc_full` gives the IFAR (years) in the general `PyCBC` pipeline.\n", "* `mass1_source` and `mass2_source` give the source-frame masses for the injection (in solar masses).\n", "* `optimal_snr_h` and `optimal_snr_l` give the optimal (face-on, overhead source) SNR in the H1 and L1 detectors using a representative noise spectral density (these were used to define the \"hopeless\" cut).\n", "* `redshift` gives the injection redshift corresponding to `distance`.\n", "* `sampling_pdf` gives the value of the normalized PDF $p\left( m_1, m_2, z, s_{1,z}, s_{2,z} \right)$ from which the injections are drawn, evaluated at each injection's parameters.\n", "* `spin1z` and `spin2z` give the dimensionless orbit-aligned spin components for each injection."
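, "\n", "For a quick look at the injection set, we can compare the found injections to the full set; a minimal sketch using only the datasets documented above (the $\mathrm{IFAR} > 1 \, \mathrm{yr}$ cut anticipates the detection threshold adopted below):\n", "```python\n", "import h5py\n", "import matplotlib.pyplot as plt\n", "\n", "with h5py.File('o3a_bbhpop_inj_info.hdf', 'r') as f:\n", "    m1 = f['injections/mass1_source'][:]\n", "    z = f['injections/redshift'][:]\n", "    found = ((f['injections/ifar_gstlal'][:] > 1) |\n", "             (f['injections/ifar_pycbc_full'][:] > 1) |\n", "             (f['injections/ifar_pycbc_bbh'][:] > 1))\n", "\n", "plt.scatter(z[~found], m1[~found], s=1, alpha=0.05, label='not found')\n", "plt.scatter(z[found], m1[found], s=1, alpha=0.05, label='found')\n", "plt.xlabel('$z$')\n", "plt.ylabel('$m_1 \, [M_\odot]$')\n", "plt.legend()\n", "```"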
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: Estimating Expected Detections in O3a from O1/O2 Population Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To demonstrate the use of these samples, we will compute the expected number of detections in O3a of a BBH population similar to the ones found to be a good fit to the O1/O2 observations in [Abbott, et al. (2019)](https://ui.adsabs.harvard.edu/abs/2019ApJ...882L..24A/abstract). We use the power-law models in mass with a cutoff at low and high mass, an evolving redshift distribution, and assume a uniform distribution in the aligned spin components; see particularly Section 4.2 and Figure 6 for the discussion of redshift evolution.\n", "\n", "Since it is common in population analysis, we work with the log of $\\mathrm{d}N/\\mathrm{d}\\theta$.\n", "\n", "The ultimate population model is (all quantities in the comoving frame)\n", "$$\n", "\\frac{\\mathrm{d} N}{\\mathrm{d} m_1 \\mathrm{d} m_2 \\mathrm{d} V \\mathrm{d} t} = \n", "\\begin{cases}\n", "\\frac{R}{C\\left( \\alpha, \\beta \\right)} m_1^{-\\alpha} m_2^\\beta \\left( 1 + z \\right)^\\gamma & M_\\mathrm{min} < m_2 < m_1 < M_\\mathrm{max} \\\\\n", "0 & \\mathrm{otherwise}\n", "\\end{cases}\n", "$$\n", "where $C\\left(\\alpha, \\beta\\right)$ is a normalizing constant ensuring that the integral over all masses is unity, so that \n", "$$\n", "\\frac{\\mathrm{d} N}{\\mathrm{d} V \\mathrm{d} t} = \\int \\mathrm{d} m_1 \\, \\mathrm{d} m_2 \\, \\frac{\\mathrm{d} N}{\\mathrm{d} m_1 \\mathrm{d} m_2 \\mathrm{d} V \\mathrm{d} t} = R \\left( 1 + z \\right)^\\gamma.\n", "$$\n", "Thus we see that $R$ is the merger rate of BBHs at all masses at $z = 0$." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "R0 = 20.0 # per Gpc^3 per yr\n", "\n", "alpha = 1.8 # primary mass pl slope\n", "beta = 0 # secondary mass pl slope\n", "gamma = 2.7 # redshift slope\n", "\n", "mmin = 5 # Solar masses\n", "mmax = 41 # Solar masses\n", "\n", "def log_dNdm1dm2dzdsz2(m1, m2, z, s1z, s2z):\n", " m1_norm = (1-alpha)/(mmax**(1-alpha) - mmin**(1-alpha))\n", " m2_norm = (1+beta)/(m1**(1+beta) - mmin**(1+beta))\n", " \n", " log_pm1 = -alpha*log(m1) + log(m1_norm)\n", " log_pm2 = beta*log(m2) + log(m2_norm)\n", " \n", " # Here we convert from dN/dVdt in the comoving frame to dN/dzdt in the detector frame:\n", " #\n", " # dN/dzdt = dN/dVdt * dV/dz * 1/(1+z)\n", " #\n", " # The first factor is the Jacobian between comoving volume and redshift \n", " # (commonly called the \"comoving volume element\"), and the second factor accounts for \n", " # time dilation in the source frame relative to the detector frame.\n", " log_dNdV = log(R0) + gamma*log1p(z)\n", " log_dVdz = log(4*pi) + log(Planck15.differential_comoving_volume(z).to(u.Gpc**3/u.sr).value)\n", " log_time_dilation = -log1p(z)\n", " log_dNdz = log_dNdV + log_dVdz + log_time_dilation\n", " \n", " log_p_sz = log(0.25) # 1/2 for each spin dimension\n", " \n", " return np.where((m2 < m1) & (mmin < m2) & (m1 < mmax), log_pm1 + log_pm2 + log_p_sz + log_dNdz, np.NINF)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we implement the above formulas for (the log of) $\\mu$ and $N_\\mathrm{eff}$, using as our detection threshold that any of the three searches (`gstlal` and the `PyCBC` targeted BBH and un-targeted general) found the event with a false alarm rate that is below (better than) $1 / \\mathrm{yr}$---an \"IFAR\" (inverse FAR) greater than 1. 
This is the same threshold used in [Abbott, et al. (2020)](https://dcc.ligo.org/LIGO-P2000077/public)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "with h5py.File('o3a_bbhpop_inj_info.hdf', 'r') as f:\n", " Tobs = f.attrs['analysis_time_s']/(365.25*24*3600) # years\n", " Ndraw = f.attrs['total_generated']\n", " \n", " m1 = array(f['injections/mass1_source'])\n", " m2 = array(f['injections/mass2_source'])\n", " z = array(f['injections/redshift'])\n", " s1z = array(f['injections/spin1z'])\n", " s2z = array(f['injections/spin2z'])\n", " \n", " p_draw = array(f['injections/sampling_pdf'])\n", " \n", " gstlal_ifar = array(f['injections/ifar_gstlal'])\n", " pycbc_ifar = array(f['injections/ifar_pycbc_full'])\n", " pycbc_bbh_ifar = array(f['injections/ifar_pycbc_bbh'])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "one_year = 1.0 # IFAR measured in years\n", "detection_selector = (gstlal_ifar > one_year) | (pycbc_ifar > one_year) | (pycbc_bbh_ifar > one_year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in the next cell we select only the detected population according to our threshold using `np.where`. The `RuntimeWarning` it emits is harmless: it comes from evaluating the log density at injections outside the population's mass range, and those entries are masked to $-\infty$ by the `np.where` inside `log_dNdm1dm2dzdsz2`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ ":15: RuntimeWarning: invalid value encountered in log\n", " log_pm2 = beta*log(m2) + log(m2_norm)\n" ] } ], "source": [ "log_dN = np.where(detection_selector, log_dNdm1dm2dzdsz2(m1, m2, z, s1z, s2z), np.NINF)\n", "log_mu = log(Tobs) + logaddexp.reduce(log_dN - log(p_draw)) - log(Ndraw)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This population model would predict that the expected number of detections in O3a is 28.59\n" ] } ], "source": [ "print('This population model would predict that the expected number of detections in O3a is {:.2f}'.format(exp(log_mu)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given that we saw ~40 detections in O3a, this population model is reasonable enough.\n", "\n", "What is the uncertainty on the above estimate due to the importance-weighted Monte-Carlo integral? Below we compute it following the formulas above."
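, "\n", "Following [Farr (2019)](https://ui.adsabs.harvard.edu/abs/2019RNAAS...3...66F/abstract), a useful rule of thumb is to require $N_\mathrm{eff} \gtrsim 4N$, with $N$ the number of events in the catalog, before trusting population results computed with this injection set. With `Neff` as computed below, a check along these lines (a sketch; $N \simeq 40$ is assumed for O3a) might look like:\n", "```python\n", "N_obs = 40  # approximate size of the O3a BBH catalog (assumed for illustration)\n", "assert Neff > 4*N_obs, 'too few effective injections: selection estimate may bias the fit'\n", "```"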
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def logdiffexp(x, y):\n", " return x + np.log1p(exp(y-x))\n", "\n", "log_s2 = 2.0*log(Tobs) + logaddexp.reduce(2.0*log_dN - 2.0*log(p_draw)) - 2.0*log(Ndraw)\n", "log_sigma2 = logdiffexp(log_s2, 2.0*log_mu - log(Ndraw))\n", "Neff = exp(2.0*log_mu - log_sigma2)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Uncertainty on sensitivity estimate equivalent to 13270.0 Monte-Carlo samples\n", "Or, equivalently, a fractional uncertainty on mean detections in O3a of 0.0087\n", "Or, equivalently, that the expected number of detections in O3a assuming this population is 28.59 +/- 0.25\n" ] } ], "source": [ "print('Uncertainty on sensitivity estimate equivalent to {:.1f} Monte-Carlo samples'.format(Neff))\n", "print('Or, equivalently, a fractional uncertainty on mean detections in O3a of {:.4f}'.format(1/sqrt(Neff)))\n", "print('Or, equivalently, that the expected number of detections in O3a assuming this population is {:.2f} +/- {:.2f}'.format(exp(log_mu), exp(log_mu)/sqrt(Neff)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The uncertainty on this estimate is equivalent to a Monte-Carlo integration with $\\sim 13$k independent samples (note helpful function `logdiffexp` that computes accurately $\\log\\left( \\exp(x) - \\exp(y) \\right)$)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }