PyMC & Inference Data¶

Created by Christian Luhmann

  • xarray
  • Example Data
  • Unlabeled Approach (Old)
  • Labeled Approach (New)
  • Full Example
  • Wrap-Up
In [1]:
import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import xarray as xr
%matplotlib inline

az.style.use("arviz-darkgrid")
rng = np.random.default_rng(101010)
/home/xian/anaconda3/envs/pymc-dev-py39/lib/python3.9/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: main is an invalid version and will not be supported in a future release
  warnings.warn(

PyMC v4 is here and one of the big changes is that the inference routines (e.g., pm.sample()) will return an ArviZ InferenceData object by default (recent releases of PyMV3 have made this optional). This post is intended to explore a bit about how the InferenceData object works, how it intersects with PyMC parameter arrays and how the labeling of dimensions and coordinates can make life easier. If you haven't already checked out Oriol Abril's fantastic post about PyMC, coords, and dims, I would strongly encourage you to do so.

Oriol's post focuses on the use of labeled dimensions, how these dimensions can be used to create parameter arrays, and how Arviz acknowledges and utilizes the labels when processing the prior/posterior (e.g., plotting). Here, I want to focus on how named dimensions/cooridnates can be used to create parameter arrays, contrasting it with the way things were done in PyMC3. I will also illustrate how to access information in the posterior and compare labeled dimensions with unlabeled.

xarray ¶

The new(ish) ArviZ inferenceData object is basically an xarray Dataset object. xarray describes itself as a generalization of both Numpy arrays and Pandas dataframes. Its design also borrows heavily from the netCDF file specification. But let's see what xarray objects actually look like. Here is a standard numy array.

In [2]:
npdata = rng.standard_normal(size=(2, 3))
npdata
Out[2]:
array([[-0.91412107, -0.32402107,  1.29165606],
       [ 0.52327975,  0.95708885,  0.52139111]])

Here we will use this numpy array to create an xarray DataArray.

In [3]:
data = xr.DataArray(npdata, dims=("user", "day"), coords={"user": ['Alice', 'Bob'], "day":["yesterday", "today", "tomorrow"]})
data
Out[3]:
<xarray.DataArray (user: 2, day: 3)>
array([[-0.91412107, -0.32402107,  1.29165606],
       [ 0.52327975,  0.95708885,  0.52139111]])
Coordinates:
  * user     (user) <U5 'Alice' 'Bob'
  * day      (day) <U9 'yesterday' 'today' 'tomorrow'
xarray.DataArray
  • user: 2
  • day: 3
  • -0.9141 -0.324 1.292 0.5233 0.9571 0.5214
    array([[-0.91412107, -0.32402107,  1.29165606],
           [ 0.52327975,  0.95708885,  0.52139111]])
    • user
      (user)
      <U5
      'Alice' 'Bob'
      array(['Alice', 'Bob'], dtype='<U5')
    • day
      (day)
      <U9
      'yesterday' 'today' 'tomorrow'
      array(['yesterday', 'today', 'tomorrow'], dtype='<U9')

As you can see, the data hasn't changed at all, but it now comes with all sorts of extra information. Let's get a bit of terminology out of the way. Our xarray object is an DataArray. The array has 2 dimensions: one labeled "user" and the other labeled "day".

In [4]:
data.dims
Out[4]:
('user', 'day')

Each dimension has several coordinates. The "user" dimension includes coordinates "Alice" or "Bob". The "day" dimension includes coordinates "yesterday", "today", and "tomorrow".

In [5]:
data.coords
Out[5]:
Coordinates:
  * user     (user) <U5 'Alice' 'Bob'
  * day      (day) <U9 'yesterday' 'today' 'tomorrow'

These coordinates can be used instead of standard numerical indicies. Here is how we select the value corresponding to "Alice" and "yesterday" from our numpy array.

In [6]:
npdata[0,0]
Out[6]:
-0.9141210682943073

And here is how we select that same value from our xarray DataArray.

In [7]:
data.sel(user="Alice", day="yesterday")
Out[7]:
<xarray.DataArray ()>
array(-0.91412107)
Coordinates:
    user     <U5 'Alice'
    day      <U9 'yesterday'
xarray.DataArray
  • -0.9141
    array(-0.91412107)
    • user
      ()
      <U5
      'Alice'
      array('Alice', dtype='<U5')
    • day
      ()
      <U9
      'yesterday'
      array('yesterday', dtype='<U9')

So one benefit of xarray objects is that there are a nice, semantically labeled coordinates. You no longer have to remember that npdata[:,0] corresponds to "yesterday" and that npdata[1,:] corresponds to "Bob". But it's even better than that. You don't even need to remember which axis is which. The order in which the dimensions are stored is not relevant because you are just using the dimensions' labels (though you can use numerical indexing if you wish).

In [8]:
data.sel(day="yesterday", user="Alice")
Out[8]:
<xarray.DataArray ()>
array(-0.91412107)
Coordinates:
    user     <U5 'Alice'
    day      <U9 'yesterday'
xarray.DataArray
  • -0.9141
    array(-0.91412107)
    • user
      ()
      <U5
      'Alice'
      array('Alice', dtype='<U5')
    • day
      ()
      <U9
      'yesterday'
      array('yesterday', dtype='<U9')
In [9]:
data.sel(user="Alice", day="yesterday")
Out[9]:
<xarray.DataArray ()>
array(-0.91412107)
Coordinates:
    user     <U5 'Alice'
    day      <U9 'yesterday'
xarray.DataArray
  • -0.9141
    array(-0.91412107)
    • user
      ()
      <U5
      'Alice'
      array('Alice', dtype='<U5')
    • day
      ()
      <U9
      'yesterday'
      array('yesterday', dtype='<U9')

Finally, xarray also permits the creation of objects that include multiple DataArray objects. xarray calls such an object a Dataset. This is essentially what we will be dealing with when we get an ArviZ inferenceData object back after sampling.

In [10]:
var2 = xr.DataArray(rng.standard_normal(size=(2, 2)),
                    dims=("x", "y"),
                    coords={"x": [0,1], "y":[11,42]}
                   )
var3 = xr.DataArray(rng.standard_normal(size=(2, 2)),
                    dims=("a", "b"),
                    coords={"a": [4.2, 11.8], "b":['Geneva','London']}
                   )
ds = xr.Dataset(dict(orig=data, v2=var2, v3=var3))
ds
Out[10]:
<xarray.Dataset>
Dimensions:  (user: 2, day: 3, x: 2, y: 2, a: 2, b: 2)
Coordinates:
  * user     (user) <U5 'Alice' 'Bob'
  * day      (day) <U9 'yesterday' 'today' 'tomorrow'
  * x        (x) int64 0 1
  * y        (y) int64 11 42
  * a        (a) float64 4.2 11.8
  * b        (b) <U6 'Geneva' 'London'
Data variables:
    orig     (user, day) float64 -0.9141 -0.324 1.292 0.5233 0.9571 0.5214
    v2       (x, y) float64 -0.2851 -1.626 -0.08508 0.9506
    v3       (a, b) float64 -0.4798 1.463 0.07777 0.09208
xarray.Dataset
    • user: 2
    • day: 3
    • x: 2
    • y: 2
    • a: 2
    • b: 2
    • user
      (user)
      <U5
      'Alice' 'Bob'
      array(['Alice', 'Bob'], dtype='<U5')
    • day
      (day)
      <U9
      'yesterday' 'today' 'tomorrow'
      array(['yesterday', 'today', 'tomorrow'], dtype='<U9')
    • x
      (x)
      int64
      0 1
      array([0, 1])
    • y
      (y)
      int64
      11 42
      array([11, 42])
    • a
      (a)
      float64
      4.2 11.8
      array([ 4.2, 11.8])
    • b
      (b)
      <U6
      'Geneva' 'London'
      array(['Geneva', 'London'], dtype='<U6')
    • orig
      (user, day)
      float64
      -0.9141 -0.324 ... 0.9571 0.5214
      array([[-0.91412107, -0.32402107,  1.29165606],
             [ 0.52327975,  0.95708885,  0.52139111]])
    • v2
      (x, y)
      float64
      -0.2851 -1.626 -0.08508 0.9506
      array([[-0.28513891, -1.62619068],
             [-0.08508324,  0.95058622]])
    • v3
      (a, b)
      float64
      -0.4798 1.463 0.07777 0.09208
      array([[-0.47979087,  1.46327566],
             [ 0.07776975,  0.09207604]])

Note that we have 6 dimensions (user, day, x, y, a, and b), but that each variable in our Dataset only uses 2 of them.

To get access to each individual variable out as a DataArray, we can use dictionary-style indexing:

In [11]:
ds['orig']
Out[11]:
<xarray.DataArray 'orig' (user: 2, day: 3)>
array([[-0.91412107, -0.32402107,  1.29165606],
       [ 0.52327975,  0.95708885,  0.52139111]])
Coordinates:
  * user     (user) <U5 'Alice' 'Bob'
  * day      (day) <U9 'yesterday' 'today' 'tomorrow'
xarray.DataArray
'orig'
  • user: 2
  • day: 3
  • -0.9141 -0.324 1.292 0.5233 0.9571 0.5214
    array([[-0.91412107, -0.32402107,  1.29165606],
           [ 0.52327975,  0.95708885,  0.52139111]])
    • user
      (user)
      <U5
      'Alice' 'Bob'
      array(['Alice', 'Bob'], dtype='<U5')
    • day
      (day)
      <U9
      'yesterday' 'today' 'tomorrow'
      array(['yesterday', 'today', 'tomorrow'], dtype='<U9')

xarray provides loads of indexing options, dimension/coordinate-aware computation, split-apply-combine operations, and much more. Go read up on xarray to discover all of its functionality.

Example Data ¶

In the examples below, we'll be working with some synthetic data involving two groups: one group receiving an experimental medication and one group receiving a placebo. For each patient, we have their age, sex, systolic blood pressure, and a flag indicating which treatment they have received. Each patient also has a unique ID. Let's generate it.

In [12]:
group_size = 100
systolic_f = rng.normal(loc=123, scale=5, size=group_size)
systolic_m = rng.normal(loc=127, scale=5, size=group_size)
systolic = np.hstack((systolic_f, systolic_m))
sex = (['female'] * group_size) + (['male'] * group_size)
age = rng.normal(loc=50, scale=7, size=len(systolic))
treat_group = rng.integers(low=1, high=3, size=len(systolic))
patient_id = rng.choice(1000, size=len(systolic), replace=False)
systolic = systolic + (.05 * age)
df = pd.DataFrame({'bp':systolic,
                   'sex':sex,
                   'age':age,
                   'treat_group':treat_group})
df.index = patient_id
df.index.name = "patientID"
df.head()
Out[12]:
bp sex age treat_group
patientID
979 128.750951 female 53.901474 2
924 123.935881 female 50.596643 1
522 121.683066 female 57.149881 1
912 135.144287 female 50.263664 2
787 121.420322 female 47.802665 1

Old Method ¶

Let's construct a model in which we estimate the mean of each group. But let's do it the way we might have in PyMC3, using an unlabeled parameter array. Because we need to index into the parameter array numerically, the first thing we have to do is to use panda's categorical data type to extract category codes (e.g., 0,1,2...) that numerically indicate group membership.

In [13]:
sex_idx,sex_codes = pd.factorize(df["sex"])
n_sex = len(sex_codes)
In [14]:
with pm.Model() as unlabeled_model:
    # hyper prior
    mu_hyper = pm.Normal('mu_hyper', mu=120, sigma=15)
    # per-group prior
    mu = pm.Normal('mu', mu=mu_hyper, sigma=15, shape=n_sex)
    # likelihood
    likelihood = pm.Normal('likelihood', mu=mu[sex_idx], sigma=15, observed=df['bp'])
In [15]:
with unlabeled_model:
    unlabeled_idata = pm.sample()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [mu_hyper, mu]
100.00% [4000/4000 00:02<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 3 seconds.
In [16]:
unlabeled_idata.posterior
Out[16]:
<xarray.Dataset>
Dimensions:   (chain: 2, draw: 1000, mu_dim_0: 2)
Coordinates:
  * chain     (chain) int64 0 1
  * draw      (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999
  * mu_dim_0  (mu_dim_0) int64 0 1
Data variables:
    mu_hyper  (chain, draw) float64 123.0 114.9 126.2 ... 130.0 123.8 125.5
    mu        (chain, draw, mu_dim_0) float64 124.9 129.1 124.0 ... 125.9 127.9
Attributes:
    created_at:                 2022-06-04T19:46:36.434190
    arviz_version:              0.12.1
    inference_library:          pymc
    inference_library_version:  4.0.0b6
    sampling_time:              3.3800315856933594
    tuning_steps:               1000
xarray.Dataset
    • chain: 2
    • draw: 1000
    • mu_dim_0: 2
    • chain
      (chain)
      int64
      0 1
      array([0, 1])
    • draw
      (draw)
      int64
      0 1 2 3 4 5 ... 995 996 997 998 999
      array([  0,   1,   2, ..., 997, 998, 999])
    • mu_dim_0
      (mu_dim_0)
      int64
      0 1
      array([0, 1])
    • mu_hyper
      (chain, draw)
      float64
      123.0 114.9 126.2 ... 123.8 125.5
      array([[123.00435007, 114.86059768, 126.22443518, ..., 106.6128855 ,
              123.30495539, 123.30495539],
             [120.54586625, 116.84453927, 137.36321143, ..., 130.00975918,
              123.78130594, 125.50826176]])
    • mu
      (chain, draw, mu_dim_0)
      float64
      124.9 129.1 124.0 ... 125.9 127.9
      array([[[124.85311034, 129.06437058],
              [124.04599364, 129.73002699],
              [124.74487894, 130.18939123],
              ...,
              [126.171595  , 130.55412392],
              [125.55071625, 129.97903848],
              [125.55071625, 129.97903848]],
      
             [[124.74847894, 129.09550215],
              [122.20259379, 129.8071807 ],
              [126.15732607, 129.2269007 ],
              ...,
              [126.18684081, 129.06940765],
              [124.92729948, 130.69099774],
              [125.88147002, 127.89700098]]])
  • created_at :
    2022-06-04T19:46:36.434190
    arviz_version :
    0.12.1
    inference_library :
    pymc
    inference_library_version :
    4.0.0b6
    sampling_time :
    3.3800315856933594
    tuning_steps :
    1000
In [17]:
az.summary(unlabeled_idata)
Out[17]:
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
mu_hyper 125.107 9.012 107.842 141.253 0.171 0.122 2773.0 1555.0 1.0
mu[0] 124.983 1.484 122.392 127.974 0.027 0.019 2898.0 1531.0 1.0
mu[1] 129.874 1.461 127.117 132.524 0.027 0.019 2902.0 1537.0 1.0

We can also investigate the posterior directly. Let's calculate the means of each group.

In [18]:
# average over chains and draws
unlabeled_idata.posterior['mu'].mean(dim=['chain', 'draw'])
Out[18]:
<xarray.DataArray 'mu' (mu_dim_0: 2)>
array([124.98313775, 129.87418843])
Coordinates:
  * mu_dim_0  (mu_dim_0) int64 0 1
xarray.DataArray
'mu'
  • mu_dim_0: 2
  • 125.0 129.9
    array([124.98313775, 129.87418843])
    • mu_dim_0
      (mu_dim_0)
      int64
      0 1
      array([0, 1])

More practically, we can calculate the probability that the blood pressures of the males and females differ from one another.

In [19]:
( unlabeled_idata.posterior['mu'].sel(mu_dim_0=0) < 
  unlabeled_idata.posterior['mu'].sel(mu_dim_0=1) ).mean()
Out[19]:
<xarray.DataArray 'mu' ()>
array(0.993)
xarray.DataArray
'mu'
  • 0.993
    array(0.993)

    Ok. We have our posterior and everything seems to be working. But dealing with the posterior is cumbersome. In particular, we have this annoying mu_dim_0. What is that? In our model, mu is a parameter array of size (2,), so it has one dimension. Because we have not labeled this dimension explicitly, it is automatically labeled for us in the inferenceData object: mu_dim_0. The two coordinates on this dimension are mu_dim_0=0 and mu_dim_0=1. Which of these two refers to the female mean and which refers to the male mean? Good question! It's not immediately clear. We would have to go back and see how we have handled our data. Looking back, we can check the categorical codes that we asked pandas to generate for our data['female'] variable.

    In [20]:
    sex_codes
    
    Out[20]:
    Index(['female', 'male'], dtype='object')
    In [21]:
    # first patient's index into the sex_codes map
    sex_idx[0]
    
    Out[21]:
    0

    So pandas has assigned the first patient an index of 0 and sex_codes[0]=='female', suggesting that the first patient is female. We can verify this by checking that the first patient's entry in the 'sex' column of our original data.

    In [22]:
    # first patient's value in the 'sex' column
    df['sex'].iloc[0]
    
    Out[22]:
    'female'

    So mu_dim_0=0 correponds to the females. Thus, mu_dim_0=1 must corresponds to the males. Got all that? No? It's not particularly easy. Let's see how labeled dimensions and coordinates can help.

    New Method ¶

    Let's begin by defining sex as a dimension and labeling it as such. We will simultaneously label the coordinates, or values, on this dimension.

    In [23]:
    coords = {'sex':['female','male']}
    

    Easy enough. What do we do with this? We pass it along to our PyMC model, which will allow us to use it when defining parameter arrays (among other things). Let's see.

    In [24]:
    with pm.Model(coords=coords) as labeled_model:
        # hyper prior
        mu_hyper = pm.Normal('mu_hyper', mu=120, sigma=15)
        # per-group prior
        mu = pm.Normal('mu', mu=mu_hyper, sigma=15, dims='sex')
        # likelihood
        likelihood = pm.Normal('likelihood', mu=mu[sex_idx], sigma=15, observed=df['bp'])
    

    Notice that we are again asking for an array of parameters, mu. But instead of asking for shape=n_grps, we now ask for dims='sex'. This tells PyMC that we want an array of a normally-distributed parameters, one for each coordinate on our 'sex' dimension.

    Just like in the previous example, we will index into this array using the sex_idx that pandas generated for us earlier. But how do the values in sex_idx correspond to the coordinates?

    In [25]:
    # a few patients' coordinate values
    patient_ids = [0,75,150]
    for idx in sex_idx[patient_ids]:
        print(coords['sex'][idx])
    
    female
    female
    male
    

    So we have verified that indicies in sex_idx map into our coordinates in the desired manner. Let's sample and see how our labeled dimensions and coordinates help us to unpack our posterior.

    In [26]:
    with labeled_model:
        labeled_idata = pm.sample()
    
    Auto-assigning NUTS sampler...
    Initializing NUTS using jitter+adapt_diag...
    Multiprocess sampling (2 chains in 2 jobs)
    NUTS: [mu_hyper, mu]
    
    100.00% [4000/4000 00:03<00:00 Sampling 2 chains, 0 divergences]
    Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 3 seconds.
    
    In [27]:
    az.summary(labeled_idata)
    
    Out[27]:
    mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
    mu_hyper 124.935 9.011 107.734 141.650 0.173 0.123 2728.0 1361.0 1.0
    mu[female] 125.016 1.464 122.195 127.528 0.030 0.021 2347.0 1394.0 1.0
    mu[male] 129.841 1.527 127.085 132.664 0.028 0.020 2989.0 1338.0 1.0

    We can already see that the multiple entries in our parameter array mu are now indexed by the meaningful labels, 'female' and 'male', so we don't have to wonder about which summary corresponds to which group. These labels are also useful when interrogating the InferenceData object itself.

    In [28]:
    # average over chains and draws
    labeled_idata.posterior['mu'].mean(dim=['chain', 'draw'])
    
    Out[28]:
    <xarray.DataArray 'mu' (sex: 2)>
    array([125.01577402, 129.84107401])
    Coordinates:
      * sex      (sex) <U6 'female' 'male'
    xarray.DataArray
    'mu'
    • sex: 2
    • 125.0 129.8
      array([125.01577402, 129.84107401])
      • sex
        (sex)
        <U6
        'female' 'male'
        array(['female', 'male'], dtype='<U6')

    Alternatively, we can pull out the two parameters manually, referring to sex and each of the two coordinates ('female' and 'male') intead of the unlabeled dimension mu_dim_0:

    In [29]:
    labeled_idata.posterior['mu'].sel(sex='female').mean()
    
    Out[29]:
    <xarray.DataArray 'mu' ()>
    array(125.01577402)
    Coordinates:
        sex      <U6 'female'
    xarray.DataArray
    'mu'
    • 125.0
      array(125.01577402)
      • sex
        ()
        <U6
        'female'
        array('female', dtype='<U6')
    In [30]:
    labeled_idata.posterior['mu'].sel(sex='male').mean()
    
    Out[30]:
    <xarray.DataArray 'mu' ()>
    array(129.84107401)
    Coordinates:
        sex      <U6 'male'
    xarray.DataArray
    'mu'
    • 129.8
      array(129.84107401)
      • sex
        ()
        <U6
        'male'
        array('male', dtype='<U6')

    And here we can compare the two posteriors:

    In [31]:
    ( labeled_idata.posterior['mu'].sel(sex='female') <
      labeled_idata.posterior['mu'].sel(sex='male') ).mean()
    
    Out[31]:
    <xarray.DataArray 'mu' ()>
    array(0.9915)
    xarray.DataArray
    'mu'
    • 0.9915
      array(0.9915)

      So more than 90% of the samples are consistent with the conclusion that the females' blood pressures were less than the males'.

      Full Example ¶

      The example we have been working with so far has been pretty simple in order to illustrate the general usage of az.InferenceData objects. Let's see an example that is slightly more elaborate and realistic.

      In [32]:
      group_idx,group_codes = pd.factorize(df['treat_group'])
      sex_idx,sex_codes = pd.factorize(df['sex'])
      
      coords = {"patientID": df.index}
      with pm.Model(coords=coords) as fullModel:
          # hyper prior
          intercept_group = pm.Normal('intercent_hyper', mu=120, sigma=5)
          # per-group prior
          intercept_patient = pm.Normal("intercept_patient", mu=intercept_group, sigma=15, dims="patientID")
          # individual coefficients
          b_sex = pm.Normal("b_sex", mu=0, sigma=5)
          b_age = pm.Normal("b_age", mu=0, sigma=5)
          b_treatment = pm.Normal("b_treatment", mu=0, sigma=5)
      
          # likelihood
          mu = intercept_patient + (b_sex * sex_idx) + (b_age * df['age'].to_numpy()) + (b_treatment * group_idx)
          likelihood = pm.Normal('likelihood', mu=mu, sigma=5, observed=df['bp'])
          
          idata = pm.sample(4000, tune=4000, target_accept=0.9, chains=4)
      
      Auto-assigning NUTS sampler...
      Initializing NUTS using jitter+adapt_diag...
      Multiprocess sampling (4 chains in 2 jobs)
      NUTS: [intercent_hyper, intercept_patient, b_sex, b_age, b_treatment]
      
      100.00% [32000/32000 02:02<00:00 Sampling 4 chains, 0 divergences]
      Sampling 4 chains for 4_000 tune and 4_000 draw iterations (16_000 + 16_000 draws total) took 123 seconds.
      
      In [33]:
      # inspect the descriptive statistics
      # omit the per-patient intercepts
      az.summary(idata, var_names=['~intercept_patient'])
      
      Out[33]:
      mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
      intercent_hyper 120.746 4.137 112.950 128.308 0.298 0.211 195.0 469.0 1.03
      b_sex 4.226 2.047 0.241 8.006 0.033 0.024 3798.0 6611.0 1.00
      b_age 0.092 0.085 -0.064 0.254 0.006 0.004 201.0 517.0 1.03
      b_treatment -0.082 2.018 -3.844 3.736 0.033 0.023 3745.0 6279.0 1.00
      In [34]:
      az.plot_posterior(idata, var_names=['~intercept_patient'], grid=(2,2));
      

      Wrap-Up ¶

      Hopefully this has given you a quick look into the advantages of using coordinates and dimensions and illustrated some basic, but illustrative scenarios. Head over to the PyMC Discourse if you have questions or need additional help!

      In [35]:
      %load_ext watermark
      %watermark -p pymc,arviz,numpy,pandas,matplotlib
      
      pymc      : 4.0.0b6
      arviz     : 0.12.1
      numpy     : 1.22.4
      pandas    : 1.4.2
      matplotlib: 3.5.2