---
title: "Computational Cognitive Science 2022-2023"
output:
  pdf_document: default
  word_document: default
  html_document: default
---
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
require(DirichletReg) # We'll be using Dirichlet distributions. gtools includes rdirichlet and ddirichlet but not with log densities.
```

# Tutorial 5: Active learning

This tutorial provides some hands-on exposure to concepts from the active learning lectures.

Imagine you come across a curiosity shop that has four devices for sale at £100 each. Each device can automatically create a delicious cake when one pulls its lever, but the devices vary in their reliability:

* one device never works, 
* one device works exactly 20 percent of the time,
* one works 30 percent of the time, and 
* one works 75 percent of the time.

We don't know which device is which, but the shopkeeper will let us use the devices a few times to choose one.

Let's take an information-theoretic approach to this active learning problem.

Here are some helper functions.

```{r}
plogp <- function(p) {
  # bits, not nats
  if(p==0) 0 else p*log2(p) # Consider why we can't just use p*log2(p).
}

cat_entropy <- function(catv) {
  -sum(sapply(catv,plogp))
}
```

Let's define our hypothesis space as all of the different cake probabilities the machines before us might have, e.g., c(0,.2,.3,.75) means the machines increase in their probabilities from left to right, and c(.75,.3,.2,0) means they decrease from left to right.

**Question 1**: How does the code below define that hypothesis space?

```{r}
require(combinat) # From googling "all list permutations in R"
probs <- c(0,.2,.3,.75)
hyps <- permn(probs)
```

**Question 2**: What is our entropy, assuming we know nothing about the order of the machines?

```{r}
prior <- c(0) # Replace this with a prior over all hypotheses 
cat_entropy(prior)
```

**Question 3**: Suppose we turn the lever on the first machine and get nothing. What probability should we now assign our different hypotheses? Edit the code below.

```{r}
update_probs <- function(prior,choice,outcome) {
  # x, below, is a list of probabilities under a specific hypothesis
  
  likelihood <- function(this_hyp,this_choice,this_outcome) {
    1.0 # Replace this 1.0 with an actual likelihood
  }
  
  likes <- sapply(hyps,function(h) {likelihood(h,choice,outcome)}) 
  unnp <- likes*prior
  z <- sum(unnp)
  unnp/z
}

post <- update_probs(prior,choice=1,outcome=0)
print(post)
```

**Question 4**: What is our new entropy? Did we learn much? Why or why not?

**Question 5**: What is the expected probability getting a cake if we choose machine 2 next?

```{r}
# First we want the probability of each outcome in each hyp
# Then we take the weighted sum over hypothesis probabilities
expected_outcome <- function(i,ph) {
  0 # Replace this
}
print(expected_outcome(2,post))
```

**Question 6**: What is the expected entropy if we choose machine 2 next?

```{r}
expected_entropy <- function(choice,p_hyps) {
  pwin <- expected_outcome(choice,p_hyps)
  0 # Replace this
}
print(expected_entropy(2,post))
```

**Question 7**: Suppose we do choose machine 2 and it yields a cake. What is our current entropy and what are the expected entropies after each of our next possible choices?

**Question 8**: What does this imply we should do next?

**Question 9**: Is this at all counter-intuitive? What might we do differently if we are only interested in knowing which is the most reliable machine?