The Anatomy of a Learning Algorithm

Goal

Why are you learning?

The purpose of learning is sure to affect how many resources you devote to the task and how completely you achieve your learning goals. However, your goal is also a powerful constraint. Your representation system must be sufficient to achieve your goal, but there is no guarantee you learn a representation sufficient for an alternative goal. For example, learning how to press flowers will not necessarily help you prepare them for medicinal purposes.

Model

Input

What information is your model considering?

Your model will never learn from something it cannot detect. For example, a model of temperature based on several distantly placed thermometers will never learn how the sun influences temperature. There may always be some unknown input driving your model. Conversely, the thermometer model might be an excellent model of temperature precisely because the thermometers were perfectly placed to capture differential amounts of sunlight as the sun moves across the sky. In that case, the thermometer model may not generalize to other instantiations where the input thermometers don’t capture a sunlight differential so perfectly.
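To make this concrete, here is a minimal sketch in Python with made-up thermometer readings (the data, weights, and setup are all invented for illustration). The sun’s position drives everything, but it never appears as an input feature, so the model can only pick it up indirectly through whatever the thermometer placement happens to encode.

```python
import numpy as np

# Hypothetical data: three distantly placed thermometers (the inputs) and the
# temperature at a site of interest (the output). The sun's position is a
# hidden driver that never appears as an input feature.
rng = np.random.default_rng(0)
sun_angle = rng.uniform(0, np.pi, size=200)              # unobserved input
thermometers = np.column_stack([
    20 + 5 * np.sin(sun_angle) + rng.normal(0, 0.5, 200),
    18 + 3 * np.sin(sun_angle) + rng.normal(0, 0.5, 200),
    22 + 1 * np.sin(sun_angle) + rng.normal(0, 0.5, 200),
])
true_temp = 21 + 4 * np.sin(sun_angle)

# The model only ever sees the thermometers; it can exploit sunlight only to
# the extent that their placement happens to encode it.
X = np.column_stack([thermometers, np.ones(len(true_temp))])
weights, *_ = np.linalg.lstsq(X, true_temp, rcond=None)
print("learned weights:", weights)
```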

Output

What responses are allowed?

This might sound weird, but you determine the possible answers the model can give you, and you might lose information by limiting your options. For example, let’s consider a model of magnetism that observes different kinds of objects attracting, repelling, or having no effect on each other. The model is designed to predict whether a given object is a magnet or inert, i.e., a binary decision. An object is a magnet if it both attracts and repels a known magnet; otherwise, it is classified as inert.

While this output representation might be fine for identifying magnets, it’s actually misspecified. In addition to magnets and inert objects, an object can be ferromagnetic, i.e., attracted to a known magnet but not repelled by it (think of the icebox that your magnets stick to). The real problem is that “inert” is not an appropriate label for ferromagnetic objects. In general, it can be hard to know how many and which output categories to use in a classification problem.
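As a toy illustration (a sketch with made-up interaction tests; the function names and the Material enum are invented for this example), compare a binary output space with a three-way one:

```python
from enum import Enum

def binary_label(attracts: bool, repels: bool) -> str:
    # Original output space: only "magnet" or "inert" are allowed answers.
    return "magnet" if (attracts and repels) else "inert"

class Material(Enum):
    MAGNET = "magnet"
    FERROMAGNETIC = "ferromagnetic"
    INERT = "inert"

def three_way_label(attracts: bool, repels: bool) -> Material:
    # Expanded output space: ferromagnetic objects attract but never repel.
    if attracts and repels:
        return Material.MAGNET
    if attracts:
        return Material.FERROMAGNETIC
    return Material.INERT

# The icebox door attracts a known magnet but does not repel it:
print(binary_label(attracts=True, repels=False))     # "inert"  <- forced mislabel
print(three_way_label(attracts=True, repels=False))  # Material.FERROMAGNETIC
```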

Hypothesis Space

What mappings between input and output are possible?

Every model specifies a set of possible mappings, or functions, between input and output. Different frameworks, theories, and specifications often place constraints on the hypothesis space, limiting the possible mappings in order to study how those constraints might influence behavior via the learned representations. The required and recommended Andy Perfors readings next week will dive a little deeper into hypothesis spaces.
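For a quick feel of what a constrained hypothesis space does, here is a small sketch (invented data, NumPy polynomial fits) comparing a space of straight lines to a space of cubics on the same observations:

```python
import numpy as np

# The hypothesis space is the set of input->output mappings the model may consider.
# Two hypothetical spaces for the same data: degree-1 polynomials (lines) vs. cubics.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
y = x**3 + rng.normal(0, 0.05, 50)

linear_fit = np.polynomial.Polynomial.fit(x, y, deg=1)  # constrained space
cubic_fit = np.polynomial.Polynomial.fit(x, y, deg=3)   # richer space

# The constrained space can never represent the cubic relationship,
# no matter how much data it sees.
print("line coefficients: ", linear_fit.convert().coef)
print("cubic coefficients:", cubic_fit.convert().coef)
```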

Inductive Bias

How does the model behave when there’s no data?

An inductive bias specifies how the model should process/use the hypothesis space when there’s no data. Should all of the hypotheses be treated as equally likely? Is there a clear order in which to start ruling out hypotheses? Alphabetically?
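One way to picture an inductive bias is as a weighting over the hypotheses before any data arrive. The sketch below (with invented, human-readable rule names) contrasts a uniform prior with a simplicity-ordered one:

```python
# Hypothetical hypothesis space, written as human-readable rules.
hypotheses = [
    "is red -> is sweet",
    "is red and round -> is sweet",
    "is red and round and shiny -> is sweet",
]

# One inductive bias: treat every hypothesis as equally likely before any data.
uniform_prior = {h: 1 / len(hypotheses) for h in hypotheses}

# Another bias: prefer shorter (simpler) rules before seeing any data.
raw = [1 / len(h) for h in hypotheses]
simplicity_prior = {h: w / sum(raw) for h, w in zip(hypotheses, raw)}

print(uniform_prior)
print(simplicity_prior)
```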

But also, even after seeing some data, how do we generalize to data that we haven’t seen? Remember from Week 0’s reading: we have strong intuitions about which fantastical beasts are more likely to exist and to have certain properties. There may be several hypotheses that could govern a generalization. For example, maybe mermaids lay eggs because they have scales like a fish. Or maybe mermaids have live birth because they have hair like mammals. Or maybe mermaids have live birth because, unlike other amphibians, mermaids don’t have four limbs. The world may never know \_o_/ Those poor Mako mermaids!

Update Rules

How does the model change as you observe more data?

Models are widely said to be trained on, or to learn from, data, but what exactly does that entail? Some models only change when their prediction is wrong, or VERY WRONG. Some models change as they see more data even when they were already performing well. Some models get it wrong and yeet way too hard the other way. Other models say they’re updating but do so slower than snails. Some models change randomly when they see new data. Other models change by a particular increment. You can only really grok a model when you know how it will change when it sees new things.
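Here is a minimal sketch (a one-parameter running estimate on invented data) showing how the same stream of observations produces very different behavior under an error-driven update rule with different learning rates:

```python
import numpy as np

# Hypothetical one-parameter model: a running estimate of a quantity.
# The update rule moves the estimate toward each new observation by a
# fraction (the learning rate) of the current error.
data = np.array([3.0, 3.2, 2.9, 3.1, 10.0, 3.0])  # one surprising observation

def delta_updates(stream, learning_rate):
    estimate = 0.0
    history = []
    for x in stream:
        error = x - estimate            # no error, no change
        estimate += learning_rate * error
        history.append(round(estimate, 2))
    return history

print(delta_updates(data, learning_rate=0.1))  # snail-paced updates
print(delta_updates(data, learning_rate=0.9))  # yeets toward every new point
```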

Environment

What constrains the data?

The environment is often seen as outside the scope of a learning mechanism; however, constraints on the data can have just as much of an effect as constraints elsewhere in the learning algorithm. Even if the model can receive certain features as input, if the data are censored to exclude those features, the model may never learn about them. If the data were curated with specific biases, those biases can be baked into the model. Therefore, it’s important to take a breather and really consider the properties of your training data.
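As a toy example of a censored environment (entirely made-up weather records), the data below are curated before training, so a model fit to them never gets to learn the effect of the censored feature:

```python
import numpy as np

# Hypothetical dataset: daily temperatures plus a "sunny" indicator.
rng = np.random.default_rng(2)
sunny = rng.integers(0, 2, size=500)
temperature = 15 + 6 * sunny + rng.normal(0, 1, 500)

# A curated environment that censors all sunny-day records before training.
censored_temps = temperature[sunny == 0]

# A model trained on the censored data never learns the effect of sunshine,
# even though "sunny" is a feature it could have received.
print("mean temperature, full data:    ", round(temperature.mean(), 2))
print("mean temperature, censored data:", round(censored_temps.mean(), 2))
```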

Is the training environment comparable to the environment you hope to achieve your goal in?

The Boston Dynamics group can train as many robots to dance on Earth as they want, but will those routines work 20,000 leagues under the sea or on the Moon?