Bozonier | Test Driven Machine Learning | E-Book | www.sack.de

E-book, English, 190 pages

Bozonier Test Driven Machine Learning

Control your machine learning algorithms using test-driven development to achieve quantifiable milestones
1st edition, 2025
ISBN: 978-1-78439-636-7
Publisher: De Gruyter
Format: PDF
Copy protection: Adobe DRM




Machine learning is the process of teaching machines to remember data patterns, use them to predict future outcomes, and offer choices that would appeal to individuals based on their past preferences.
Machine learning is applicable to a lot of what you do every day. As a result, you can't take forever to deliver your first iteration of software. Learning to build machine learning algorithms within a controlled test framework will speed up your time to deliver, quantify quality expectations with your clients, and enable rapid iteration and collaboration.
This book will show you how to quantifiably test machine learning algorithms. The very different, foundational approach of this book starts every example algorithm with the simplest thing that could possibly work. With this approach, seasoned veterans will find simpler approaches to beginning a machine learning algorithm. You will learn how to iterate on these algorithms to enable rapid delivery and improve performance expectations.
The book begins with an introduction to test-driving machine learning and quantifying model quality. From there, you will test a neural network, predict values with regression, and build upon regression techniques with logistic regression. You will discover how to test different approaches to naïve Bayes and compare them quantitatively, along with how to apply OOP (Object-Oriented Programming) and OOP patterns to test-driven code, leveraging scikit-learn.
Finally, you will walk through the development of an algorithm which maximizes the expected value of profit for a marketing campaign by combining one of the classifiers covered with the multiple regression example in the book.



Further information & material


Chapter 2. Perceptively Testing a Perceptron


Even for people comfortable with using them, neural networks can seem like a big black box. On top of that, the little bit of randomness within them just makes their inner workings that much more mysterious.

In this chapter, we're going to start exploring TDD-ing machine learning algorithms by building a very simple neural network using TDD. Then we will use this as an opportunity to more deeply understand how they work.

In this chapter, we will cover the following topics:

  • Building the simplest perceptron possible
  • Using a spreadsheet to develop simple use cases we can test to and reproduce
  • Using TDD to develop our first machine-learning algorithm
  • Testing with datasets

You will need some sort of spreadsheet program to follow along with this chapter. Microsoft Excel, LibreOffice Calc, or Google Sheets are all completely fine.

Getting started


A perceptron is a binary linear classifier. Like other supervised learning techniques, we can feed in rows of data along with the appropriate classification. After enough of these, the perceptron can begin to label new rows of data that have yet to be classified. Specifically, a perceptron works by adjusting a hyperplane to separate two groups of data as accurately as a linear boundary allows. Said a bit more simply, that means that we will have some data in a space and then perturb something like a line until it can act as an arbiter of what fits in one classification or another.
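As a rough illustration of that idea, the sketch below labels 2D points by which side of a fixed line they fall on. The function name `classify`, the weights, the bias, and the sample points are all hypothetical and chosen only for this example; nothing is trained here.

```python
def classify(point, weights, bias):
    """Return 1 if the point lies on the positive side of the line, else 0."""
    activation = sum(w * x for w, x in zip(weights, point)) + bias
    return int(activation > 0)


# Hypothetical dividing line: x + y - 1 = 0
weights, bias = [1.0, 1.0], -1.0

# Hypothetical sample points on either side of that line
points = [(0.0, 0.0), (2.0, 0.5), (0.2, 0.3), (1.0, 1.0)]

labels = [classify(point, weights, bias) for point in points]
for point, label in zip(points, labels):
    print(point, '->', label)
```

Training a perceptron amounts to nudging `weights` and `bias` until a rule like this labels the training rows correctly.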

If you want to visualize it, think of 2D data being separated by a line like so:

We'll be using TDD to develop the algorithm ourselves as an example of breaking very large problems down as much as possible. In later chapters, we will lean on third-party libraries for other implementations and focus on other ways to drive your machine learning forward in discrete steps.

It can be hard to figure out where to start on a perceptron without building the whole algorithm at once. We can start with a scenario that's so simple it's obviously not going to work. What's the value in a test like that? It gets us started. Let's get started with this:

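    # nt is assumed to be the nose.tools module
    import nose.tools as nt

    def no_training_data_supplied_test():
        # With no training data there is nothing to base a prediction on.
        the_perceptron = Perceptron()
        result = the_perceptron.predict()
        nt.assert_is_none(result, 'Should have no result with no training data.')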

Getting this test to pass should be pretty straightforward:

    class Perceptron:
        def predict(self):
            return None

Doesn't get much simpler than that. Next we can try training the simplest possible case in the perceptron.

Keep in mind, we aren't doing test-driven math. It is perfectly okay to lean on the knowledge you have of the algorithms to inform the design of your code and the way you choose to get your tests to pass. In this case, you may like to take a step back and re-evaluate the math behind a simple perceptron and how to break it down into its simplest components.

One way to do this is to manually perform the math using a spreadsheet so that you can step through the calculations row by row. Here is an example of a pretty simple starting point:

Each row of the spreadsheet acts as a step through the training process for the perceptron. For the Weights 1 and Weights 2 columns, we can just make up some small random number. Weight 1 update and Weight 2 update are the values of the respective weights taking into account the training data. Mathematically stated, that means this (as defined in , ):
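Written out with assumed symbol names, the standard perceptron update rule reads as follows, where $w_i$ is a weight, $x_i$ is the input value that the weight multiplies, $\alpha$ is the training rate, $d$ is the training value, and $y$ is the prediction made with the current weights. The notation is an assumption made for this sketch, not necessarily the author's:

```latex
w_i \leftarrow w_i + \alpha \, x_i \, (d - y)
```

This matches the update used by the iterating training code later in the chapter, where each weight changes by the training rate times its input times the label difference.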

In plain language: the updated weight is equal to that weight's current value plus the product of the training rate, the input value that the weight multiplies, and the difference between the training value and the current pre-trained prediction value. Translated to Excel parlance we have this:

The weights in each subsequent row are updated based on the updated weights in the row immediately above. In this way, each row is incrementally tuning the weights of the perceptron.

Predicting the values (as done in the last column) is defined mathematically as:
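Using assumed symbols ($w_1, w_2$ for the weights, $x_1, x_2$ for the inputs, and $y$ for the predicted value), the rule implemented by the `predict` method later in the chapter can be written as:

```latex
y =
\begin{cases}
  1 & \text{if } w_1 x_1 + w_2 x_2 > 0 \\
  0 & \text{otherwise}
\end{cases}
```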

Here, the output of the condition is 1 for true and 0 for false. In case a spreadsheet makes more sense to you, this is how that translates:

We could do several iterations as well by just repeating the rows. We will touch on that a bit later when we tackle something more complex. This scenario only requires one pass through the training data however. Really it only requires the first input in the training data since the weights don't seem to change in any of the rows after they're updated. We can use this fact to simplify what we test.
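The claim that the weights never move in this scenario can be checked with a short sketch that steps through the rows the way the spreadsheet does. The starting weights (0.20) and training rate (0.25) are taken from the implementation that follows; the helper names `predict` and `train_one_pass` are hypothetical, as is the choice to record the weights after every row.

```python
def predict(weights, inputs):
    """Return 1 when the weighted sum of the inputs is positive, else 0."""
    return int(0 < sum(w * x for w, x in zip(weights, inputs)))


def train_one_pass(weights, rows, labels, rate):
    """Apply the perceptron update once per row, recording the weights after each row."""
    history = []
    for inputs, label in zip(rows, labels):
        delta = label - predict(weights, inputs)
        weights = [w + rate * x * delta for w, x in zip(weights, inputs)]
        history.append(list(weights))
    return weights, history


# The OR truth table: only [0, 0] maps to 0.
or_rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
or_labels = [1, 1, 1, 0]

final_weights, weight_history = train_one_pass([0.2, 0.2], or_rows, or_labels, rate=0.25)
for row, weights_after in zip(or_rows, weight_history):
    print(row, weights_after)
```

Every row is already predicted correctly by the starting weights, so the label difference is zero each time and no update changes anything.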

Let's write a test to capture this scenario:

    def train_an_OR_function_test():
        the_perceptron = Perceptron()
        # Train on the full OR truth table: only [0, 0] maps to 0.
        the_perceptron.train([[1, 1], [1, 0], [0, 1], [0, 0]], [1, 1, 1, 0])
        nt.assert_equal(the_perceptron.predict([1, 1]), 1)
        nt.assert_equal(the_perceptron.predict([1, 0]), 1)
        nt.assert_equal(the_perceptron.predict([0, 1]), 1)
        nt.assert_equal(the_perceptron.predict([0, 0]), 0)

To solve this, your first pass may be to take what you learned from the Excel file and just apply it directly. Something like this:

    class Perceptron:
        def __init__(self):
            self._weight_1 = 0.20
            self._weight_2 = 0.20

        def train(self, inputs, labels):
            # This first pass only uses the first training row.
            input = inputs[0]
            # Perceptron rule: weight += rate * input * (label - prediction)
            label_delta = labels[0] - self.predict(input)
            self._weight_1 = self._weight_1 + .25 * input[0] * label_delta
            self._weight_2 = self._weight_2 + .25 * input[1] * label_delta

        def predict(self, input):
            if len(input) == 0:
                return None
            return 0 < self._weight_1 * input[0] + self._weight_2 * input[1]

You might realize that, while this makes the tests pass, it's a fair bit of code. Maybe it could be simplified. In fact, you can delete most of it and the tests still pass, as shown:

    class Perceptron:
        def __init__(self):
            self._weight_1 = 0.20
            self._weight_2 = 0.20

        def train(self, inputs, labels):
            # Training is a no-op for now; the fixed weights already satisfy the tests.
            pass

        def predict(self, input):
            if len(input) == 0:
                return None
            return 0 < self._weight_1 * input[0] + self._weight_2 * input[1]

This passes as well. It's much simpler and really is just an implementation of the prediction computation. For our scenario, the training process doesn't even matter. The training call still stays in the test, though, because the fact that training isn't needed to make our tests pass is an implementation detail. As a user of the class, I would expect that the Perceptron would need training input. So let's leave that there for now even though the method feels superfluous.

Next, let's code a new scenario. Again, we will be choosing the next case based on our understanding of the algorithm. If you have a spreadsheet, explore simple scenarios that require you to tack on new complexity. Let's try making a perceptron that can signal when one or both of the outputs are positive. This is what a spreadsheet-based example might look like:

The spreadsheet is set up using the same formulas as last time. The first row handles setting initial values and each subsequent row refers to the previous row to get updated weights. Columns A-D are all inputs with each block of rows within a given iteration repeating in the next iteration. It's easiest to think of it repeating as a sort of loop.

Since we have an example scenario, let's use it to write our next test. This example is a little weird since it includes the use of a dummy variable and it requires several iterations to converge to the right solution. The use of a dummy variable allows our classifier to not be forced to be centered around 0. Essentially, it allows us to move the dividing line between true and false.
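To see why the dummy variable moves the dividing line, consider the sketch below. With a constant second input of -1, the weighted sum is positive exactly when the first input exceeds the ratio of the second weight to the first (assuming the first weight is positive). The weights and the helper names `predict` and `decision_threshold` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
DUMMY = -1  # constant second input, as in the chapter's training rows


def predict(weights, value):
    """Classify a single value, pairing it with the constant dummy input."""
    return int(0 < weights[0] * value + weights[1] * DUMMY)


def decision_threshold(weights):
    """Value above which the prediction flips to 1 (assumes weights[0] > 0)."""
    return weights[1] / weights[0]


# Hypothetical weights: 0.5 * value - 2.0 > 0 only when value > 4
weights = [0.5, 2.0]
threshold = decision_threshold(weights)

# With a zero weight on the dummy input the boundary sits back at 0.
centered_threshold = decision_threshold([0.5, 0.0])

print(threshold, predict(weights, 3), predict(weights, 5))
```

Training adjusts the second weight along with the first, which is how the perceptron can place the boundary somewhere other than 0.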

Since we already support two inputs, we're going to just include the dummy variable in our test for now so we can focus on getting the iteration working. Once that's stable, then we'll work at refactoring to include the dummy variable by default. The reason for separating these two needs is that if the updates break our other test, this will allow us to know that the iteration is what broke it instead of having two possible causes.

Here is a test case based upon the spreadsheet scenario:

    def detect_values_greater_than_five_test():
        the_perceptron = Perceptron()
        # The second column is the constant dummy variable.
        the_perceptron.train([
            [ 5, -1],
            [ 2, -1],
            [ 0, -1],
            [-2, -1],
        ], [1, 0, 0, 0])
        nt.assert_equal(the_perceptron.predict([ 8, -1]), 1)
        nt.assert_equal(the_perceptron.predict([ 5, -1]), 1)
        nt.assert_equal(the_perceptron.predict([ 2, -1]), 0)
        nt.assert_equal(the_perceptron.predict([ 0, -1]), 0)
        nt.assert_equal(the_perceptron.predict([-2, -1]), 0)

We can get the test to pass with the following code:

    class Perceptron:
        def __init__(self):
            self._weight_1 = 0.431
            self._weight_2 = 0.02

        def train(self, inputs, labels):
            # Make four passes over the training data.
            for _ in range(4):
                for input, label in zip(inputs, labels):
                    # Difference between the expected label and the current prediction
                    label_delta = (label - self.predict(input))
                    self._weight_1 = self._weight_1 + .1 * input[0] * label_delta
                    self._weight_2 = self._weight_2 + .1 * input[1] * label_delta

        def predict(self, input):
            if len(input) == 0:
                return None
            return int(0 < self._weight_1 * input[0] + self._weight_2 * input[1])

If you notice, in the...
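The training loop above can be traced one iteration at a time, much like reading down the spreadsheet. The sketch below reuses the starting weights (0.431 and 0.02), the training rate (0.1), and the four passes from the class above; the function names `predict` and `train_with_trace` and the rounding to three decimals for display are assumptions made for this example.

```python
def predict(weights, inputs):
    """Return 1 when the weighted sum of the two inputs is positive, else 0."""
    return int(0 < weights[0] * inputs[0] + weights[1] * inputs[1])


def train_with_trace(weights, rows, labels, rate=0.1, iterations=4):
    """Run the perceptron update and record the rounded weights after each pass."""
    weights = list(weights)
    trace = []
    for _ in range(iterations):
        for inputs, label in zip(rows, labels):
            delta = label - predict(weights, inputs)
            weights = [w + rate * x * delta for w, x in zip(weights, inputs)]
        trace.append(tuple(round(w, 3) for w in weights))
    return weights, trace


rows = [[5, -1], [2, -1], [0, -1], [-2, -1]]
labels = [1, 0, 0, 0]

final_weights, trace = train_with_trace([0.431, 0.02], rows, labels)
for iteration, (weight_1, weight_2) in enumerate(trace, start=1):
    print(iteration, weight_1, weight_2)

predictions = [predict(final_weights, [value, -1]) for value in (8, 5, 2, 0, -2)]
print(predictions)
```

After the fourth pass the weights settle near 0.131 and 0.32, which puts the boundary at roughly 2.4. That lies between the training values 2 and 5, which is all the training data requires.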


