A Developer

Your First Machine Learning with TensorFlow

Machine learning (ML) is a branch of artificial intelligence and computer science that focuses on making machines imitate the way humans learn and gradually improve their performance. The learning process relies on data and algorithms. Think of a baby learning to stand: the baby keeps trying and learns cause and effect along the way. If a way of standing makes them fall, they avoid it in the future.

In this tutorial we will learn how to teach a computer to diagnose diabetes based on a person’s medical records.

DATASET

The dataset serves as a set of examples showing, from their medical records, which patients are diabetic and which are not. Learning from labeled examples like this is called supervised learning.

In this tutorial we use medical record data from Pima Indian diabetic patients. This dataset comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal of this dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.

The data set consists of several medical predictor variables and one target variable. The predictor variables consist of:

  1. Pregnancies: Number of times the patient has been pregnant
  2. Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test
  3. Blood Pressure: Diastolic blood pressure (mm Hg)
  4. Skin Thickness: Triceps skinfold thickness (mm)
  5. Insulin: 2-hour serum insulin (mu U/ml)
  6. BMI: Body mass index (weight in kg/(height in m)^2)
  7. DiabetesPedigreeFunction: Diabetes pedigree function
  8. Age: Age in years

The last variable, the ninth, is the target class. If its value is one, the patient has diabetes; if it is zero, the patient does not have diabetes.

Conveniently, all the predictor variables used as input are numeric. Numeric data is an ideal choice for the inputs and outputs of artificial neural networks.

The dataset can be downloaded from one of the links here:

Here are the steps we will take in this tutorial:

  1. Open a new Notebook in Google Colaboratory
  2. Load data
  3. Define the Keras model
  4. Compile the Keras model
  5. Fit the Keras model
  6. Evaluate the Keras model
  7. Save the model to a file
  8. How to use models for prediction

This tutorial assumes that you already know how to use Google Colaboratory.

Open a new Notebook in Google Colaboratory

The first step is to open Google Colaboratory (URL: https://colab.research.google.com/?hl=id)

Then select NEW NOTEBOOK.

Save the notebook that opens in your browser to your Google Drive with the name “My First Machine Learning.ipynb”. Google Colab is built on Jupyter Notebook, so the file extension is ipynb.

A notebook is made up of two kinds of cells:
Code: adds a new code cell. The programming language used is Python, and each code cell can be executed separately from the others.
Text: adds a new text cell, usually used for captions or comments.

Load Data

First we load our data into Google Colaboratory. There are two ways to do this: upload the file to session storage, or read it from Google Drive. If you read it from Google Drive, make sure the file is already stored there. For this tutorial, we upload it to session storage. In the left sidebar, click the folder icon, then click the upload icon (a document with an arrow). Select the file from your computer.
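If you prefer the second option, Google Drive has to be mounted in the notebook before the file can be read. The following is a minimal sketch, assuming the notebook runs in Colab and the CSV sits in the Drive folder /content/drive/MyDrive/dataset. The names in_colab and data_path are illustrative and not part of the tutorial code.

```python
# Mounting only works inside Google Colab; elsewhere the import fails.
try:
    from google.colab import drive
    drive.mount('/content/drive')
    in_colab = True
except ImportError:
    in_colab = False

# Assumed location of the CSV file inside Google Drive.
drive_folder = '/content/drive/MyDrive/dataset/'
data_path = drive_folder + 'pima-indians-diabetes.data.csv'
print(in_colab, data_path)
```

With the drive mounted, data_path can be passed to loadtxt in place of the bare file name.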

Once the file is uploaded, the next step is to import the functions and classes we need. We will use the NumPy and Keras libraries.

from numpy import loadtxt
import keras
from keras import layers

Load the data from the CSV file into the dataset variable. The loaded data is two-dimensional, with rows and columns. We split it into two variables: the predictor variables are the data in columns 1-8, and the target variable is the data in column 9. NumPy stores the data as an array whose indices start from zero, so the first column is index 0, the second column is index 1, the third column is index 2, and so on.

dataset = loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
predikator = dataset[:,0:8]
target = dataset[:,8]

Look at the second line, dataset[:, 0:8]. The colon before the comma selects the rows, and a bare colon means all rows. The 0:8 after the comma selects the columns from index 0 up to, but not including, index 8. It may look a bit strange, but that is how NumPy slicing works.
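To see the slicing in isolation, here is a small sketch that loads two sample rows from an in-memory string instead of the real file. The rows follow the same nine-column layout and are for illustration only; the names sample, sample_predictors and sample_target are made up for this example.

```python
from io import StringIO
from numpy import loadtxt

# Two sample rows in the nine-column layout (illustrative values).
sample_csv = StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)
sample = loadtxt(sample_csv, delimiter=',')

sample_predictors = sample[:, 0:8]  # all rows, columns 0 through 7
sample_target = sample[:, 8]        # all rows, column 8 only

print(sample.shape)             # (2, 9)
print(sample_predictors.shape)  # (2, 8)
print(sample_target)            # [1. 0.]
```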

Define the Keras model

The neural network model we use is the Sequential model. This model is suitable for a plain stack of layers where each layer has exactly one input tensor and one output tensor. The model takes an input of shape (8,), a vector holding the eight input variables. In this tutorial we use a neural network with three layers.

Fully connected layers are defined using the Dense class. You can specify the number of neurons or nodes in the layer as the first argument and the activation function using the activation argument.

model = keras.Sequential(
    [
        # eight input variables per patient
        keras.Input(shape=(8,)),
        layers.Dense(12, activation="relu", name="layer1"),
        layers.Dense(8, activation="relu", name="layer2"),
        layers.Dense(1, activation="sigmoid", name="layer3"),
    ]
)
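To make the layer stack less abstract, the sketch below reproduces in plain NumPy what a fully connected layer computes: a weighted sum of its inputs plus a bias, passed through an activation function. This is an illustration under simplifying assumptions (random weights, zero biases, one made-up patient), not how Keras is implemented internally. The helper names relu, sigmoid and dense are made up for this example.

```python
import numpy as np

def relu(z):
    # Negative values become zero, positive values pass through.
    return np.maximum(z, 0.0)

def sigmoid(z):
    # Squashes any real number into the range 0 to 1.
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, weights, bias, activation):
    # Weighted sum of the inputs plus a bias, then the activation.
    return activation(x @ weights + bias)

rng = np.random.default_rng(0)
x = rng.random((1, 8))  # one made-up patient with eight inputs

h1 = dense(x, rng.normal(size=(8, 12)), np.zeros(12), relu)
h2 = dense(h1, rng.normal(size=(12, 8)), np.zeros(8), relu)
out = dense(h2, rng.normal(size=(8, 1)), np.zeros(1), sigmoid)

print(h1.shape, h2.shape, out.shape)  # (1, 12) (1, 8) (1, 1)
```

The shapes follow the 12, 8 and 1 units of the three layers, and the sigmoid keeps the final value between 0 and 1.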

If we call model.summary(), we can see the structure of our neural network.
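The number of trainable parameters in each layer can also be worked out by hand. The sketch below assumes eight inputs feeding layers of 12, 8 and 1 units; dense_params is a helper written for this example, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit.
    return n_inputs * n_units + n_units

layer_sizes = [(8, 12), (12, 8), (8, 1)]  # (inputs, units) per layer
per_layer = [dense_params(i, u) for i, u in layer_sizes]

print(per_layer)       # [108, 104, 9]
print(sum(per_layer))  # 221
```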

An optimizer is an algorithm that reduces the loss (error) by adjusting the model's weights and other parameters, which minimizes the loss function and improves model accuracy more quickly.
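As a rough intuition for what an optimizer does, the sketch below runs plain gradient descent on a one-parameter toy loss. It is not the Adam algorithm, which adds per-parameter adaptive step sizes on top of this basic idea. The loss function, starting point and learning rate are made up for illustration.

```python
def loss(w):
    # Toy loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
for step in range(100):
    # Move the parameter a small step against the gradient.
    w -= learning_rate * gradient(w)

print(w, loss(w))
```

After enough steps, w settles very close to 3, where the loss is smallest.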

The loss function measures the penalty the model incurs for its predictions, and this is the quantity the optimizer tries to minimize.

Metrics are functions used to evaluate model performance. Metric functions are similar to loss functions, except that the results of the metric evaluation are not used when training the model.
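The difference between a loss and a metric can be illustrated with a few made-up numbers. The sketch below computes the mean squared error and the accuracy by hand, assuming accuracy counts a prediction as class one when the output is greater than 0.5. The labels and outputs are invented for this example.

```python
import numpy as np

y_true = np.array([1, 0, 1, 0])          # made-up labels
y_pred = np.array([0.9, 0.2, 0.4, 0.6])  # made-up model outputs

# Loss: average squared distance between output and label.
mse = np.mean((y_true - y_pred) ** 2)

# Metric: share of thresholded outputs that match the label.
predicted_class = (y_pred > 0.5).astype(int)
accuracy = np.mean(predicted_class == y_true)

print(mse, accuracy)
```

Here two of the four predictions land on the correct side of the threshold, so the accuracy is 0.5, while the mean squared error also reflects how far off each output was.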

Now we will configure our model with the Adam optimizer, Mean Squared Error loss function, and Accuracy metric function.

# compile the keras model
model.compile(optimizer='adam', loss=keras.losses.mean_squared_error, metrics=['accuracy'])

The next step is to train our neural network. The function used is fit(), which trains the model by slicing the data into “batches” of size batch_size and repeatedly iterating over the entire dataset for a given number of epochs.

Batch size is the number of training samples fed into the neural network at once. An epoch is one full pass of the entire training dataset through the neural network. For example, if we have 1000 samples and a batch size of 100, then one epoch consists of 10 batches.
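The same arithmetic can be written as a small helper. The sketch below assumes the last batch may be smaller than the others, and it assumes the dataset has 768 rows; check dataset.shape for the actual count. batches_per_epoch is a name made up for this example.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    # The last batch may be partial, so round up.
    return math.ceil(n_samples / batch_size)

print(batches_per_epoch(1000, 100))  # 10
print(batches_per_epoch(768, 10))    # 77
```

With a batch size of 10 and 100 epochs, that would mean 7700 weight updates over the whole training run.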

model.fit(predikator, target, epochs=100, batch_size=10)

Training can take a while, so wait until the process finishes epoch 100.

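Next we evaluate the model. The evaluate() function returns the loss and the accuracy; here we evaluate on the same data that was used for training and print the accuracy as a percentage.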

_, accuracy = model.evaluate(predikator, target)
print('Accuracy: %.2f' % (accuracy*100))

Now we will save the model so that it can be used again at any time.

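# Google Drive folder used in this tutorial (note the trailing slash)
pathgdr = "/content/drive/MyDrive/dataset/"
model_json = model.to_json()
with open(pathgdr + "model.json", "w") as json_file:
    json_file.write(model_json)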

We also save the weights the model has learned.

model.save_weights(pathgdr + "model.weights.h5")

In this tutorial we save it in Google Drive under the path /content/drive/MyDrive/dataset, where we can then see it in the file explorer on the left.

To reuse the model for predictions on new patients, first load the model and its weights. Here the model loaded from our file is stored in the loaded_model variable.

from keras.models import model_from_json

# load json and create model
with open(pathgdr + 'model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)

# load weights into new model
loaded_model.load_weights(pathgdr+"model.weights.h5")

A prediction for a patient is made by passing an array of eight values that follow the order of the predictor attributes: Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, DiabetesPedigreeFunction, and Age.

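import numpy as np

# predict() returns a 2-D array with one row per patient
diagnosis_patient = loaded_model.predict(np.array([[6, 148, 72, 35, 0, 33.6, 0.627, 50]]))
print("Predicted " + str(diagnosis_patient))

# take the single probability out of the array before comparing
if diagnosis_patient[0][0] > 0.5:
    print('Diabetes')
else:
    print("Not Diabetes")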

The prediction is a floating-point value between 0 and 1. Since the actual target value is only zero or one, we add branching logic. If the predicted value is greater than 0.5, it is treated as one, meaning the patient has diabetes. If the predicted value is less than or equal to 0.5, it is treated as zero, meaning the patient does not have diabetes.
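When predicting for several patients at once, the same rule can be applied to the whole output array in one step. The sketch below uses made-up probabilities shaped like the output of predict(), one row per patient, rather than values from the trained model.

```python
import numpy as np

# Made-up sigmoid outputs for three patients, one row each.
probabilities = np.array([[0.81], [0.50], [0.12]])

# Greater than 0.5 becomes 1; everything else becomes 0.
labels = (probabilities > 0.5).astype(int).ravel()

for p, label in zip(probabilities.ravel(), labels):
    print(p, 'Diabetes' if label == 1 else 'Not Diabetes')
```

A value of exactly 0.5 falls on the "Not Diabetes" side, matching the rule described above.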

