How Artificial Neural Networks Mimic Biological Neural Networks

Min Jean Cho

The human brain is composed of tens of billions of neurons, which form wide and deep neural networks. Thus, it could be said that the neuron is the basic unit of our learning/memory process. In this article, I will present how we can model the neuron and the neural network. First, take a look at the anatomy of a biological neuron shown in the following figure.

image source: https://en.wikipedia.org/wiki/Neuron

Dendrites receive signals (in the form of neurotransmitters) from other neurons. When the combined signal exceeds a threshold, an electrochemical signal (the action potential) is passed along the axon to the axon terminals, and the axon terminals transmit the signal to other neurons through synaptic connections.

image source: https://en.wikipedia.org/wiki/Neuron

The brain’s learning/memory process involves strengthening or weakening of synaptic connections between neurons, which is called synaptic plasticity. Synaptic plasticity mainly results from changes in the number of neurotransmitter receptors on the dendrite of the succeeding neuron; strengthened synaptic connections have more receptors and weakened synaptic connections have fewer receptors. Now, let’s look at the biology of the neuron in more detail.

An introduction to the biology of the neuron

The cell membrane, a phospholipid bilayer, partitions the cytoplasm off from the surrounding environment. The cell membrane is a semi-permeable barrier that selectively transports molecules and ions, and transmembrane proteins facilitate this transport. For example, the sodium-potassium pump, fueled by ATP hydrolysis, is a transmembrane protein that transports three sodium ions (Na+) from the cytoplasm to the environment while transporting two potassium ions (K+) from the environment into the cytoplasm. In most cells, the sodium-potassium pump maintains a higher concentration of sodium ions extracellularly, which results in relatively more negative charge inside the cell and less negative charge outside the cell. This ionic (charge) gradient in turn generates the membrane potential (the membrane voltage can be thought of as the pressure needed to equalize the different ionic concentrations).

In addition to the sodium-potassium pump, other types of transport proteins are involved in altering the membrane potential. These transport proteins, called gated ion channels, facilitate the diffusion of ions when their gates are open. With the gates open, sodium ions are transported into the cytoplasm via sodium channels and potassium ions are transported to the environment via potassium channels.

In the resting state of the neuron, many more potassium channels are open than sodium channels, which results in a net negative charge (ca. -70 mV) on the cytoplasmic side of the membrane; this membrane potential is called the resting potential. The reverse can occur as well. When the neuron is activated (see below for the activation mechanism), most potassium channels close and most sodium channels open, resulting in a net positive charge (ca. +40 mV) on the cytoplasmic side of the membrane. The membrane potential generated by the activation of the neuron is called the action potential.

Neurotransmitters emitted from the axon terminal are received by receptors on the membrane of dendrites. These receptors are ligand-gated ion channels (the neurotransmitters act as the ligands). Neurotransmitters can be classified by the way they change the membrane potential. Some neurotransmitters bind to ligand-gated ion channels that can transport both sodium ions and potassium ions, which results in depolarization (a decrease of negative charge inside the cell). The membrane potential formed by depolarization is called an excitatory postsynaptic potential (EPSP). Other neurotransmitters bind to ion channels that transport potassium ions to the environment, which results in hyperpolarization (an increase of negative charge inside the cell). The membrane potential formed by hyperpolarization is called an inhibitory postsynaptic potential (IPSP). Various neurotransmitters are collected by the dendrites, and the signals in the form of EPSPs and IPSPs are summed temporally as well as spatially at a region called the axon hillock.

As shown above, if the extent of depolarization exceeds a threshold level, an action potential is triggered at the axon hillock (this process is called integrate-and-fire). Depolarization opens voltage-dependent sodium ion channels that transport sodium ions into the cytoplasm. Because they transport sodium ions in a voltage-dependent manner, the decreasing negative charge accelerates the transport further, and almost all of the voltage-dependent sodium ion channels open within a short time. This positive feedback produces the action potential.

It should be noted that the action potential is a pulse (or spike). Shortly after the action potential fires, the voltage-dependent sodium ion channels close and voltage-dependent potassium ion channels open, which restores the membrane to the resting state. This restoring period is followed by a short refractory period, during which the voltage-dependent sodium ion channels remain closed and an action potential cannot be generated by depolarization even if the dendrites continuously receive EPSP-causing neurotransmitters. The action potential generated at the axon hillock in turn triggers depolarization of the neighboring membrane region, which propagates the action potential toward the axon terminal like a row of falling dominoes.

While the voltage of the action potential is essentially constant, the strength of the input signal is represented by the frequency of action potentials.

On the arrival of the action potential, voltage-dependent calcium ion channels open and transport calcium ions into the cytoplasm. The calcium ions then bind to synaptic vesicles (intracellular membrane structures located near the presynaptic membrane) containing neurotransmitters, allowing the vesicles to fuse with the presynaptic membrane, which in turn results in the secretion of the neurotransmitters.

Simplified representation of biological neural network

Let’s now create a simplified representation of the neural network as follows.

Synaptic plasticity can be viewed as in the following diagram. Strengthened synaptic connections are shown in red and weakened synaptic connections are shown in blue.

Notice that strengthened synaptic connections give more weight to incoming signals and weakened synaptic connections give less weight to incoming signals. Remember that the brain’s learning/memory process involves strengthening or weakening of synaptic connections between neurons? (Spoiler alert: this is exactly the role of the learned weights in an artificial neural network.)

In the following diagram, top and bottom panels show inactivated and activated neurons, respectively.

Note that in the above diagram the amount of neurotransmitter passed to the next neurons is not necessarily the same as the amount of neurotransmitter received by the preceding neuron (i.e., the relationship is not linear); however, a neuron transmits its signal to all of the next neurons equally (shown as six neurotransmitters emitted from each axon terminal). We will revisit this topic later in this article.

In the above diagram, \(x_i\) represents the amount of neurotransmitter received by dendrite \(i\), \(w_i\) represents the extent of strengthening of the synaptic connection, and \(y\) represents the amount (the release rate) of neurotransmitter to be emitted from the axon terminals when the neuron is activated. Since the amount of output signal (\(y\)) passed on is the same for all axon terminals, we can further simplify our model of the neuron as follows.

Biological neurons and neural networks are, of course, much more complex than this simplification, and there may be many mechanisms we have not yet fully understood. However, the above model is a good starting point for computer scientists as well as neuro- and cognitive scientists. In fact, the above model is very useful for describing the fully connected layers (multi-layer perceptron) of an artificial neural network.

Mathematical model of neural network

Now, let’s build a mathematical model of an artificial neuron. Let’s look at an example. Suppose that the input signals \(x_i\) come from optical (sensory) neurons. For example, if the \(x_i\) are features that characterize lions, then \(x_1\), \(x_2\), and \(x_3\) can be numerical representations of size, color, and hair texture, respectively. The output signal will then be transmitted to the leg muscles so that when we see a lion, we run away. To decide whether an object is a lion or not, we first need to collect the visual information when we see the object. If we have learned that color information is more important than the other features, our perceptive neuron gives more weight to the dendrite for \(x_2\). In other words, the perceptive neuron collects signals from the sensory neurons using its learned (weighted) dendrites as follows:

\[\Sigma \triangleq \sum_{i=1}^3 w_ix_i = \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{w}\mathbf{x}\]
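As a concrete illustration, this weighted sum is just a dot product. Here is a minimal sketch (assuming NumPy; the feature values and weights below are made-up numbers for illustration only):

```python
import numpy as np

# Hypothetical feature values for size, color, and hair texture (x1, x2, x3).
x = np.array([0.8, 0.9, 0.4])

# Hypothetical learned weights; color (w2) gets the most weight in this example.
w = np.array([0.3, 1.5, 0.2])

# The collected signal is the dot product w·x.
sigma = np.dot(w, x)
print(sigma)  # 0.3*0.8 + 1.5*0.9 + 0.2*0.4 = 1.67
```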

I mentioned earlier that the relationship between the total amount of neurotransmitter received and the release rate of neurotransmitter is not linear. There are some points we should note about the amount of neurotransmitter released from the axon terminal. First, there is a threshold for neurotransmitters to generate an action potential. Second, the strength of the input signal is encoded in the frequency of pulses (e.g., spikes per millisecond). Third, the relationship between the rate of neurotransmitter release and the concentration of intracellular calcium ions is sigmoidal (that is, the release rate has a maximum). Fourth, released neurotransmitters rapidly diffuse away and are subject to metabolic degradation. Thus, the amount of neurotransmitter released by an activated neuron (\(y\)) is not proportional to the total amount of neurotransmitter received (\(\Sigma = \mathbf{wx}\)). Even at the maximum release rate, the concentration of neurotransmitter in the synaptic cleft has a maximum at every unit time step due to diffusion and degradation. A good model for the amount of neurotransmitter released is therefore a sigmoidal function such as the logistic function,

\[\sigma(x) = \frac{1}{1 + e^{-x}}\]

or scaled hyperbolic tangent (tanh) function (tanh is a rescaled logistic function),

\[y = 0.5[1+\text{tanh}(\Sigma)], \text{tanh}(\Sigma) = \frac{e^{\Sigma}-e^{-\Sigma}}{e^{\Sigma}+e^{-\Sigma}} = 2\sigma(2\Sigma)-1\]

both of which saturate (have a maximum) as \(\Sigma\) approaches positive infinity. Let’s take a look at the sigmoid activation function (\(\sigma\)) used in artificial neural networks.
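To make the two activation choices concrete, here is a minimal sketch (assuming NumPy; the input values are arbitrary) that computes the logistic function and the scaled tanh, and checks the identity \(\text{tanh}(\Sigma) = 2\sigma(2\Sigma)-1\):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def scaled_tanh(z):
    """0.5 * (1 + tanh(z)), which also maps into (0, 1)."""
    return 0.5 * (1.0 + np.tanh(z))

z = np.linspace(-5.0, 5.0, 11)

# Both saturate toward 1 for large positive z and toward 0 for large negative z.
print(sigmoid(z))
print(scaled_tanh(z))

# tanh(z) = 2 * sigmoid(2z) - 1 holds for every z.
assert np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0)
```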

The output ranges from 0 to 1, which is the range of a probability. If we think of the normalized release rate of neurotransmitter as a probability, \(y\) can be viewed as a posterior probability from a Bayesian perspective, and the above activation function can be written as \(y=\sigma(\Sigma+b)\), where \(\Sigma = \mathbf{wx}\) corresponds to a sum of log likelihood ratios (\(w_ix_i\)) and \(b\) corresponds to the log prior odds (I show this here). Our previous equation still holds because we can incorporate the bias term into the weight vector as follows:

\[\Sigma \triangleq \left[\sum_{i=1}^3 w_ix_i\right]+b = \begin{bmatrix} w_0 & w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{w}\mathbf{x}, \quad w_0=b \text{ and } x_0=1\]
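A quick sketch of folding the bias into the weight vector (assuming NumPy; the numbers are arbitrary and for illustration only):

```python
import numpy as np

x = np.array([0.8, 0.9, 0.4])  # original inputs x1, x2, x3
w = np.array([0.3, 1.5, 0.2])  # original weights w1, w2, w3
b = -1.0                       # bias term

# Augment the vectors: prepend x0 = 1 and w0 = b.
x_aug = np.concatenate(([1.0], x))
w_aug = np.concatenate(([b], w))

# The augmented dot product equals the original weighted sum plus the bias.
assert np.isclose(np.dot(w_aug, x_aug), np.dot(w, x) + b)
```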

If the weights are viewed as log likelihood ratios, positive weights correspond to neurotransmitters causing EPSPs and negative weights to neurotransmitters causing IPSPs. However, considering that each log likelihood ratio is computed directly by multiplying \(x_i\) (the data) by \(w_i\), and that the bias term could also be related to the threshold for the action potential, the interpretation of \(\mathbf{wx}+b\) in a neural network might be more complex than in conventional Bayesian inference.
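As a brief sketch of the reasoning behind this interpretation (assuming, for illustration, a binary class \(C\) versus \(\bar C\) and conditionally independent features, i.e., a naive Bayes setting), the log posterior odds decompose as

\[\log\frac{P(C\mid\mathbf{x})}{P(\bar C\mid\mathbf{x})} = \log\frac{P(\mathbf{x}\mid C)\,P(C)}{P(\mathbf{x}\mid \bar C)\,P(\bar C)} = \sum_i \log\frac{P(x_i\mid C)}{P(x_i\mid \bar C)} + \log\frac{P(C)}{P(\bar C)}\]

and since the posterior probability is the sigmoid of the log posterior odds, \(P(C\mid\mathbf{x}) = \sigma\!\left(\sum_i \log\frac{P(x_i\mid C)}{P(x_i\mid \bar C)} + \log\frac{P(C)}{P(\bar C)}\right)\), which has the same form as \(\sigma(\Sigma+b)\) with each \(w_ix_i\) playing the role of a log likelihood ratio and \(b\) the log prior odds.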

Okay, now let’s be more realistic. Our brain has an immense number of layers of neurons. But for simplicity, let’s assume that there are only three layers and that each neuron has three dendrites and three axon terminals. Let’s further assume that the neurons in each layer are fully connected to the neurons of the next layer.

We can further simplify our schematic representation by not drawing the dendrites and by assuming that each neuron in the first layer has only one dendrite.

Let’s go back to our example. The first layer is composed of optical neurons. So we can think of the first layer as the eyes, and the second and third layers as the neural circuits of the brain. The final brain signals will be transmitted to the leg and hand muscles. For example, \(x_1\), \(x_2\), and \(x_3\) correspond to optical signals and \(y_1\), \(y_2\), and \(y_3\) correspond to signals for the muscles. Optical neurons are sensory neurons and thus receive raw signals; hence the first layer is called the input layer. The last layer transmits signals to produce actions; hence the third layer is called the output layer. Finally, the layers in between the input and output layers, the second layer in our example, are called hidden layers.

Continuing with our example, let’s assume that each sensory (optical) neuron in the input layer emits the signals as they are received (i.e., there is no activation function in the sensory neurons) equally to the next neurons, while the perceptive neurons in the hidden and output layers have sigmoid activation functions.
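Here is a minimal sketch of this forward pass (assuming NumPy and hypothetical, untrained weights; it is only an illustration of the 3-3-3 layout, not a trained model): the input layer passes the raw signals through unchanged, while the hidden and output layers apply a weighted sum with a bias followed by the sigmoid activation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical (untrained) weights and biases for a fully connected 3-3-3 network.
W_hidden, b_hidden = rng.normal(size=(3, 3)), rng.normal(size=3)
W_output, b_output = rng.normal(size=(3, 3)), rng.normal(size=3)

def forward(x):
    # Input layer: sensory neurons pass the raw signals through unchanged.
    h_in = x
    # Hidden layer: weighted sum plus bias, then sigmoid activation.
    h = sigmoid(W_hidden @ h_in + b_hidden)
    # Output layer: same operation, producing the muscle signals y1, y2, y3.
    return sigmoid(W_output @ h + b_output)

x = np.array([0.8, 0.9, 0.4])  # optical signals x1, x2, x3
print(forward(x))              # three outputs, each in (0, 1)
```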

We now have our representation of an artificial neural network as shown above. As mentioned earlier, it may be an over-simplification of the biological neural network, but it will help us understand artificial neural networks. In the next article, we’ll go through the mathematical details of 1) representing the neural network with matrices and vectors; and 2) learning/optimizing the weights by gradient descent.