
Jordan network

What Is a Jordan Network?

A Jordan network is a specific type of recurrent neural network (RNN) used in machine learning for processing sequential data. Unlike traditional neural network architectures, Jordan networks possess internal memory, allowing them to account for past inputs and outputs when making current predictions. This capability makes them particularly suitable for tasks involving time series data, where the order of information is crucial.

The defining characteristic of a Jordan network lies in its architecture, which includes a special "context layer." This context layer stores a copy of the network's previous output, feeding it back as an additional input to the hidden layer at the next time step. This feedback mechanism enables the network to maintain a form of short-term memory, influencing its processing of subsequent data points.

History and Origin

The concept of the Jordan network was introduced by Michael I. Jordan in his seminal 1986 technical report, "Serial order: A parallel distributed processing approach." At a time when much of neural network research focused on static pattern recognition, Jordan's work, along with others, was instrumental in developing architectures capable of handling sequences and temporal dependencies. These early models, often referred to as "simple recurrent networks," were foundational in exploring how neural networks could learn and represent information over time, contributing significantly to the field of artificial intelligence.

Key Takeaways

  • Jordan networks are a class of recurrent neural networks designed to process sequential data.
  • They feature a unique context layer that feeds the network's previous output back into its hidden layer, providing a form of memory.
  • This architecture allows Jordan networks to recognize patterns and dependencies over time, making them suitable for tasks like financial forecasting.
  • Despite their historical importance, more advanced RNN architectures like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) have largely superseded Jordan networks for complex, long-range temporal dependencies.
  • Jordan networks were among the early attempts at developing neural networks capable of supervised learning on sequences.

Formula and Calculation

The operation of a Jordan network can be described by a set of equations that define how information flows through its layers at each time step (t). A key component is the context layer, which holds the output from the previous time step.

Let:

  • (x_t) be the input vector at time (t).
  • (h_t) be the activation of the hidden layer at time (t).
  • (y_t) be the output of the network at time (t).
  • (c_t) be the activation of the context layer at time (t).

The context layer's activation at time (t) is a copy of the network's output from the previous time step (t-1):
c_t = y_{t-1}

The hidden layer's activation is calculated from the current input (x_t), the context layer activation (c_t), and a bias (b_h), passed through an activation function (\sigma_h):
h_t = \sigma_h(W_{xh}x_t + W_{ch}c_t + b_h)
Where (W_{xh}) are the weights connecting the input to the hidden layer, and (W_{ch}) are the weights connecting the context layer to the hidden layer.

Finally, the output layer produces (y_t) based on the hidden layer's activation (h_t) and a bias (b_y), using an activation function (\sigma_y):
y_t = \sigma_y(W_{hy}h_t + b_y)
Where (W_{hy}) are the weights connecting the hidden layer to the output layer.

During training, these weights ((W_{xh}), (W_{ch}), (W_{hy})) and biases ((b_h), (b_y)) are typically adjusted using variations of the backpropagation algorithm adapted for recurrent structures, such as backpropagation through time, to minimize the error between the network's predictions and the actual target values.
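For readers who prefer code, the three equations above translate into a short forward pass. The following is a minimal NumPy sketch, not a reference implementation: the function name jordan_forward, the choice of tanh and sigmoid as the activation functions, and the random untrained weights are all illustrative assumptions, since the source does not fix a particular choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jordan_forward(x_seq, W_xh, W_ch, b_h, W_hy, b_y):
    """Forward pass of a Jordan network over a sequence of input vectors.

    The context c_t is a copy of the previous output y_{t-1}; it starts at zero
    because there is no output before the first time step.
    """
    n_out = W_hy.shape[0]
    c_t = np.zeros(n_out)                               # c_1 = 0
    outputs = []
    for x_t in x_seq:
        h_t = np.tanh(W_xh @ x_t + W_ch @ c_t + b_h)    # h_t = sigma_h(W_xh x_t + W_ch c_t + b_h)
        y_t = sigmoid(W_hy @ h_t + b_y)                 # y_t = sigma_y(W_hy h_t + b_y)
        outputs.append(y_t)
        c_t = y_t                                       # feedback: c_{t+1} = y_t
    return np.array(outputs)

# Example with arbitrary (untrained) weights, just to show the shapes involved.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 1
W_xh = rng.normal(size=(n_hidden, n_in))
W_ch = rng.normal(size=(n_hidden, n_out))
b_h  = np.zeros(n_hidden)
W_hy = rng.normal(size=(n_out, n_hidden))
b_y  = np.zeros(n_out)
x_seq = rng.normal(size=(10, n_in))                     # a sequence of 10 input vectors
print(jordan_forward(x_seq, W_xh, W_ch, b_h, W_hy, b_y).shape)  # (10, 1)
```

Note that the only state carried between time steps is c_t, the previous output; everything else is recomputed from scratch at each step.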

Interpreting the Jordan Network

Interpreting a Jordan network involves understanding how its internal state, primarily held within the context layer, influences its predictions over time. Because the context layer directly reflects the network's previous output, a Jordan network's current output is implicitly a function of its entire past sequence of outputs. This makes it particularly adept at tasks where previous outcomes directly affect future ones, such as predicting the next element in a sequence based on the last observed values.

For instance, in predictive modeling, if a Jordan network is trained to forecast stock prices, its current prediction incorporates the previously predicted price, allowing it to generate a sequence of forecasts that account for historical trends. The strength of the feedback connection from the output to the context layer, and subsequently to the hidden layer, determines how heavily past outputs influence current processing. A higher weight on these recurrent connections implies a stronger reliance on the network's "memory" of past predictions.
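To make the point about recurrent weights concrete, here is a toy scalar illustration. The weights are arbitrary numbers chosen for the example, not a trained model; it only shows that a larger context weight w_c pulls the hidden activation further away from what the current input alone would produce.

```python
import numpy as np

x_t, c_t = 0.2, 1.0                        # current input and previous output
for w_c in (0.05, 0.5, 2.0):               # increasing weight on the context feedback
    h_t = np.tanh(0.8 * x_t + w_c * c_t)   # hidden activation with input weight 0.8
    print(f"w_c={w_c:>4}: h_t={h_t:.3f}")
# Larger w_c moves h_t further from tanh(0.8 * x_t), i.e. the past output matters more.
```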

Hypothetical Example

Imagine a Jordan network is used to predict the next closing price of a hypothetical stock, "DiversiStock," based on its current price and the previous day's predicted closing price.

Scenario:
The network needs to predict the closing price for Day 3, given the actual closing prices for Days 1 and 2 and the network's own prediction for Day 2.

Inputs:

  • Day 1 Actual Closing Price: $100.00
  • Day 2 Actual Closing Price: $102.50

Network Training (Simplified):

  1. Day 1 (Initial State):
    • Input ((x_1)): $100.00
    • Context Layer ((c_1)): Initialized to zero (or a small random value), as there's no previous output.
    • The network processes (x_1) and (c_1) through its hidden layer to produce an output (y_1). Let's say (y_1) (predicted Day 1 price) is $100.10.
  2. Day 2:
    • Input ((x_2)): $102.50
    • Context Layer ((c_2)): Stores (y_1) (the network's predicted Day 1 price, $100.10).
    • The network processes (x_2) and (c_2) to produce (y_2). Let's say (y_2) (predicted Day 2 price) is $102.30.
    • During training, the network adjusts its internal weights to minimize the difference between (y_2) and the actual Day 2 closing price ($102.50).
  3. Day 3 (Prediction):
    • Input ((x_3)): the current information available at prediction time, e.g., Day 3's opening price or another relevant current feature. Here, $102.75 serves as a proxy for current information for Day 3.
    • Context Layer ((c_3)): Stores (y_2) (the network's predicted Day 2 price, $102.30).
    • The network processes (x_3) and (c_3) to produce (y_3), the predicted closing price for Day 3.

By feeding its own previous output back, the Jordan network implicitly "remembers" its prior prediction, influencing its current forecast. This continuous feedback loop helps it capture dependencies within the sequence of predicted values, making it a viable tool for sequential data analysis.
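The walk-through above can be sketched in a few lines of Python. The "network" here is deliberately tiny, with made-up weights (w_x, w_c) and the hidden and output layers collapsed into one linear step, so the numbers it prints will not match the illustrative $100.10 / $102.30 predictions in the text; the point is the data flow, where each prediction becomes the next step's context.

```python
def toy_jordan_step(price, context, w_x=0.9, w_c=0.1):
    # Hidden and output layers collapsed into one linear step for brevity;
    # the weights are arbitrary placeholders, not a trained model.
    return w_x * price + w_c * context

inputs = [("Day 1", 100.00), ("Day 2", 102.50), ("Day 3", 102.75)]
context = 0.0                          # c_1 = 0: no previous prediction yet
for day, price in inputs:
    prediction = toy_jordan_step(price, context)
    print(f"{day}: input={price:.2f}, context={context:.2f}, prediction={prediction:.2f}")
    context = prediction               # c_{t+1} = y_t: feed the prediction back
```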

Practical Applications

Jordan networks, as early forms of recurrent neural networks, have been applied in various fields requiring sequential data processing. While more complex architectures now dominate, understanding their applications provides insight into the broader utility of recurrent models.

In finance, Jordan networks, and RNNs in general, are particularly relevant for analyzing and predicting financial time series. Examples include:

  • Stock Price Prediction: Forecasting future stock movements based on historical price data.
  • Algorithmic Trading Strategies: Identifying trading signals by recognizing patterns in sequential market data.
  • Risk Management: Assessing and predicting financial risks that evolve over time, such as credit risk or market volatility.
  • Sentiment Analysis: Processing sequences of text data (e.g., news articles, social media feeds) to gauge market sentiment, where the order of words and sentences is crucial.

Beyond finance, Jordan networks have found use in:

  • Speech Recognition: Processing sequences of audio signals to interpret spoken language.
  • Natural Language Processing: Tasks like sequence generation or understanding context in sentences.
  • Process Control: Modeling and predicting the behavior of dynamic systems where the current state depends on past actions and observations.

Their implementation is often facilitated by machine learning libraries, where they can be configured and trained using specific functions.
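As an illustration of the kind of implementation such libraries allow, here is a hedged sketch of a Jordan-style cell with a backpropagation-through-time training loop, written with PyTorch. The class name JordanCell, the layer sizes, the learning rate, and the random toy data are all assumptions made for the example; this is not a standard library component, and real financial data and preprocessing would replace the random tensors.

```python
import torch
import torch.nn as nn

class JordanCell(nn.Module):
    """One step of a Jordan-style network: the hidden layer sees the current
    input concatenated with the previous *output* (the context)."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.in2hidden = nn.Linear(input_size + output_size, hidden_size)
        self.hidden2out = nn.Linear(hidden_size, output_size)

    def forward(self, x_t, context):
        h_t = torch.tanh(self.in2hidden(torch.cat([x_t, context], dim=-1)))
        y_t = self.hidden2out(h_t)        # linear output for a regression target
        return y_t                        # y_t becomes the next step's context

# Unroll the cell over a toy sequence and train with backpropagation through time.
cell = JordanCell(input_size=1, hidden_size=8, output_size=1)
optimizer = torch.optim.Adam(cell.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

inputs  = torch.randn(20, 1)   # hypothetical sequence of 20 scalar inputs
targets = torch.randn(20, 1)   # hypothetical targets (e.g. next-day prices)

for epoch in range(100):
    context = torch.zeros(1)   # no previous output before the first time step
    loss = 0.0
    for x_t, target in zip(inputs, targets):
        y_t = cell(x_t, context)
        loss = loss + loss_fn(y_t, target)
        context = y_t          # Jordan feedback: output fed back as context
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```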

Limitations and Criticisms

Despite their pioneering role, Jordan networks, like other simple recurrent neural networks, face several significant limitations:

  • Vanishing and Exploding Gradients: This is a major challenge for traditional RNNs, including Jordan networks. During the backpropagation-through-time training process, gradients (the signals used to update network weights) can become either extremely small (vanishing) or extremely large (exploding) as they propagate through many time steps. Vanishing gradients hinder the network's ability to learn long-term dependencies, effectively causing it to "forget" information from distant past inputs. Conversely, exploding gradients can lead to unstable training and large weight updates, causing the network to diverge; a numeric sketch of this compounding effect follows this list.
  • Limited Long-Term Memory: Due to the vanishing gradient problem, Jordan networks struggle to capture and utilize information from inputs or outputs that occurred many time steps ago. Their "memory" tends to be short-term, primarily influenced by recent data points.
  • Computational Intensity: Training Jordan networks on very long sequences can be computationally expensive and time-consuming compared to feedforward networks, as the recurrent nature requires processing steps sequentially through time.
  • Simplicity of Context: The context layer in a Jordan network simply holds the previous output. This simple feedback mechanism may not be sophisticated enough to capture complex temporal relationships present in highly intricate sequential data, leading to suboptimal performance compared to more advanced recurrent architectures.
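The numeric sketch referenced in the first bullet above: backpropagating through T time steps repeatedly multiplies the error signal by a recurrent factor, so a magnitude below 1 shrinks it toward zero while a magnitude above 1 blows it up. Real networks multiply by Jacobian matrices rather than the scalars used here, but the compounding effect is the same.

```python
# Repeated multiplication by a recurrent factor over 30 time steps.
for factor, label in [(0.5, "vanishing"), (1.5, "exploding")]:
    signal = 1.0
    for t in range(30):        # 30 steps of backpropagation through time
        signal *= factor
    print(f"{label}: factor={factor}, signal after 30 steps = {signal:.3e}")
# vanishing: ~9.3e-10    exploding: ~1.9e+05
```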

These limitations ultimately led to the development of more complex and robust recurrent architectures, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU), which are designed to mitigate the vanishing gradient problem and better capture long-range dependencies.

Jordan Network vs. Elman Network

The Jordan network and the Elman network are both foundational types of simple recurrent neural networks (SRNs), introduced around the same time and designed to handle sequential data by incorporating a memory mechanism. The primary distinction between them lies in the source of the feedback to their respective "context" or "memory" units.

| Feature | Jordan Network | Elman Network |
|---|---|---|
| Context Layer Source | Receives feedback from the output layer. | Receives feedback from the hidden layer. |
| Information Stored | Primarily stores the network's past outputs. | Stores copies of the past hidden states. |
| Feedback Impact | Direct feedback from what the network predicted. | Feedback from the internal representations formed in the hidden layer. |
| Common Use Case | Often used in tasks where the previous output directly influences the next state. | Favored when internal states are more critical to capturing sequential patterns. |

In an Elman network, the context layer holds a copy of the activations from the hidden layer at the previous time step. This copy is then fed back into the hidden layer along with the current input. In contrast, the Jordan network's context layer captures the network's actual output from the previous time step, providing the hidden layer with information about the network's past decisions or predictions. This subtle difference in feedback source impacts how each network learns and represents sequential patterns, often making Elman networks more focused on learning the internal dynamics of sequences, while Jordan networks emphasize the impact of past predictions on future ones.
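In code, the distinction comes down to which value is copied into the context before the next step. The sketch below is illustrative only: the weights are random and untrained, and the hidden and output sizes are set equal so the same context weights W_ch can be reused by both variants for a side-by-side comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 4, 4       # equal sizes so both variants share W_ch
W_xh = rng.normal(size=(n_hidden, n_in))
W_ch = rng.normal(size=(n_hidden, n_hidden))
W_hy = rng.normal(size=(n_out, n_hidden))
b_h, b_y = np.zeros(n_hidden), np.zeros(n_out)

def step(x_t, context):
    """One time step; identical computation for both architectures."""
    h_t = np.tanh(W_xh @ x_t + W_ch @ context + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

x_seq = rng.normal(size=(5, n_in))
ctx_jordan = np.zeros(n_out)          # Jordan: context holds the previous output
ctx_elman = np.zeros(n_hidden)        # Elman: context holds the previous hidden state
for x_t in x_seq:
    h_j, y_j = step(x_t, ctx_jordan)
    h_e, y_e = step(x_t, ctx_elman)
    ctx_jordan = y_j                  # feedback from the output layer
    ctx_elman = h_e                   # feedback from the hidden layer
```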

FAQs

Q1: What is the main purpose of a Jordan network?

A Jordan network's main purpose is to process sequential data, where the order of information is important. It achieves this by using a feedback loop that allows it to retain a "memory" of past outputs, influencing its current predictions. This makes it suitable for tasks like pattern recognition in time series.

Q2: How does a Jordan network "remember" past information?

A Jordan network remembers past information through its dedicated context layer. This layer takes a copy of the network's output from the previous time step and feeds it back as an additional input to the hidden layer for the current time step. This mechanism effectively incorporates a trace of past predictions into ongoing processing.

Q3: Are Jordan networks still used today?

While Jordan networks were historically significant in the development of recurrent neural networks, they are generally not the architecture of choice for complex modern applications. More advanced RNN variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU), have largely superseded them due to their ability to handle long-term dependencies and mitigate issues like the vanishing gradient problem more effectively. However, they remain important for understanding the foundational principles of recurrent architectures.
