Patterns in Software Architecture: The Pipes and Filters Pattern

Patterns are an important abstraction in modern software development and software architecture. They offer well-defined terminology, clean documentation, and the opportunity to learn from the best. The Pipes and Filters architectural pattern is related to the Layers pattern and describes the structure of systems that process streams of data.



The idea behind the Layers pattern is to structure the system into layers so that higher layers rely on the services of lower layers. The Pipes and Filters pattern naturally extends the Layers pattern: the layers become the filters, and the data flow between them becomes the pipes.

Purpose

  • A system that processes data in multiple steps.
  • Each step processes its data independently of the others.

Implementation

  • Division of the task into several processing steps.
  • The output of each processing step is the input of the next one.
  • The processing step is called a filter; the data channel between the filters is called a pipe.
  • The data comes from the data source and ends up in the data sink.

Structure





Filter

  • receives input data,
  • performs its operation on the input data and
  • produces output data.

Pipe

  • transmits data,
  • buffers data in a queue, and
  • synchronizes the neighboring filters (a sketch of such a pipe follows the definitions below).

Data Source

  • produces input for the processing pipeline.

Data Sink

  • consumes the output of the processing pipeline.
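
The following is a minimal sketch of how a pipe could look in C++: a bounded, thread-safe queue that buffers the data and synchronizes its two neighboring filters. The class name Pipe and its interface are my assumptions for illustration; the pattern itself does not prescribe them.

// pipe.h: a minimal sketch of a pipe as a bounded, thread-safe queue

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

template <typename T>
class Pipe {
public:
    explicit Pipe(std::size_t cap): capacity(cap) {}

    void push(T value) {                          // called by the upstream filter
        std::unique_lock<std::mutex> lock(mtx);
        notFull.wait(lock, [this]{ return buffer.size() < capacity; });
        buffer.push(std::move(value));
        notEmpty.notify_one();                    // wake a waiting downstream filter
    }

    T pop() {                                     // called by the downstream filter
        std::unique_lock<std::mutex> lock(mtx);
        notEmpty.wait(lock, [this]{ return !buffer.empty(); });
        T value = std::move(buffer.front());
        buffer.pop();
        notFull.notify_one();                     // wake a waiting upstream filter
        return value;
    }

private:
    std::size_t capacity;
    std::queue<T> buffer;
    std::mutex mtx;
    std::condition_variable notEmpty;
    std::condition_variable notFull;
};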

The most interesting part of the Pipes and Filters pattern is the data flow.

There are several ways to control the flow of data.

Push principle

  • A filter is started when the previous filter passes its data to it.
  • The (n-1)th filter sends (writes) data to the nth filter.
  • The data source starts the data flow.
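
Here is a minimal sketch of the push principle, using three hand-written filters as lambdas; the names and the toy computation are my assumptions. Each filter processes its input and immediately pushes the result to its successor:

// pushPrinciple.cpp: the data source drives the flow (illustrative sketch)

#include <iostream>

int main() {

    auto sink   = [](int i){ std::cout << i << '\n'; };  // data sink
    auto square = [&](int i){ sink(i * i); };            // 2nd filter pushes to the sink
    auto incr   = [&](int i){ square(i + 1); };          // 1st filter pushes to the 2nd

    for (int i = 0; i < 5; ++i) incr(i);                 // the data source starts the data flow

}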

Pull principle

  • A filter is started by requesting the data from the previous filter.
  • The nth filter requests (reads) data from the (n-1)th filter.
  • The data sink starts the data flow.
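
The same toy pipeline under the pull principle (again only a sketch with assumed names); now each filter requests its input from its predecessor, and the data sink drives the flow:

// pullPrinciple.cpp: the data sink drives the flow (illustrative sketch)

#include <iostream>

int main() {

    int n = 0;
    auto source = [&n]{ return n++; };                   // data source
    auto incr   = [&]{ return source() + 1; };           // 1st filter pulls from the source
    auto square = [&]{ int i = incr(); return i * i; };  // 2nd filter pulls from the 1st

    for (int i = 0; i < 5; ++i) std::cout << square() << '\n';  // the data sink starts the data flow

}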

Mixed push/pull principle

  • The nth filter requests data from the (n-1)th filter and explicitly passes it to the (n+1)th filter.
  • The nth filter is the only active filter in the processing chain.
  • The nth filter starts the data flow.
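
A sketch of the mixed principle: the single active filter pulls its input from a passive source and pushes its output to a passive sink (all names are my assumptions):

// mixedPrinciple.cpp: one active filter drives the flow (illustrative sketch)

#include <iostream>

int main() {

    int n = 0;
    auto source = [&n]{ return n++; };                    // passive data source
    auto sink   = [](int i){ std::cout << i << '\n'; };   // passive data sink

    for (int count = 0; count < 5; ++count) {             // the active filter starts the data flow
        int i = source();                                 // pull from the (n-1)th filter
        sink(i * i);                                      // push to the (n+1)th filter
    }

}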

Active filters as independent processes

  • Each filter is an independent process that reads data from the previous queue or writes data to the following queue.
  • The nth filter can only read data after the (n-1)th filter has written data to the connecting queue.
  • The nth filter can only write its data after the (n+1)th filter has read the connecting queue.
  • This structure is called producer/consumer.
  • Each filter can start the data flow.
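
Here is a sketch of three active filters as independent threads, connected by the Pipe class from the sketch above. The fixed number of elements keeps the shutdown logic out of the example; a real pipeline would need a termination protocol.

// activeFilters.cpp: producer/consumer with independent filter threads
// (illustrative sketch; needs the Pipe class shown earlier)

#include <iostream>
#include <thread>

int main() {

    Pipe<int> pipe1(10);                                  // source -> filter
    Pipe<int> pipe2(10);                                  // filter -> sink

    std::jthread source([&]{                              // producer
        for (int i = 0; i < 5; ++i) pipe1.push(i);
    });

    std::jthread filter([&]{                              // consumer and producer
        for (int i = 0; i < 5; ++i) pipe2.push(pipe1.pop() * 2);
    });

    std::jthread sink([&]{                                // consumer
        for (int i = 0; i < 5; ++i) std::cout << pipe2.pop() << '\n';
    });

}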

The most famous example of the Pipes and Filters pattern is the UNIX Command Shell.

Unix Command Shell

  • Find the five Python files in my python3.6 installation that have the most lines:



Here are the pipeline steps:

  • Find all files that end in .py: find -name "*.py".
  • Get the number of lines from each file: xargs wc -l.
  • Sort numerically: sort -g.
  • Remove the last two lines with irrelevant statistical information: head -n -2.
  • Find the last five lines: tail -5.
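
Put together, the five steps form the following pipeline (my reconstruction from the steps above):

find -name "*.py" | xargs wc -l | sort -g | head -n -2 | tail -5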

Finally, here is the classic command-line processing with pipes by Douglas McIlroy.

tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q

If you want to know how this pipeline works, you can find the whole story behind it in the article “More shell, less egg”.

C++ supports the Pipes and Filters pattern thanks to the Ranges library in C++20.

Ranges

The following program, firstTenPrimes.cpp, displays the first ten prime numbers starting at 1000.

// firstTenPrimes.cpp

#include <iostream>
#include <ranges>
#include <vector>

bool isPrime(int i) {
    for (int j = 2; j * j <= i; ++j){
        if (i % j == 0) return false;
    }
    return true;
}

int main() {

    std::cout << '\n';
    
    auto odd = [](int i){ return i % 2 == 1; };

    auto vec = std::views::iota(1'000) 
      | std::views::filter(odd)           // (1)
      | std::views::filter(isPrime)       // (2)
      | std::views::take(10)              // (3)
      | std::ranges::to<std::vector>();   // (4)

    for (auto v: vec) std::cout << v << " ";

}

The data source (std::views::iota(1'000)) produces the natural numbers, starting with 1000. First, only the odd numbers pass the filter (1), and then only the prime numbers (2). The pipeline stops after ten values (3) and pushes the elements into the std::vector (4). The convenient function std::ranges::to, which creates the new range (4), is new in C++23. Therefore, I can only run the code with the latest Windows compiler on Compiler Explorer.

The program displays the primes 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, and 1061.
I use the term universal interface in the following comparison. It means that all filters speak the same language, such as XML or JSON.

Advantages

  • If a filter pulls or pushes the data directly from its neighbor, no intermediate buffering of the data is necessary.
  • A filter, like a layer in the Layers pattern, is a self-contained unit and can therefore easily be replaced.
  • Filters that implement the universal interface can be reordered.
  • Each filter can work independently of the others and does not have to wait for a neighboring filter to finish. This enables an optimal division of labor between the filters.
  • Filters can run in a distributed architecture. The pipes connect the remote units together. The pipes can also split or synchronize the flow of data. Pipes-and-Filters is commonly used in distributed or concurrent architectures and offers great opportunities for performance and scalability.

Disadvantages

  • Processing data in parallel can be inefficient due to communication, serialization, and synchronization overhead.
  • A filter such as sort needs the entire data before it can produce any output.
  • If the processing speed of the filters is not homogeneous, large buffers are needed between them.
  • To support the universal interface, the data between the filters must be formatted.
  • Probably the most complicated part of this pattern is error handling. If the Pipes and Filters architecture crashes during data processing, the data may have been processed only partially or not at all. There are then the following options:
    • Start the process again if the original data is still available.
    • Use only the fully processed data.
    • Insert markers into the data stream so that the process can be restarted from the last marker after a crash.

The Broker pattern structures distributed software systems whose components interact via remote service calls. The broker is responsible for coordinating the communication, its results, and exceptions. In my next article, I will delve deeper into the Broker architectural pattern.


