Introduction
A distributed system is best understood in contrast to a centralized system, where all state is stored on a single computer. Centralized systems are simpler, easier to understand, and can be faster for a single user.
In a distributed system, state is divided across multiple computers. This makes the system more robust and scalable, but also more complex.
Examples of distributed systems:
- Domain Name System (DNS)
- Netflix
- Email servers (SMTP)
- …
Distributed Systems Advantages
1. Scalability
The main advantage is definitely scalability. Capacity grows by adding more machines, which is usually cheaper than buying a supercomputer. More machines also mean more parallelism, and therefore better performance for workloads that can be split.
2. Sharing
The same resource is shared between multiple users.
3. Communication
Machines and users can communicate even when they are geographically isolated.
4. Reliability
The service remains available even if several machines go down.
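The reliability point can be illustrated with a minimal sketch in which a client reads from the first replica that answers. The `Replica` class, the `read_with_failover` helper and the node names are hypothetical stand-ins for real machines, not part of any library, and the sketch assumes every replica holds the same copy of the data.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


class Replica:
    """Hypothetical in-memory stand-in for one machine holding a copy of the data."""

    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = dict(data)
        self.alive = alive

    def get(self, key):
        if not self.alive:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order and return the first successful answer."""
    for replica in replicas:
        try:
            return replica.get(key), replica.name
        except ReplicaDown:
            continue  # this machine is down, try the next copy
    raise ReplicaDown("all replicas are down")


data = {"user:1": "alice"}
cluster = [
    Replica("node-a", data, alive=False),  # simulated outage
    Replica("node-b", data, alive=False),  # simulated outage
    Replica("node-c", data),
]
value, served_by = read_with_failover(cluster, "user:1")
print(value, "served by", served_by)
```

Two of the three machines are down here, yet the read still succeeds. The request only fails once every replica is unavailable.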
Distributed Systems Challenges
Despite the advantages described above, distributed systems come with challenges you will have to deal with:
1. Concurrency
Concurrent execution requires some form of coordination.
2. Fault-tolerance
Any component can fail at any moment because of a software bug or a hardware fault.
3. Security
A single compromised machine can compromise the entire system.
4. Coordination
There is no global clock, so coordinating machines is non-trivial.
5. Troubleshooting
Problems are harder to troubleshoot because the system as a whole is hard to reason about.
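One common way to work around the missing global clock from the coordination item is a Lamport logical clock. The technique is not named in the text above; it is shown here only as an illustrative sketch. Each node keeps a counter, attaches it to outgoing messages, and moves past the sender's value on receipt. The `LamportClock` class and its method names are illustrative choices.

```python
class LamportClock:
    """Logical clock: orders events without relying on wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """On receipt, move past both the local and the sender's counter."""
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                     # local event on node a
stamp = a.send()             # a sends a message carrying its timestamp
b.tick()                     # unrelated local event on node b
received = b.receive(stamp)  # b's clock moves past the sender's timestamp
print(stamp, received)
```

The receive event always gets a larger timestamp than the matching send, so cause precedes effect in the ordering even though the two nodes never compare wall-clock times.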
Distributed Systems Categories
Distributed systems can be split into 6 categories:
1. Data stores (aka distributed databases)
Most distributed databases are NoSQL (non-relational) databases. They provide very high performance and scalability at the cost of consistency or availability. (Cassandra, Riak, Voldemort)
2. Computing
The goal is to split an enormous task (e.g. processing 100 billion records) into many smaller tasks that run on different nodes. When the task grows, simply add more nodes to the computation. (Kafka, Apache Spark, Apache Storm)
3. File systems
Distributed file systems can be thought of as distributed data stores. Conceptually they are the same thing: storing and accessing a large amount of data across a cluster of machines that all appear as one. They typically go hand in hand with distributed computing. (Hadoop HDFS, InterPlanetary File System IPFS)
4. Messaging systems
Messaging systems provide a central place for storage and propagation of messages/events inside your overall system. They allow you to decouple your application logic from directly talking with your other systems. (RabbitMQ, Kafka, Apache ActiveMQ, Amazon SQS)
5. Ledgers
A distributed ledger can be thought of as an immutable, append-only database that is replicated, synchronized and shared across all nodes in the distributed network. (Blockchain, Bitcoin)
6. Applications
A system is distributed only if the nodes communicate with each other to coordinate their actions. Therefore, something like an application that runs its back-end code on a peer-to-peer network is better classified as a distributed application. (BitTorrent)
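To make the "immutable, append-only" property of the ledgers category more concrete, here is a minimal single-node sketch of a hash chain, where each block stores the hash of the previous one. The `Ledger` class, the `block_hash` helper and the sample payloads are hypothetical, and replication across nodes is deliberately left out. This is an illustration of the idea, not how Bitcoin or any specific blockchain is implemented.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    record = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical single-node append-only log; replication is out of scope."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        index = len(self.blocks)
        block = {
            "index": index,
            "payload": payload,
            "prev": prev_hash,
            "hash": block_hash(index, payload, prev_hash),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        """Recompute every hash; an edit to any block is detected."""
        prev_hash = self.GENESIS
        for block in self.blocks:
            expected = block_hash(block["index"], block["payload"], prev_hash)
            if block["prev"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


ledger = Ledger()
ledger.append("alice pays bob 5")
ledger.append("bob pays carol 2")
valid_before = ledger.is_valid()

ledger.blocks[0]["payload"] = "alice pays bob 500"  # tamper with history
valid_after = ledger.is_valid()
print(valid_before, valid_after)
```

Because every block commits to the one before it, rewriting an old entry invalidates the chain, which is what lets the other nodes in a real distributed ledger reject a tampered copy.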
Source : https://medium.freecodecamp.org/a-thorough-introduction-to-distributed-systems-3b91562c9b3c