In multithreaded programs, access to shared resources must be protected by locks so that concurrent updates do not corrupt the data. The same problem exists in distributed systems, and there it calls for a distributed lock service.
Common distributed lock implementations are based on a database (DB), Redis, or ZooKeeper. Below I analyze the design and implementation of each of these three in turn; readers who only want the summary can skip to the end of the article.
There are many ways to implement a distributed lock, but whatever the approach, a distributed lock generally needs the following characteristics:
Exclusiveness: at any moment, only one client can hold the lock;
Fault tolerance: the lock service generally needs to satisfy AP; that is, as long as a majority of the lock-service cluster nodes are alive, clients can still perform lock and unlock operations;
Deadlock avoidance: the lock must eventually be released, even if the client crashes or becomes unreachable before releasing it.
Beyond these characteristics, a distributed lock should ideally also be reentrant, high-performance, and support blocking acquisition (like AQS, promptly waking a blocked waiter when the lock becomes available).
1. DB lock
Create a new table in the database for concurrency control. The table structure can be as follows:
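A minimal sketch of such a table (MySQL syntax; the table and column names, such as `lock_table` and `gmt_create`, are illustrative assumptions):

```sql
-- Hypothetical schema; names are illustrative, not from the original article.
CREATE TABLE lock_table (
    id         BIGINT       NOT NULL AUTO_INCREMENT COMMENT 'primary key',
    key_id     VARCHAR(128) NOT NULL COMMENT 'distributed lock key',
    memo       VARCHAR(256) DEFAULT '' COMMENT 'holder info, lock count, etc.',
    gmt_create DATETIME     NOT NULL COMMENT 'lock acquisition time',
    PRIMARY KEY (id),
    UNIQUE KEY uk_key_id (key_id)  -- the unique index that makes locking exclusive
) ENGINE=InnoDB;
```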
The key_id column serves as the distributed lock key, and memo can record bookkeeping information (for example, the current holder and a lock count, which is enough to support reentrancy). A unique index on key_id guarantees that, for a given key_id, only one insert, and therefore only one lock acquisition, can succeed. The lock/unlock pseudocode is then:
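As an illustration of the logic only, here is a Python sketch that models the unique-index behavior with an in-memory dict; a real implementation would issue the corresponding INSERT/DELETE statements against the table:

```python
import threading

# In-memory stand-in for the lock table; dict key uniqueness plays the
# role of the unique index on key_id.
_lock_table = {}
_table_guard = threading.Lock()  # models the DB's atomicity for a single insert

def try_lock(key_id, memo=""):
    """INSERT INTO lock_table(key_id, memo, ...): succeeds only if key_id is free."""
    with _table_guard:
        if key_id in _lock_table:  # unique-index violation: lock held elsewhere
            return False
        _lock_table[key_id] = memo
        return True

def unlock(key_id):
    """DELETE FROM lock_table WHERE key_id = ...: releases the lock."""
    with _table_guard:
        _lock_table.pop(key_id, None)
```

A second `try_lock` on the same key_id fails until `unlock` removes the record, mirroring the insert/delete pattern.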
Note that the lock operation in the pseudocode is non-blocking, i.e. a tryLock. To get a blocking (or blocking-with-timeout) lock, simply retry the lock pseudocode in a loop until it succeeds.
DB-based distributed locks have a real problem: if the client crashes after acquiring the lock, or fails to unlock because of a network failure, the key_id remains locked and no other client can ever acquire or release it. To expire such locks, the application layer needs a scheduled task that deletes stale, never-unlocked records. For example, pseudocode that deletes records locked more than 2 minutes ago and still not unlocked:
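One plausible form of that cleanup statement (MySQL syntax; the `lock_table` and `gmt_create` names are illustrative assumptions):

```sql
-- Run periodically (e.g. once a minute) from an application-level scheduled task.
DELETE FROM lock_table
WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 2 MINUTE);
```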
Because a single DB instance generally sustains only a few hundred TPS, the throughput ceiling of a DB-based distributed lock is usually below 1k. That said, in scenarios with modest concurrency such a lock meets the need and performance is not a problem.
However, a DB acting as a lock service must also deal with the single point of failure, which a distributed system cannot tolerate. This is usually addressed with synchronous database replication plus a VIP that is switched to the new master on failover.
The DB lock above is implemented via insert. If the rows to be locked already exist in the table, `select ... where key_id = xxx for update` can be used instead.
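In that variant the lock is a transaction-scoped row lock (sketch; names are illustrative):

```sql
-- Lock: the row lock is held until COMMIT or ROLLBACK.
BEGIN;
SELECT * FROM lock_table WHERE key_id = 'order:42' FOR UPDATE;
-- ... operate on the shared resource ...
COMMIT;  -- unlock
```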
2. Redis lock
A Redis lock acquires a resource with the following command:

`set key_id key_value NX PX expireTime`

The NX option assigns the key only if it does not already exist, PX sets the key's expiration time in milliseconds, and key_value should be a random value so the lock can be released safely (on release, the client checks that the stored value is the random value it set earlier, and only then deletes the key). Because the key carries an expiration, the lock is released automatically after a while even if the holder never unlocks.
set ... NX guarantees that under concurrent attempts only one client can succeed (Redis executes commands on a single thread over in-memory data, so there are no multithreading races inside Redis). The lock/unlock pseudocode is as follows:
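A Python sketch of the lock/unlock logic. It models `SET ... NX PX` and the safe release against an in-memory store; the mutex stands in for Redis's single-threaded command execution, and on a real Redis the check-then-delete in release must be made atomic with a Lua script:

```python
import time
import uuid
import threading

_store = {}                 # key -> (value, expire_at)
_redis = threading.Lock()   # stands in for Redis's single-threaded execution

def redis_lock(key, ttl_ms):
    """SET key random_value NX PX ttl_ms; returns the random value on success, else None."""
    token = uuid.uuid4().hex
    now = time.monotonic()
    with _redis:
        entry = _store.get(key)
        if entry is not None and entry[1] > now:  # key exists, not expired: NX fails
            return None
        _store[key] = (token, now + ttl_ms / 1000.0)
        return token

def redis_unlock(key, token):
    """Safe release: delete only if the stored value is our own random token."""
    with _redis:
        entry = _store.get(key)
        if entry is not None and entry[0] == token:
            del _store[key]
            return True
        return False
```

Releasing with the wrong token fails, which is exactly what protects a holder from deleting a lock that has already expired and been re-acquired by someone else.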
A problem with timeout-based locks
If the client holding the lock fails to finish in time for some reason (a long GC pause, say), Redis expires and releases the lock, and another client acquires it; from then on, two clients operate on the shared resource simultaneously.
So how to solve this problem?
One solution is a lock-renewal mechanism: after acquiring the lock and until releasing it, the client periodically extends the lock's expiration, for example renewing at an interval of 1/3 of the lock timeout.
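A minimal, self-contained model of renewal in Python: a background thread extends the expiry every ttl/3 until release. In a real system the renewal would be a Redis PEXPIRE guarded by a value check; here an in-memory expiry timestamp stands in:

```python
import threading
import time

class RenewedLock:
    """Toy model of lock renewal; 'expire_at' plays the role of the Redis key TTL."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.expire_at = time.monotonic() + ttl
        self._stop = threading.Event()
        # Renew at 1/3 of the timeout, as suggested above.
        self._renewer = threading.Thread(target=self._renew_loop, daemon=True)
        self._renewer.start()

    def _renew_loop(self):
        while not self._stop.wait(self.ttl / 3.0):
            self.expire_at = time.monotonic() + self.ttl  # i.e. PEXPIRE key ttl

    def held(self):
        return time.monotonic() < self.expire_at

    def release(self):
        self._stop.set()
        self.expire_at = 0.0
```

With ttl = 0.3s, the lock is still held after 0.6s because the renewer keeps pushing the expiry out; without renewal it would have expired.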
There are many open-source Redis distributed lock implementations; the best known are Redisson and Baidu's dlock. I have also written a simple one, Redis-lock, which mainly adds lock renewal and the ability to lock several keys at once.
For high availability, Redis is generally deployed as a cluster or with master-slave replication. The strength of a Redis lock is excellent performance; the weakness is that the data lives in memory, so if the cache service goes down the lock data is lost.
Redis's built-in replication guarantees data reliability to a degree, but because replication is asynchronous, the master may accept a lock write and die before the write reaches a slave, so the lock data can still be lost.
3. Zookeeper distributed lock
ZooKeeper is a highly available distributed coordination service, created at Yahoo as an open-source counterpart of Google's Chubby.
Distributed locking is one of the basic services ZooKeeper can provide.
Three ZooKeeper features matter here: the ZAB protocol, the znode storage model, and the watcher mechanism. ZAB guarantees data consistency while cluster deployment provides availability; znodes are kept in memory, which makes data operations fast; and watchers implement notification (for example, the client holding the lock can notify the other clients when it releases).
ZooKeeper's node model supports ephemeral nodes: data a client writes as an ephemeral node is deleted automatically when the client's session ends, so the lock needs no extra timeout-release mechanism.
When several clients concurrently issue create requests for the same path, only one can succeed; this is the property used to implement the lock. Note: even if the client has not crashed, a network problem can break the heartbeat between the client and the ZooKeeper service, and ZooKeeper will still delete the ephemeral data; if the client is operating on the shared resource at that moment, there is some risk.
Distributed locks built on ZooKeeper are easy to use and offer good efficiency and stability. Curator wraps ZooKeeper's API operations and also packages higher-level recipes: cache/event listening, leader election, distributed locks, distributed counters, distributed barriers, and so on; its lock recipe is the InterProcessMutex class.
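With Curator itself, the Java usage boils down to creating an `InterProcessMutex` and calling `acquire()`/`release()`. To make the underlying recipe concrete — ephemeral sequential nodes, where the smallest sequence number owns the lock and each waiter watches its predecessor — here is a minimal in-process simulation in Python (a model of the algorithm, not Curator itself):

```python
import threading

class ZkLockModel:
    """In-process model of the ZooKeeper lock recipe: every waiter creates a
    sequential node, and the node with the smallest sequence number holds the lock."""
    def __init__(self):
        self._seq = 0
        self._nodes = []                       # live sequence numbers, in creation order
        self._changed = threading.Condition()  # stands in for watcher notifications

    def acquire(self):
        with self._changed:
            my_node = self._seq                # create /lock/node- as EPHEMERAL_SEQUENTIAL
            self._seq += 1
            self._nodes.append(my_node)
            # Block (as if watching the predecessor node) until we are the smallest.
            while self._nodes[0] != my_node:
                self._changed.wait()
            return my_node

    def release(self, my_node):
        with self._changed:
            self._nodes.remove(my_node)        # delete our node; session loss would too
            self._changed.notify_all()         # the watcher fires for the next waiter
```

The corresponding Curator code is roughly `InterProcessMutex mutex = new InterProcessMutex(client, "/locks/my-lock"); mutex.acquire(); try { ... } finally { mutex.release(); }`.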
From the design and implementation of the three distributed locks above, we can see that each implementation has its own characteristics and its own answers to the potential problems. In summary:
Performance: Redis> Zookeeper> DB.
Deadlock avoidance: DB relies on an application-level scheduled task to delete expired, unreleased locks; Redis relies on key expiration; ZooKeeper relies on ephemeral nodes.
Availability: DB can use synchronous replication plus VIP-based master switchover; Redis can use a cluster or master-slave replication; ZooKeeper handles this itself via ZAB-based cluster deployment. Note that DB and Redis replication is generally asynchronous, so when the lock service fails over, lock data may be inconsistent; ZooKeeper's ZAB protocol guarantees consistency of node data across a quorum (at least n/2+1) of the cluster.
Lock wake-up: DB- and Redis-based locks generally have no wake-up mechanism (the application layer can poll the lock itself and wake its own blocked threads when the lock becomes free); ZooKeeper supports it natively through its watcher/notify mechanism.
A distributed lock can never be as safe as an in-process lock between threads: because of network problems, the lock service may release a held resource (on timeout, or believing the client has died) while the client keeps operating on the shared resource, which is a latent hazard.
Therefore, with distributed locks, first maximize the availability of the lock service, and second deploy it on the same intranet as its clients to minimize the chance of network problems.
Seen this way, no distributed lock service is "perfect" (PS: no technology ever is). So how should developers choose? Best is to pick the implementation that fits your own business scenario; in practice, Redis-based distributed lock services are the most widely used.