redisdata structures, you may be familiar with the five basic data types:
Sorted Set. Then addition, there are three major derived data structure, we usually are little contact, namely:
Also, I think, these three data structures, can only be icing on the cake. Really in the project, I really haven't used it. Let's take a look at the definition and use of these three data structures
Speaking of this
bitmaps, it is actually
String, but it can operate on
Stringthe bits. And then, this bit operations, has its own command, and so the operation
rediscommand not much different! Can be understood like this
Bitmaps is an array in units of bits, each unit of the array can only store 0 and 1
Here is an example, for example, if we want to do a set operation,
h, then execute the following command
127.0.0.1:6379> set w h OK 127.0.0.1:6379> get w "h"
hthe ASCII is
Next, you can use the bit command
getbitcommand to fetch, fetch the content of each bit.
127.0.0.1:6379> getbit w 0 # getbit w 0 (integer) 0 127.0.0.1:6379> getbit w 1 # getbit w 1 (integer) 1 127.0.0.1:6379> getbit w 2 # getbit w 2 (integer) 1 127.0.0.1:6379> getbit w 3 # getbit w 3 (integer) 0
According to online rumors, this structure is used to count the number of active users in a certain period of time, and the structure used
setspace- saving than the traditional structure. However, this use has great limitations, as I will talk about later. Let me talk about the online statement. Suppose there are 30 users, of which 5 users
2018-10-04log in on this day. Suppose the userid of these 5 users is 2, 4, 8, 11, 12. Then, we assume that the key is
users:2018-10-04, and its
valuevalue is used to record user login information. So in order to record that the above 5 users have logged in, we
valueset the 2nd, 4th, 8th, 11th, and 12th bits of the value to 1, that is, execute the following command
127.0.0.1:6379> setbit users:2018-10-04 2 1 (integer) 0 127.0.0.1:6379> setbit users:2018-10-04 4 1 (integer) 0 127.0.0.1:6379> setbit users:2018-10-04 8 1 (integer) 0 127.0.0.1:6379> setbit users:2018-10-04 11 1 (integer) 0 127.0.0.1:6379> setbit users:2018-10-04 12 1 (integer) 0
At this time, for example, if you want to determine the user with userid=11, on 2018-10-04, if you have logged in, execute the following command
127.0.0.1:6379> getbit users:2018-10-04 11 (integer) 1
The result is 1, which means the user has logged in. If the returned result is 0, it means that the user has never logged in. If you want to count, 2018-10-04, the number of users logged in on this day, you can execute the following command
127.0.0.1:6379> bitcount users:2018-10-04 (integer) 5
The above command can be counted, 2018-10-04, 5 users logged in on this day. Ok, you won't be able to understand much if you find it here. Let me talk about it first, here
userid=2 4 8 11 12, it can be understood as an offset. For example, if the userid in the actual project is 1000002, then the offset is 2. Everyone can be flexible in the project. However, this approach has a limitation. In our actual project, if the userid is generated using uuid, then how do you generate the offset based on these userid? Could it be that you have to find a hash function to generate an offset? Even if the corresponding hash function is found, can you ensure that the hash collision will not occur, resulting in the same offset? So, everyone can understand.
HyperLogLogIt is not a data structure, but an algorithm that can use a very small memory space to complete the statistics of independent totals . In fact, everyone may be relatively unfamiliar with this algorithm. We
javahave a library called
stream-lib, which also implements
HyperLogLogalgorithms. I probably talk about the principle of the algorithm . I don't want to go to a long-formed math paper. Everyone looks boring. This
Hyperrefers to the super meaning. Its previous life is the LogLog algorithm. Here, I m just a little bit stunned, so that everyone can understand the essence. Suppose there is the following dialogue
Me: "Small song, suppose, I tossed 5 coins in a round, and after many rounds, I found that in these rounds, there were at most 2 consecutive negatives and 1 positive. Can you guess how many rounds I lost? "Small tune: "It shouldn't be a few rounds, at most seven or eight rounds." Me: "Fuck, so witty, how do you count?" Small tune: "It's very simple, the pros and cons are both 1/2, even Focusing on the second negative side and the first positive side. Isn t it 1/2 1/2 1/2!" I: "What if there are 4 negative sides and 1 positive side at most?" Xiao tune: "It should be many rounds, right? !" Me: "It's really witty!"
The above chat is between myself and my colleague, and they play each other every day! Any similarity is purely coincidental!
Well, the principle is over! It's just that his estimation algorithm is more complicated! It's not that simple! And with this estimate, the error is still relatively large! The pseudo code of the algorithm is given below.
max = 0 hashCode = hash( ) num = hashCode 0 if num > max: max = num 2 (max + 1)
It should be noted
hashCode = hash( )
It is to map your input element into a long binary string. After mapping it into a long binary string, it can be compared to the result of the coin toss I mentioned first. As for why the final result is used
(max+1), you can check the literature. After all, this article is talking about
redis, not about this algorithm. Moreover, this algorithm has undergone a series of evolutions. For example, the input parameter set is divided into m parts, and then
mthe results of each part are averaged
(avg), and finally the
(avg + 1)independent total is estimated by the power of 2 ! Those readers who are interested can make inquiries by themselves!
This structure can save memory to count various counts, such as the
IPnumber of registrations and the number of daily visits
IP. Of course, there are errors! The official number given by Redis is 0.81% error rate. The usage is also very simple as shown below
127.0.0.1:6379> pfadd ips:2018-10-04 "127.0.0.1" "127.0.0.2" "127.0.0.3" "127.0.0.4" (integer) 1 127.0.0.1:6379> pfcount ips:2018-10-04 (integer) 4
The above is the demonstration. On 2018-10-04, about 4 ips logged into the system! There is a comparison chart of the occupied space with the traditional collection structure on the Internet, post it for everyone to see
Attention, once again, there are errors in using this structure! For example, if you
pfaddenter a million pieces of data,
pfcountthe result may be 999,756!
Geo can be used to store latitude and longitude, calculate the distance between two places, range calculations, etc. The underlying implementation is zset.
There are mainly the following six groups of commands
geoadd: Add the coordinates of a certain geographic location.
geopos: Get the coordinates of a certain geographic location.
geodist: Get the distance between two geographic locations.
georadius: Obtain a set of geographic locations within a specified range according to the given geographic location coordinates.
georadiusbymember: Get a collection of geographic locations within a specified range according to a given geographic location
geohash: Get the geohash value of a certain geographic location.
I ll post an example of the official website document directly here. If you are interested, you can check it yourself. 1. add two coordinates to the key.
redis> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania" (integer) 2
2. calculate an example between two coordinates
redis> GEODIST Sicily Palermo Catania "166274.15156960039"
Finally, calculate the distance from the latitude and longitude
200kmthe coordinates within the range
redis> GEORADIUS Sicily 15 37 100 km 1) "Catania" redis> GEORADIUS Sicily 15 37 200 km 1) "Palermo" 2) "Catania"