## introduction

Speaking of `redis`

data structures, you may be familiar with the five basic data types: `String`

, `Hash`

, `List`

, `Set`

, `Sorted Set`

. Then addition, there are three major derived data structure, we usually are little contact, namely: `bitmaps`

, `hyperloglog`

, `geo`

Also, I think, these three data structures, can only be icing on the cake. Really in the project, I really haven't used it. Let's take a look at the **definition** and **use of** these three data structures

## bitmaps

#### definition

Speaking of this `bitmaps`

, it is actually `String`

, but it can operate on `String`

the bits. And then, this bit operations, has its own command, and so the operation `String`

of the `redis`

command not much different! Can be understood like this

Bitmaps is an array in units of bits, each unit of the array can only store 0 and 1

Here is an example, for example, if we want to do a set operation, `key`

for `w`

, `value`

for `h`

, then execute the following command

```
127.0.0.1:6379> set w h
OK
127.0.0.1:6379> get w
"h"
```

Then `h`

the ASCII is `0110 1000`

Next, you can use the bit command `getbit`

command to fetch, fetch the content of each bit.

```
127.0.0.1:6379> getbit w 0 # getbit w 0
(integer) 0
127.0.0.1:6379> getbit w 1 # getbit w 1
(integer) 1
127.0.0.1:6379> getbit w 2 # getbit w 2
(integer) 1
127.0.0.1:6379> getbit w 3 # getbit w 3
(integer) 0
```

#### use

According to online rumors, this structure is used to count the number of active users in a certain period of time, and the structure used `bitmap`

is less `set`

space- saving than the traditional structure. However, this use has great limitations, as I will talk about later. Let me talk about the online statement. Suppose there are 30 users, of which 5 users `2018-10-04`

log in on this day. Suppose the userid of these 5 users is 2, 4, 8, 11, 12. Then, we assume that the key is `users:2018-10-04`

, and its `value`

value is used to record user login information. So in order to record that the above 5 users have logged in, we `value`

set the 2nd, 4th, 8th, 11th, and 12th bits of the value to 1, that is, execute the following command

```
127.0.0.1:6379> setbit users:2018-10-04 2 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 4 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 8 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 11 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 12 1
(integer) 0
```

At this time, for example, if you want to determine the user with userid=11, on 2018-10-04, if you have logged in, execute the following command

```
127.0.0.1:6379> getbit users:2018-10-04 11
(integer) 1
```

The result is 1, which means the user has logged in. If the returned result is 0, it means that the user has never logged in. If you want to count, 2018-10-04, the number of users logged in on this day, you can execute the following command

```
127.0.0.1:6379> bitcount users:2018-10-04
(integer) 5
```

The above command can be counted, 2018-10-04, 5 users logged in on this day. Ok, you won't be able to understand much if you find it here. Let me talk about it first, here `userid=2 4 8 11 12`

, it can be understood as an offset. For example, if the userid in the actual project is 1000002, then the offset is 2. Everyone can be flexible in the project. However, this approach has a limitation. In our actual project, if the userid is generated using uuid, then how do you generate the offset based on these userid? Could it be that you have to find a hash function to generate an offset? Even if the corresponding hash function is found, can you ensure that the hash collision will not occur, resulting in the same offset? So, everyone can understand.

## HyperLogLog

#### definition

`HyperLogLog`

It is not a data structure, but an algorithm that can use a **very small memory space to complete the statistics of independent totals** . In fact, everyone may be relatively unfamiliar with this algorithm. We `java`

have a library called `stream-lib`

, which also implements `HyperLogLog`

algorithms. I probably talk about the **principle of** the algorithm . I don't want to go to a long-formed math paper. Everyone looks boring. This `Hyper`

refers to the **super** meaning. Its previous life is the **LogLog** algorithm. Here, I m just a little bit stunned, so that everyone can understand the essence. Suppose there is the following dialogue

Me: "Small song, suppose, I tossed 5 coins in a round, and after many rounds, I found that in these rounds, there were at most 2 consecutive negatives and 1 positive. Can you guess how many rounds I lost? "Small tune: "It shouldn't be a few rounds, at most seven or eight rounds." Me: "Fuck, so witty, how do you count?" Small tune: "It's very simple, the pros and cons are both 1/2, even Focusing on the second negative side and the first positive side. Isn t it 1/2

1/21/2!" I: "What if there are 4 negative sides and 1 positive side at most?" Xiao tune: "It should be many rounds, right? !" Me: "It's really witty!"

The above chat is between myself and my colleague, and they play each other every day! Any similarity is purely coincidental!

Well, the principle is over! It's just that his estimation algorithm is more complicated! It's not that simple! And with this estimate, the error is still relatively large! The pseudo code of the algorithm is given below.

```
max = 0
hashCode = hash( )
num = hashCode 0
if num > max:
max = num
2 (max + 1)
```

It should be noted

```
hashCode = hash( )
```

It is to map your input element into a long binary string. After mapping it into a long binary string, it can be compared to the result of the coin toss I mentioned first. As for why the final result is used `(max+1)`

, you can check the literature. After all, this article is talking about `redis`

, not about this algorithm. Moreover, this algorithm has undergone a series of evolutions. For example, the input parameter set is divided into m parts, and then `m`

the results of each part are averaged `(avg)`

, and finally the `(avg + 1)`

independent total is estimated by the power of 2 ! Those readers who are interested can make inquiries by themselves!

#### use

This structure can save memory to count various counts, such as the `IP`

number of registrations and the number of daily visits `IP`

. Of course, there are errors! The official number given by Redis is 0.81% error rate. The usage is also very simple as shown below

```
127.0.0.1:6379> pfadd ips:2018-10-04 "127.0.0.1" "127.0.0.2" "127.0.0.3" "127.0.0.4"
(integer) 1
127.0.0.1:6379> pfcount ips:2018-10-04
(integer) 4
```

The above is the demonstration. On 2018-10-04, about 4 ips logged into the system! There is a comparison chart of the occupied space with the traditional collection structure on the Internet, post it for everyone to see

Attention, once again, there are errors in using this structure! For example, if you `pfadd`

enter a million pieces of data, `pfcount`

the result may be 999,756!

## Geo

### definition

Geo can be used to store latitude and longitude, calculate the distance between two places, range calculations, etc. The underlying implementation is zset.

### use

There are mainly the following six groups of commands

`geoadd`

: Add the coordinates of a certain geographic location.`geopos`

: Get the coordinates of a certain geographic location.`geodist`

: Get the distance between two geographic locations.`georadius`

: Obtain a set of geographic locations within a specified range according to the given geographic location coordinates.`georadiusbymember`

: Get a collection of geographic locations within a specified range according to a given geographic location`geohash`

: Get the geohash value of a certain geographic location.

I ll post an example of the official website document directly here. If you are interested, you can check it yourself. 1. add two coordinates to the key.

```
redis> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania"
(integer) 2
```

2. calculate an example between two coordinates

```
redis> GEODIST Sicily Palermo Catania
"166274.15156960039"
```

Finally, calculate the distance from the latitude and longitude `(15,37)`

distance `100km`

and `200km`

the coordinates within the range

```
redis> GEORADIUS Sicily 15 37 100 km
1) "Catania"
redis> GEORADIUS Sicily 15 37 200 km
1) "Palermo"
2) "Catania"
```