Database operations have increasingly become the performance bottleneck of the whole application, and this is especially obvious for web applications. Database performance is not only something DBAs need to worry about; we programmers need to pay attention to it as well. When we design the table structure and when we operate on the database (especially when writing the SQL that queries tables), we need to keep the performance of those operations in mind. This article does not try to cover SQL optimization in general. It focuses only on MySQL, the database most widely used by web applications. I hope the following optimization techniques are useful to you.
1. Optimize your queries for query caching
Many MySQL servers have query caching enabled (the query cache exists up to MySQL 5.7 and was removed in MySQL 8.0). It is one of the most effective ways to improve performance, and it is handled by the MySQL server itself. When the same query is executed many times, its result is placed in a cache, so that subsequent identical queries read the cached result directly instead of touching the table.
The main problem is that programmers easily overlook it, because some of our queries prevent MySQL from using the cache. Consider the following example:
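A minimal sketch of the two statements, written in Python for illustration. The `user` table and `signup_date` column are hypothetical names, and the sketch only builds the SQL text; how you send it to MySQL depends on your driver.

```python
from datetime import date

# Not cacheable: CURDATE() is evaluated by the server on every execution,
# so the query cache cannot reuse a stored result for this statement.
uncached_sql = "SELECT username FROM user WHERE signup_date >= CURDATE()"


def cacheable_signup_query(today: date) -> str:
    """Build the same query with the date computed in the application.

    The table `user` and column `signup_date` are assumed names.
    The value comes from date.isoformat(), not from user input, so
    embedding it as a literal is safe in this sketch.
    """
    return (
        "SELECT username FROM user "
        f"WHERE signup_date >= '{today.isoformat()}'"
    )


# Every request on the same day produces identical SQL text,
# which is what allows the query cache to serve it.
cached_sql = cacheable_signup_query(date(2009, 11, 25))
print(cached_sql)
```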
The difference between the two SQL statements is CURDATE(): MySQL's query cache does not work for a statement that calls this function. The same is true for NOW(), RAND() and other such SQL functions, because their return values are volatile. So all you need to do is replace the MySQL function with a value computed in your program, and the query becomes cacheable.
2. EXPLAIN your SELECT query
The EXPLAIN keyword shows you how MySQL handles your SQL statement. This helps you find the performance bottleneck in your query or table structure.
The EXPLAIN output also tells you how indexes and primary keys are used, how your tables are scanned and sorted, and so on.
Pick one of your SELECT statements (preferably the most complex one, with several table joins) and put the keyword EXPLAIN in front of it. You can use phpMyAdmin to do this. The result is displayed as a table.
In the example for this tip, the query contains a table join and we forgot to add an index on group_id. The first EXPLAIN result shows that 7883 rows were examined. After we add an index to the group_id field, the second result examines only 9 and 16 rows in the two tables. Looking at the rows column lets us spot potential performance problems.
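The before-and-after effect of adding an index can be sketched without a MySQL server. As a stand-in, the sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, whose output format differs from MySQL's EXPLAIN; the `users` table, its columns and the index name are all assumptions made for illustration. On MySQL you would instead prefix the query with EXPLAIN and read the key and rows columns.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, group_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (group_id, name) VALUES (?, ?)",
    [(i % 10, f"user{i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE group_id = 3"


def plan(connection, sql):
    """Return the query plan as one string (last column of each plan row)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index the plan is a full table scan.
plan_before = plan(conn, QUERY)

# After adding the (hypothetical) index, the plan switches to an index search.
conn.execute("CREATE INDEX idx_users_group_id ON users (group_id)")
plan_after = plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```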
3. Use LIMIT 1 when there is only one row of data
Sometimes when you query a table you already know there will be only one result, yet you still fetch a cursor or check the number of rows returned.
In this case, adding LIMIT 1 can improve performance. The MySQL engine stops searching as soon as it finds one matching row, instead of continuing to look for further rows that match.
The example for this tip just checks whether there are any users from China. A query that selects 1 with LIMIT 1 is clearly more efficient than one that runs SELECT * and counts the returned rows. (Note that the first form uses SELECT * and the second uses SELECT 1.)
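The two forms of the existence check can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module so that it runs without a server, and the `user` table with a `country` column is an assumed schema. The SQL text of both queries is also valid MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, country TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO user (country) VALUES (?)",
    [("China",), ("France",), ("China",)],
)


def has_user_from_slow(connection, country):
    """What not to do: fetch every matching row just to test for existence."""
    rows = connection.execute(
        "SELECT * FROM user WHERE country = ?", (country,)
    ).fetchall()
    return len(rows) > 0


def has_user_from(connection, country):
    """Better: select a constant and stop at the first matching row."""
    row = connection.execute(
        "SELECT 1 FROM user WHERE country = ? LIMIT 1", (country,)
    ).fetchone()
    return row is not None


print(has_user_from(conn, "China"), has_user_from(conn, "Japan"))
```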
4. Build an index for the search field
Indexes are not only for primary keys or unique fields. If your table has a field that you often search on, create an index for it.
In a comparison of the search "last_name LIKE 'a%'" with and without an index on last_name, the unindexed search was about 4 times slower.
You should also know which kinds of search cannot use a normal index. For example, when you need to find a word inside a large article, as in "WHERE post_content LIKE '%apple%'", the index is of little use because the pattern starts with a wildcard. You may need to use a MySQL full-text index or build an index yourself (for example, of search keywords or tags).
5. Use the same column type in JOINs, and index it
If your application has many JOIN queries, make sure the fields used for the join are indexed in both tables. MySQL can then use its internal mechanisms to optimize the JOIN for you.
Moreover, the fields used for the join should be of the same type. For example, if you join a DECIMAL field with an INT field, MySQL cannot use their indexes. For string types, the character sets must also be the same (the character sets of the two tables may differ).
6. Never ORDER BY RAND()
Want to shuffle the returned rows, or pick a random row? I really don't know who invented this usage, but many novices love it. They may not realize how terrible the performance problem is.
If you really want to shuffle the returned rows, there are many other ways to do it. ORDER BY RAND() only makes your database's performance drop sharply as the table grows. The problem is that MySQL has to execute RAND() (which costs CPU time) for every row in order to tag it, and then sort the rows. Even LIMIT 1 does not help, because the sort still has to happen.
A cheaper way to pick one record at random is to count the rows, choose a random offset in your program, and fetch the single row at that offset.
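One possible way to pick a random record without ORDER BY RAND() is sketched below: count the rows, choose an offset in the application, and fetch one row at that offset. The sketch runs against Python's built-in sqlite3 module as a stand-in for MySQL, and the `user` table and `username` column are assumed names. The offset query still skips over rows, but it avoids calling RAND() for every row and sorting the whole table.

```python
import random
import sqlite3

# In-memory SQLite database as a stand-in; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO user (username) VALUES (?)",
    [(f"user{i}",) for i in range(100)],
)


def pick_random_username(connection, rng=random):
    """Return one random username, or None if the table is empty."""
    (total,) = connection.execute("SELECT COUNT(*) FROM user").fetchone()
    if total == 0:
        return None
    # Choose the offset in the application instead of sorting by RAND().
    offset = rng.randrange(total)
    row = connection.execute(
        "SELECT username FROM user LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()
    # The row may be missing if rows were deleted between the two queries.
    return row[0] if row else None


picked = pick_random_username(conn)
print(picked)
```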
7. Avoid SELECT *
The more data you read from the database, the slower the query becomes. And if your database server and web server are separate machines, it also increases the network load.
So get into the good habit of selecting only the columns you need.
8. Always set an ID for each table
Every table in the database should have an ID as its primary key, preferably an INT (UNSIGNED is recommended) with the AUTO_INCREMENT flag set so that it is incremented automatically.
Even if your users table has a unique field called "email", do not make it the primary key. Using a VARCHAR as the primary key degrades performance. In addition, your program should use the table's ID to build its data structures.
Moreover, some operations in the MySQL engine rely on the primary key. In those cases the performance and configuration of the primary key become very important, for example with clustering and partitioning.
There is only one exception: the "association table", whose primary key is made up of the primary keys of several other tables, that is, of foreign keys. For example, given a "student table" with a student ID and a "course table" with a course ID, the "result table" is an association table that links students to courses. In the result table, the student ID and the course ID are foreign keys, and together they form the primary key.
9. Use ENUM instead of VARCHAR
The ENUM type is very fast and compact. Internally it is stored like a TINYINT, but it appears as a string on the outside. This makes it a very good fit for fields that hold a list of options.
If you have a field such as "gender", "country", "ethnicity", "status" or "department", and you know that its values are limited and fixed, you should use ENUM instead of VARCHAR.
MySQL also has a "suggestion" feature (see item 10) that tells you how to reorganize your table structure. When you have a VARCHAR field, it may suggest changing it to an ENUM. Use PROCEDURE ANALYSE() to get these suggestions.
10. Get suggestions from PROCEDURE ANALYSE()
PROCEDURE ANALYSE() (available up to MySQL 5.7; removed in MySQL 8.0) asks MySQL to analyze your fields and their actual data and to give you some useful suggestions. The suggestions are only useful when the table contains real data, because big decisions need to be based on data.
For example, if you created an INT field as your primary key but there is not much data, PROCEDURE ANALYSE() will suggest changing the field to MEDIUMINT. Or, if you use a VARCHAR field that holds only a few distinct values, you may get a suggestion to change it to ENUM. With too little data, these suggestions may not be accurate enough to act on.
In phpMyAdmin, you can click "Propose table structure" while viewing a table to see these suggestions.
Keep in mind that these are only suggestions, and they become accurate only as your table fills with data. Remember, you are the one who makes the final decision.
11. Use NOT NULL as much as possible
Unless you have a very specific reason to use NULL values, you should always declare your fields NOT NULL. This may seem a bit controversial, so please read on.
First, ask yourself how much difference there is between "empty" and NULL (for an INT, between 0 and NULL). If you think there is no difference, you should not use NULL. (Did you know that in Oracle, NULL and the empty string are the same?)
Don't assume that NULL takes no space: it requires extra space, and your program becomes more complicated when you compare values. Of course, this does not mean you can never use NULL. Reality is complicated, and there are still cases where you need NULL values.
12. Prepared Statements
Prepared statements are much like stored procedures: a set of SQL statements that run in the background. Using them brings many benefits, in both performance and security.
Prepared statements check the variables you bind, which protects your program from SQL injection attacks. You can of course check your variables manually, but manual checks are error-prone and programmers often forget them. Using a framework or an ORM makes this less of a problem.
In terms of performance, prepared statements give you a considerable advantage when the same query is used many times. You define parameters for the prepared statement, and MySQL parses it only once.
Recent versions of MySQL also transmit prepared statements in a binary format, which makes network transmission very efficient.
There are, however, situations where we need to avoid prepared statements, because they do not support the query cache. Reportedly this has been supported since version 5.1.
To use prepared statements in PHP, see the manual for the mysqli extension, or use a database abstraction layer such as PDO.
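The idea of binding variables instead of pasting them into the SQL text can be illustrated with a short sketch. It uses Python's built-in sqlite3 module purely as a stand-in (the article's own context is PHP with mysqli or PDO), and the `user` table, its columns and the sample data are assumptions. The point it shows is that a bound value is treated as data, so an injection attempt matches nothing.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT NOT NULL, "
    "state TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO user (username, state) VALUES (?, ?)",
    [("alice", "NY"), ("bob", "CA"), ("carol", "NY")],
)

# The statement text stays constant; only the bound parameter changes.
FIND_BY_STATE = "SELECT username FROM user WHERE state = ? ORDER BY username"


def usernames_in_state(connection, state):
    """Run the parameterized query with `state` bound as a value."""
    return [row[0] for row in connection.execute(FIND_BY_STATE, (state,))]


ny_users = usernames_in_state(conn, "NY")

# An injection attempt is compared as a plain string and matches no row.
injected = usernames_in_state(conn, "NY' OR '1'='1")

print(ny_users, injected)
```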
13. Unbuffered query
Normally, when you execute a SQL statement in your script, the program waits there until the statement returns, and only then continues. You can use unbuffered queries to change this behavior.
mysql_unbuffered_query() sends a SQL statement to MySQL without automatically fetching and buffering the results the way mysql_query() does. This saves a considerable amount of memory, especially for queries that produce a large number of results, and you don't need to wait until all the results have been returned. As soon as the first row is returned, you can start working on the query results.
However, there are some limitations. You must either read all the rows or call mysql_free_result() to clear the results before the next query. Also, mysql_num_rows() and mysql_data_seek() cannot be used. So consider carefully whether to use unbuffered queries.
14. Save the IP address as UNSIGNED INT
Many programmers create a VARCHAR(15) field to store an IP address as a string instead of as an integer. If you store it as an integer, only 4 bytes are needed and the field has a fixed length. It also gives you an advantage in queries, especially when you need WHERE conditions such as: ip BETWEEN ip1 AND ip2.
We must use UNSIGNED INT, because an IPv4 address uses the full range of a 32-bit unsigned integer.
In your queries, you can use INET_ATON() to convert a string IP into an integer and INET_NTOA() to convert an integer back into a string IP. PHP has equivalent functions: ip2long() and long2ip().
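The conversion itself is easy to sketch in application code. The helper names below are hypothetical; they use Python's standard ipaddress module and, for standard dotted-quad IPv4 addresses, produce the same integers that INET_ATON() and INET_NTOA() work with.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert a 32-bit unsigned integer back to a dotted-quad string."""
    return str(ipaddress.IPv4Address(value))


def ip_in_range(ip: str, low: str, high: str) -> bool:
    """Numeric range check, the application-side analogue of BETWEEN."""
    return ip_to_int(low) <= ip_to_int(ip) <= ip_to_int(high)


print(ip_to_int("192.168.1.1"))   # 3232235777
print(int_to_ip(3232235777))      # 192.168.1.1
print(ip_in_range("10.0.0.5", "10.0.0.1", "10.0.0.10"))  # True
```

The range check also shows why the integer form matters: compared as strings, "10.0.0.5" sorts after "10.0.0.10", so a BETWEEN on a VARCHAR column would give the wrong answer.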
15. Fixed-length tables will be faster
If every field in a table is fixed-length, the whole table is considered "static" or "fixed-length". That means the table contains no fields of these types: VARCHAR, TEXT, BLOB. As soon as you include one of these fields, the table is no longer a fixed-length static table, and the MySQL engine processes it in a different way.
Fixed-length tables improve performance because MySQL can search them faster: with fixed-length rows it is easy to calculate the offset of the next row, so reads are naturally fast. If the rows are not fixed-length, finding the next row requires a lookup by primary key each time.
Fixed-length tables are also easier to cache and rebuild. The only side effect is that fixed-length fields waste some space, because they occupy their full allocated size whether you use it or not.
Using the vertical split technique (see the next item), you can split your table into two: one with fixed-length rows and one with variable-length rows.
16. Vertical split
A vertical split turns one table into several tables by column. It reduces the table's complexity and its number of fields, and thereby serves optimization. (I once worked on a project at a bank and saw a table with more than 100 fields, which was quite scary.)
Example 1: The Users table has a home address field. It is optional, and apart from the personal profile page you rarely need to read or rewrite it. So why not put it in another table? Your users table will perform better. Think about it: most of the time only the user ID, user name, password, user role and so on are used frequently. A smaller table always performs better.
Example 2: You have a field called "last_login" that is updated every time a user logs in. But each update clears the query cache for that table. So you can move this field to another table; it then no longer interferes with your constant reads of user ID, user name and user role, and the query cache can give you a big performance boost.
Note, however, that you should not need to join the resulting tables frequently. Otherwise performance will be worse than before the split, and the drop can be dramatic.
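The two examples above can be sketched as a schema with a frequently read table and a rarely read side table. Everything here is an assumption made for illustration: the table names `users` and `user_details`, the column names, and the use of Python's built-in sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hot columns that are read on almost every request.
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        role     TEXT NOT NULL
    );
    -- Optional or frequently updated columns, split off by column.
    CREATE TABLE user_details (
        user_id      INTEGER PRIMARY KEY REFERENCES users (id),
        home_address TEXT,
        last_login   TEXT
    );
    """
)
conn.execute("INSERT INTO users (id, username, role) VALUES (1, 'alice', 'admin')")
conn.execute(
    "INSERT INTO user_details (user_id, home_address) VALUES (1, '1 Main St')"
)


def record_login(connection, user_id, when):
    """Update last_login without writing to the hot users table."""
    connection.execute(
        "UPDATE user_details SET last_login = ? WHERE user_id = ?",
        (when, user_id),
    )


def load_identity(connection, user_id):
    """Read the frequently used columns; no join is needed."""
    return connection.execute(
        "SELECT username, role FROM users WHERE id = ?", (user_id,)
    ).fetchone()


record_login(conn, 1, "2009-11-25 10:00:00")
identity = load_identity(conn, 1)
print(identity)
```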
17. Split large DELETE or INSERT statements
If you need to run a large DELETE or INSERT query on a live website, be very careful not to bring the whole site to a halt. Both operations lock the table, and once the table is locked, no other operation can get in.
Apache runs many child processes or threads, which is why it works so efficiently. But our server cannot afford too many child processes, threads and database connections, because they consume a great deal of server resources, especially memory.
If you lock a table for a while, say 30 seconds, then on a high-traffic site the processes/threads, database connections and open files that pile up during those 30 seconds may not only crash your web service but may also bring down the entire server.
So if you have a big operation, be sure to split it up. Using a LIMIT clause is a good way to do it: delete a limited batch, pause briefly, and repeat until no rows are affected.
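A batched delete might look like the sketch below. It is only an illustration: it runs against Python's built-in sqlite3 module, the `logs` table and `log_date` column are assumed names, and the batch size and pause are arbitrary. Because SQLite does not accept DELETE ... LIMIT by default, the sketch limits a subquery instead; on MySQL the statement could simply be DELETE FROM logs WHERE log_date <= '2009-11-01' LIMIT 1000.

```python
import sqlite3
import time

# In-memory SQLite database as a stand-in; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, log_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO logs (log_date) VALUES (?)",
    [("2009-10-15",)] * 2500 + [("2009-12-01",)] * 500,
)
conn.commit()


def delete_in_batches(connection, cutoff, batch_size=1000, pause=0.05):
    """Delete rows up to `cutoff` in small batches; return the batch count."""
    batches = 0
    while True:
        cursor = connection.execute(
            "DELETE FROM logs WHERE id IN ("
            " SELECT id FROM logs WHERE log_date <= ? LIMIT ?)",
            (cutoff, batch_size),
        )
        connection.commit()  # release locks between batches
        if cursor.rowcount == 0:
            break  # nothing left to delete
        batches += 1
        time.sleep(pause)  # give other queries a chance to run
    return batches


batches_run = delete_in_batches(conn, "2009-11-01", pause=0)
remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(batches_run, remaining)
```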
18. The smaller the column, the faster
For most database engines, disk operations are likely to be the most significant bottleneck. Keeping your data compact helps a great deal here, because it reduces disk access.
See the MySQL documentation on Storage Requirements for all data types.
If a table will only ever hold a small number of rows (such as a dictionary table or a configuration table), there is no reason to use INT as the primary key. It is more economical to use MEDIUMINT, SMALLINT or the even smaller TINYINT. If you don't need to record the time of day, DATE is much better than DATETIME.
Of course, you also need to leave enough room for growth, otherwise changing the type later will be painful. See Slashdot's example (November 06, 2009): a simple ALTER TABLE statement took more than 3 hours because the table held 16 million rows.
19. Choose the right storage engine
MySQL has two main storage engines, MyISAM and InnoDB, and each has its advantages and disadvantages. Kushell's earlier article "MySQL: InnoDB or MyISAM?" discussed this.
MyISAM suits read-heavy applications, but it does not cope well with a large number of writes. Even if you only need to update one field, the entire table is locked, and no other process, not even a reader, can operate on it until the update completes. In addition, MyISAM is extremely fast at calculations such as SELECT COUNT(*).
InnoDB is a much more complex storage engine, and for some small applications it is slower than MyISAM. Its strength is that it supports row-level locking, so it performs better when there are many writes. It also supports more advanced features, such as transactions.
Below are the relevant pages of the MySQL manual:
MyISAM Storage Engine
InnoDB Storage Engine
20. Use an Object Relational Mapper
Using an ORM (Object Relational Mapper), you can get reliable performance gains. Everything an ORM does can also be written by hand, but that takes a senior expert.
The most important feature of an ORM is lazy loading, which means data is fetched only when a value is actually needed. But watch out for the side effect of this mechanism: it can easily generate many small queries and reduce performance.
An ORM can also batch your SQL statements into a transaction, which is much faster than executing them individually.
Currently, my favorite PHP ORM is Doctrine.
21. Beware of persistent connections
The purpose of a persistent connection is to reduce the cost of re-creating MySQL connections. Once a persistent connection is created, it stays open even after the database operations have finished. And since Apache reuses its child processes, the next HTTP request handled by the same child process reuses the same MySQL connection.
PHP manual: mysql_pconnect()
In theory this sounds great. But from personal experience (and that of most people), this feature causes more trouble than it is worth, because you only have a limited number of connections, and you run into memory issues, file handle limits, and so on.
Moreover, Apache runs in a highly parallel environment and creates many, many processes, which is why the persistent connection mechanism does not work well. Before you decide to use persistent connections, you need to think about the architecture of your entire system.
Supplement: forcing and ignoring indexes in MySQL
1. Force MySQL to use an index: FORCE INDEX (index name, or PRIMARY for the primary key)
select * from table_name force index(PRIMARY) limit 2; (forces use of the primary key)
select * from table_name force index(ziduan1_index) limit 2; (forces use of the index "ziduan1_index")
select * from table_name force index(PRIMARY,ziduan1_index) limit 2; (forces use of the primary key or the index "ziduan1_index")
2. Tell MySQL not to use an index: IGNORE INDEX (index name, or PRIMARY for the primary key)
select * from table_name ignore index(PRIMARY) limit 2; (the primary key is not used)
select * from table_name ignore index(ziduan1_index) limit 2; (the index "ziduan1_index" is not used)
select * from table_name ignore index(PRIMARY,ziduan1_index) limit 2; (neither the primary key nor the index "ziduan1_index" is used)