Abdel-Mawla Gharieb, Galera Cluster, information_schema, MySQL, MySQL Replication, pt-online-schema-change
MySQL information_schema comes with useful information about the database instance, status, … etc. which is needed for daily DBA work.
There are some simple queries on the information_schema that I use on my daily basis in which I’m writing this post for my reference and maybe a good reference for someone else too …
Finding tables without Primary or Unique Keys:
PKs are so important, especially, for InnoDB tables as MySQL uses PKs as a clustered index and having no PKs might lead to severe performance problems.
Also having no PKs is one of the main causes of slave lagging problems mainly when using RBR (Row-Based Replication), e.g. if a delete statement on the master will delete 1 million rows on a table without PK, a full table scan will take place. This “might” not be a problem on the master but on the slave 1 million full table scan will take place – because changes to the individual rows are being written to the binary logs in ROW format not the actual statement itself – which of course will cause the slave to be lagged. For more information about the Replication formats, check the manual documentation.
In Galera Cluster setups, tables without PKs might cause replication lag on the slave nodes and some other troubles as well:
“When tables lack a primary key, rows can appear in different order on different nodes in your cluster. As such, queries like SELECT…LIMIT… can return different results. Additionally, on such tables the DELETE statement is unsupported.“.
More information about Galera Cluster limitations, can be checked out here.
So, it is important to find out if there are any tables without PKs or not to fix the problem ASAP:
FROM information_schema.TABLES t
INNER JOIN information_schema.COLUMNS c
AND t.TABLE_SCHEMA NOT IN ('performance_schema','information_schema','mysql')
GROUP BY t.TABLE_SCHEMA,t.TABLE_NAME
HAVING sum(if(column_key in ('PRI','UNI'), 1,0))=0;
Finding Foreign key constraints:
If you are using the Percona tool pt-online-schema-change to apply schema changes, having FKs in your tables will make the tool’s operation more complicate and additional options should be used:
“Foreign keys complicate the tool’s operation and introduce additional risk. The technique of atomically renaming the original and new tables does not work when foreign keys refer to the table. The tool must update foreign keys to refer to the new table after the schema change is complete. The tool supports two methods for accomplishing this. You can read more about this in the documentation for –alter-foreign-keys-method.
Foreign keys also cause some side effects. The final table will have the same foreign keys and indexes as the original table (unless you specify differently in your ALTER statement), but the names of the objects may be changed slightly to avoid object name collisions in MySQL and InnoDB.”.
Check out my blog for more information on how to use pt-online-schema-change!
Also, foreign keys are supported only in InnoDB so if it is required to convert your tables to MyISAM, you should check first those constraints and remove them before converting to MyISAM, otherwise, the ALTER statement will fail:
SELECT referenced_table_name parent, table_name child, constraint_name
WHERE referenced_table_name IS NOT NULL
ORDER BY referenced_table_name;
By the time, tables become fragmented due to many writes (inserts, updates and deletes) so, reorganizing the table and the index will improve the performance and also reclaiming the disk space for use by the operating system might be required (Assuming that innodb_file_per_table option was enabled before creating the InnoDB tables). This can be achieved by executing “OPTIMIZE TABLE” statement but it is expensive. However, we can check the tables’ fragmentation first and then execute “OPTIMIZE TABLE” only on those tables having high fragmentation (the following query will return only the tables in db_name schema having data free more than 100MB):
SELECT TABLE_NAME, (DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 AS sizeMb,DATA_FREE / 1024 / 1024 AS data_free_MB
WHERE engine LIKE 'InnoDB'
and TABLE_SCHEMA = 'db_name'
AND DATA_FREE > 100 * 1024 * 1024;
Checking if there are any MyISAM tables or not:
MyISAM is a non transactional SE and having a consistent backup where there are MyISAM tables requires locking all tables. So, before considering the backup plan for a system, it is recommended to know if there are any MyISAM tables or not:
SELECT TABLE_SCHEMA, TABLE_NAME
WHERE TABLE_SCHEMA NOT IN ('information_schema','performance_schema','mysql')
Do you have your own useful queries on the information_schema? Feel free to write them down in a comment 🙂
Gerardo Narvaja wrote:
When you query the information_schema keep in mind that it’s a good idea to turn off “innodb_stats_auto_recalc” to avoid recalculating InnoDB statistics every time you run a query related to it.
Here are some of my queries:
Monitor buffer pool variables:
SELECT variable_name AS c_key, variable_value AS c_value FROM global_status WHERE variable_name LIKE ‘InnoDB_buffer_pool%’ AND variable_name NOT LIKE ‘InnoDB%status’;
Monitor command counters:
SELECT variable_name AS c_key, variable_value AS c_value FROM global_status WHERE variable_name IN (‘com_insert’, ‘com_insert_select’, ‘com_select’, ‘com_update’, ‘queries’, ‘questions’);
Monitor handler counters (lower level DB activity):
SELECT variable_name AS c_key, variable_value AS c_value FROM global_status WHERE variable_name LIKE ‘handler%’;
SELECT variable_name AS c_key, variable_value AS c_value FROM global_status WHERE variable_name in (‘aborted_connects’, ‘max_used_connections’, ‘threads_connected’);
Jozef Cuj wrote:
maybe basic but useful for me
* display number of connections for each user
SELECT USER, COUNT(*) FROM information_schema.processlist GROUP BY USER;
* display number of connections for each host
SELECT HOST, COUNT(*) FROM information_schema.processlist GROUP BY HOST;
* display root user activity
SELECT * FROM information_schema.processlist WHERE USER = ‘root’;
* display processes associated with SELECT queries
SELECT * FROM information_schema.processlist WHERE INFO LIKE ‘SELECT%’;
SELECT * FROM information_schema.processlist WHERE INFO LIKE ‘UPDATE%’;
* display average query time for each database
SELECT DB, AVG(TIME) FROM information_schema.processlist GROUP BY DB;
Mohammed Moosa wrote:
To check the Number of connection made to each database/schema.
select concat(rpad(HOST,20,’ ‘), rpad(DATABASE_NAME,30,’ ‘) , lpad(NUM_OF_CONNECTION,10,’ ‘) ) as ‘HOST DB COUNT’ FROM (select substr(host,1,instr(host,’:’)-1) as HOST, db as DATABASE_NAME, count(1) as NUM_OF_CONNECTION from information_schema.processlist group by 1,2 order by 1,3 ) A
Nice queries also posted by Shlomi Noach:
Get the total number of connections per user per host:
SELECT user, SUBSTRING_INDEX(host,’:’,1) hostname, COUNT(*) nConns
GROUP BY user, hostname
ORDER BY 3;
Calculate the size of your dataset:
sys.format_bytes(sum(data_length + index_length)) ‘TOTAL SIZE’
FROM information_schema.TABLES ORDER BY data_length + index_length;
Thank you for nice information.
I am observing High Memory Usage every time I query below command from MySQL 5.7.22 version. I am not sure if hitting with any bug. Any Idea or hint please?
SELECT table_name, avg_row_length, data_length, max_data_length, index_length, data_free, table_schema FROM information_schema.tables WHERE TABLE_TYPE = ‘BASE TABLE’ AND max_data_length > 0 ORDER BY table_schema ASC;
I can actually run the query on the same MySQL version with no issues and it takes less than 1 sec to return the data. Have you checked if some long-running queries were being executed at the times you ran the query? Or maybe the server was at a high load?