peter_zaitsev (peter_zaitsev) wrote,

MySQL: Making caching more efficient

In many applications the data is frequently accessed not uniform way. Some rows are accessed a lot while others barely are. So lets say you have 100.000.000 rows in the table while only 1.000.000 of them are regularly accessed. How much memory would you need to make all of your accesses served from memory ? If you would answer enough to hold 1.000.000 rows with indexes and overhead you might be miscounting quite a lot.

The trick is data is not typically cached by rows, but by pages. Even for MyISAM which does not have row cache OS cache will be used which typically uses 4K pages. The same applies to the keys which would use 1KB-4KB pages for MyISAM and 16KB pages for Innodb. Assuming you have 100 byte rows and 70% fill factor there will be more than 100 rows per Innodb page, which means your cache might be some 100 times less efficient than you think - instead of having space for 10000 pages which store 1.000.000 rows you can get these 10.000 pages to store only 10.000 rows you need. Similar situation happens to the indexes.

Of course it highly depends on the application - in some cases fit could be almost perfect - For example if you're using logging having only index in time and frequently accessing records for last day only. But you might not be such lucky.

So what one can do to improve efficiency. Lets say we have 100.000.000 users for service while only 1.000.000 of them are really active and their profiles need to be cached.

One thing - you can simply store them in different "hot" table, doing periodic purges for inactive users and copying data to the table on demand.
It is not very convenient but really can speed things a lot. You can use even HEAP table to cache active items if number of them is small enough
(You can use memcache for this purpose as well if you do not need SQL interface)

The similar approach can be found for example in Wikipedia using "cur" and "archive" tables - only current versions are frequently accessed while previous versions are not, so they are stored in different tables instead of for example storing all of them in the same table and having "current" field.

You can also improve caching for Innodb while sticking to the same table by making sure frequently accessed rows are located close to each other.
As Innodb Clusters by primary key you can use "active" prefix as the part of it, setting it for the rows you need. It is expensive however as
during primary key update row has to be physically moved.
  • Post a new comment


    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic
  • 1 comment