Peter Zaitsev: MySQL, Linux, OpenSource and whatever I find worth to write about

Below are the 20 most recent journal entries recorded in peter_zaitsev's LiveJournal:

Friday, August 4th, 2006
5:28 pm
Leaving MySQL, MySQL Consulting
As I already wrote on my new blog, I've now left MySQL and will be starting my own company to provide MySQL Performance Consulting services.

So if you need any help with MySQL, or have any friends who do, let me know.

I will also be working on a few web and open-source projects. I'll keep you posted.
Tuesday, May 30th, 2006
1:37 pm
MySQL Performance blog updates
My blog link at PlanetMySQL.org is still not updated, so here is a list of the posts Vadim and I have written during the last couple of weeks.

Group Commit and XA - Some more benchmark results on the MySQL 5.0 performance regression caused by broken group commit.

Speedup your LAMP stack with LigHTTPD - The article is about optimizing the web server part of the LAMP stack in general, and how you can use lighttpd to speed things up in particular.

How Web Services world affect LAMP Stack - Discusses some of the things that become different with Web Services applications. A kind of follow-up to the previous post.

MyISAM mmap feature (5.1) - Vadim writes about mmap support for MyISAM, which he has implemented for MySQL 5.1.

Jeremy Cole on MySQL Replication - This article comments on a post by Jeremy Cole discussing reliable replication options.

MySQL SpecJAppServer2004 results - Announcing MySQL's official results in the SpecJAppServer benchmark, published by Sun with our help.

INSERT ON DUPLICATE KEY UPDATE and summary counters - An article about using the INSERT ON DUPLICATE KEY UPDATE feature to speed up summary table creation.

Join Performance of MyISAM and Innodb - This article compares the MyISAM and Innodb storage engines on join speed in the CPU-bound case.

InnoDB memory usage - Details about Innodb memory allocation. Something you need to know when planning MySQL server memory usage.
Thursday, May 18th, 2006
4:13 pm
MySQL Performance articles update
It looks like my request for PlanetMySQL.org to use the new blog has not been handled yet, so let me post links here to the fresh articles I've written recently:

MySQL Server Memory Usage - This article covers MySQL server memory usage and how to estimate and control it.

To Pack or Not to Pack - MyISAM Key Compression - This article covers the benefits and drawbacks of MyISAM packed keys.
Saturday, May 13th, 2006
3:42 pm
MySQL stuff moves to new address

I decided to start blogging about MySQL stuff in a separate blog, http://www.mysqlperformanceblog.com - this will give me more control, allowing me to publish files on the same web site as well as track visitors. I will still be updating this one every so often with my personal records.

For LiveJournal users - I've created the syndication account "mysqlperf", which syndicates my new blog, so you can just add it to your friends list if you want to follow the new blog.

PlanetMySQL.org has not yet been updated to include the new blog. I hope this will happen soon.

Let's hope I'll have time to update my new blog more frequently. And your comments would surely be a good motivator :)
Tuesday, April 11th, 2006
4:00 pm
MySQL: Using giant number of connections
Every so often I need to answer the question: what is the best way to use 10,000 connections with MySQL (insert your favorite big number here)?
Well, the honest answer is: you'd better not.

First, let's see where this requirement typically comes from. Most commonly it is something like: we have 40 Apache boxes, each running 200 children with persistent connections, which gives us 8,000. This brings me to two questions - do you really need persistent connections, and do you really need 200 Apache children?

Persistent connections give you a decent performance benefit in only one case: when few database queries are needed to build the page and the database is a large contributor to response time. In modern, rather complex applications this is rarely the case. This is one of the reasons persistent connections were disabled in "MySQLi" for PHP. Java applications are better in this sense, as they typically use connection pooling rather than persistent connections, which needs far fewer connections and is much more efficient.

Now, about 200 Apache children per box: why are so many of them required? In some cases it is well justified, but in most it is simply poor configuration of the web tier. 90% of those 200 children will be spoon-feeding slow clients while keeping a large chunk of memory allocated by mod_php, mod_perl or whatever you use. One thing you can do to solve this problem is to place a lightweight proxy in front which can spoon-feed your clients, handle keepalive, and even cache whatever can be cached. You can use an Apache module for this purpose, though I'd prefer something more lightweight, i.e. lighttpd. The important requirement here is a server that does not use a process/thread per connection and handles the network with events rather than select() or poll(), as those are slow with a large number of connections. With such a proxy you can have 10,000 connections or more per box with a small number of Apache children. Another option is to use lingerd, which I have not tried, however. Finally, the most radical option is to use the FastCGI interface, which allows you to use a much wider selection of web servers.

So if 200 processes is a lot, how many is enough? Well, it depends on the application. In certain cases, for example if the application uses Web Services internally or makes other remote network calls, you simply need a lot of processes running to get the performance you need. In the most common case, however, when the load is CPU bound and makes local database calls, you can do better with a much smaller value.

Each "device" have the optimal number of requests which it can handle at the time. for CPU signle running thread per core is optimal. For hard drives
it may be few requests per drive (to allow scheduling) but if you get too many you would get your sequential scans interrupted by random IO etc.
Also you should count memory and CPU cache - the more concurrent processes are running the smaller portion of CPU cache and OS disk cache each of them get, not to mention they use memory themselves which reduce OS cache and even could cause swapping. The optimal number of proessess depends on the load, if it is CPU bound you would watch number of running threads more, if it is IO bound you would watch number of outstanding requests.
As a quick and dirty I usually use (NumCPUs+NumDisks)*2 value. Depending on the application, OS, hardware and MySQL bugs it may be more or less than that.
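The quick-and-dirty formula above can be sketched in a few lines (a rough starting point, not a tuning rule; the function name is mine):

```python
# Quick-and-dirty starting estimate for the optimal number of concurrently
# running threads, as described above: (NumCPUs + NumDisks) * 2.
# Only a first guess - the real value depends on application, OS,
# hardware and workload.

def optimal_running_threads(num_cpus: int, num_disks: int) -> int:
    """Rough starting estimate for concurrently active threads."""
    return (num_cpus + num_disks) * 2

# Example: a 4-CPU box with 6 drives.
print(optimal_running_threads(4, 6))  # -> 20
```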

The curious thing about this number is that it is kind of universal, even if it is not always 42. If you look at "locally running threads", meaning threads running on local CPUs or blocked on local resources, you can use this value whether it is a web server, application server or database server. If a process waits on the network, that is not a local resource, so you likely want to have many of those.

So how many? I would start by guessing 20-40 as the optimal number of active threads on the web/application server. When it comes to MySQL, I'd aim even lower than that in "Threads_running". Most typically, values below 10 are optimal for a CPU-bound workload and below 30 for an IO-bound one. It depends on the hardware and workload, of course.

What are the problems with many connections on the MySQL side?
- Waste of memory. Each connection requires certain resources and might not give all of them back even while it is "sleeping".
- Load on the OS. The operating system rarely enjoys managing thousands of threads and TCP/IP connections.
- Many running threads: thrashing of the CPU cache, and so a waste of CPU resources.
- Many running threads: mutex ping-pong. Especially bad for Innodb tables.
- Small annoyances, such as a "SHOW PROCESSLIST" that is hard to deal with and a broken "SHOW INNODB STATUS".
- Innodb can allocate only up to 1024 undo log entries and will fail to handle more active transactions; innodb_thread_concurrency, however, can be used to fix it.


I wanted to add a couple more points to the article.

So what about the people who use 3,000-5,000 connections with MySQL successfully? Well, the point is not that it can't work, but that it is usually not the most optimal way. If you have some memory and resources to waste, why not? In some cases it is actually not that bad - especially if you get just a few running threads - but there are many more bumps on this road.

So what if you really need a lot of connections and can't do anything about it? Then you'd better trim down your per-thread buffers to ensure MySQL does not run out of memory or start swapping. Finding the optimal values is a tricky question in itself, which I should write another article about. Also do not forget to use a proper innodb_thread_concurrency.
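As a sketch of why per-thread buffers matter at high connection counts, here is a rough worst-case estimate of connection-thread memory. The option names are real MySQL variables; the sizes and the helper function are illustrative assumptions, not recommendations:

```python
# Rough worst-case estimate: every connection could allocate its
# per-thread buffers at once. Variable names are real MySQL options;
# the byte values below are illustrative examples only.

PER_THREAD = {
    "sort_buffer_size":     256 * 1024,
    "read_buffer_size":     128 * 1024,
    "read_rnd_buffer_size": 256 * 1024,
    "join_buffer_size":     128 * 1024,
    "thread_stack":         192 * 1024,
}

def worst_case_thread_memory(max_connections: int) -> int:
    """Upper bound (bytes) on memory the connection threads may allocate."""
    return max_connections * sum(PER_THREAD.values())

# 100 vs 5000 connections with identical buffer settings, in megabytes:
print(worst_case_thread_memory(100) // (1024 * 1024))
print(worst_case_thread_memory(5000) // (1024 * 1024))
```

The point: the same buffer settings that are harmless at 100 connections can add up to gigabytes at 5,000.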

What about slow network requests during processing, which increase the number of active processes you need? If you want to keep the number of connections down, you may disconnect and reconnect. In particular, do not hold an open transaction or locked tables for the length of the operation unless you really know what you're doing. But this is only needed in extreme cases; frequently such a connection is fine to keep open, as it will just be idle.

What could one do to trim down the number of persistent connections if the application can't be modified? You can reduce wait_timeout to some short value so MySQL closes idle connections quickly. Watch out, however, not to set it so low that connections get closed in the middle of something.
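As a sketch, the wait_timeout trimming might look like this (the value is illustrative; the default is 28800 seconds):

```sql
-- Close idle connections after 60 seconds; pick a value longer than the
-- longest legitimate idle period your application has.
SET GLOBAL wait_timeout = 60;
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
```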

What are the dangers of frequent connects?
It works well, with the exception of some extreme cases. If you get hundreds of connects per second from the same box, you may run out of local port numbers. Ways to fix it: decrease "/proc/sys/net/ipv4/tcp_fin_timeout" on Linux (this bends the TCP/IP standard, but you might not care on your local network), or increase "/proc/sys/net/ipv4/ip_local_port_range" on the _client_. Other OSes have similar settings. You can also use more web boxes, or multiple IPs for the same database host, to work around this problem. I've really seen this in production.

Some background on this problem: a TCP/IP connection is identified by localip:localport remoteip:remoteport. The MySQL IP and port, as well as the client IP, are fixed in this case, so we can only vary the local port, which has a finite range. Note that even after you close a connection, the TCP/IP stack has to keep the port reserved for some time; this is where tcp_fin_timeout comes in.
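The port-exhaustion arithmetic can be sketched as follows. The default numbers assume common Linux values for ip_local_port_range, and the model treats the timeout as the time a closed port stays reserved (a simplification of TIME_WAIT behavior):

```python
# Back-of-the-envelope estimate of the sustainable connect rate from one
# client box, given the local port range and the port-reservation time.
# Check your own values with:
#   cat /proc/sys/net/ipv4/ip_local_port_range
#   cat /proc/sys/net/ipv4/tcp_fin_timeout

def max_connects_per_second(port_low: int = 32768,
                            port_high: int = 61000,
                            reserve_seconds: int = 60) -> float:
    """Each closed connection keeps its local port reserved for
    reserve_seconds, so the port pool caps the connect rate."""
    usable_ports = port_high - port_low + 1
    return usable_ports / reserve_seconds

print(max_connects_per_second())  # roughly 470 connects/sec with defaults
```

Hundreds of connects per second from one box is therefore exactly the regime where the pool runs dry, matching the observation above.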
Sunday, April 2nd, 2006
5:01 pm
MySQL: Preparing for MySQL Users Conference
The MySQL Users Conference 2006 is now in close proximity and I'm working hard preparing my presentations.
I've got three this year. All should be very interesting, but challenging to prepare.

The first one is the MySQL Performance Optimization Tutorial, which I'm doing with Tobias. It may sound similar to what I did last year, but the content is a lot different. Frankly speaking, we did not recycle any of the old stuff and wrote everything from scratch. To learn from last year's mistakes, we made it more condensed so we can cover everything in a short 3 hours, while keeping it much more practical. Based on my and Tobias's experience of performance optimization of real applications, we essentially provide a how-to, or insight into how we work: what exactly we do, why, and how we do it. For a particular application there are of course many more specific tricks that can be played, but at least we're sharing the most generic, simple and powerful ones.

The second one is a presentation about MySQL 5.0 vs 4.1 performance. This one required running a lot of benchmarks to quantify the performance improvements of various MySQL 5.0 optimizations and new features, as well as the performance regressions in the cases where it became slower. Yes, we're not going to hide anything - there are many performance improvements and new features, but these had to be paid for by reduced performance in certain other cases.

The third one is "MySQL Performance Landscape" - the most time-consuming to prepare. Members of the MySQL High Performance Group and I ran well over a hundred benchmarks to investigate MySQL performance with different workloads on different platforms. How well does MySQL scale to multiple CPUs? Which operating system is better? Does dual core really rule? We'll try to have many such questions answered.
Friday, December 9th, 2005
5:48 pm
MySQL: Hardware RAID5 vs RAID10 analyses
Recently I had a chance to revisit the question of RAID5 vs RAID10 performance for MySQL OLTP workloads (random IO). The previous set of benchmarks I did for the MySQL Users Conference almost two years ago was done on RH AS 3 with eight 7200 RPM SATA drives connected via a 3Ware 8500 or something similar. In that configuration RAID5 looked horrible compared to RAID10, showing absolutely terrible performance.

This time I tested on a Dell PowerEdge 2850 with six 10,000 RPM SCSI drives connected to the embedded RAID controller (LSI MegaRaid) with 256MB of battery-backed cache. The operating system was CentOS 4.2 x86_64, with the ext3 file system.

Tuesday, November 1st, 2005
2:18 pm
MySQL 5.0 - time to upgrade ?
Now that MySQL 5.0 is finally released as GA, or stable, I guess many of you are asking yourselves whether it is time to upgrade.
Let me express my opinion on this matter.

Saturday, September 10th, 2005
4:13 pm
Second time father
Yesterday, the 9th of September at 13:16, my second baby was born: a girl named Nadya, weight 2900g.

Today we're all safely back home from the hospital and the baby is settling in.
Labor was rather quick and with little pain (thanks to the epidural) - attending childbirth
classes I was taught a lot about scary things such as cesareans, episiotomies, vacuum extractors
and such, but happily none of this was needed. Actually, the "push" phase itself took less than
10 minutes and I guess 4-5 pushes; even our doctor was surprised.

It is very rewarding to attend a childbirth, and I guess very helpful for the mother as well. My
previous baby was delivered in Russia (6 years ago) - mother and baby spent 4 days in the hospital
and I was not even allowed to visit them; Kate would show me Vanya out of the window on the 6th
floor or something.
Wednesday, July 27th, 2005
10:09 pm
MySQL: Something to watch with replication
Last week I ran into this situation 3 times, so it must be common, and I guess extremely upsetting, as there is little you can do when it happens; being informed in advance, however, you can plan accordingly and avoid the problem. What am I speaking about? Just MySQL replication in large databases with heavy writes. What is wrong with it, you would ask, and you would be correct - in most cases replication works and performs great, but there are some things to watch out for.

Saturday, June 11th, 2005
3:56 pm
MySQL: Innodb bottlenecks
Recently we've discovered a couple of Innodb performance issues I was not aware of. Well... I knew these phenomena existed, but I did not think the impact was that dramatic.

1) Innodb auto_increment lock.
We've been seeing serious performance degradation in SysBench tests with Innodb tables and a large number of connections. At 1024 connections, performance could be 10 times lower than with 10 connections. Well, it turns out the lock taken on the auto_increment column for insertion is responsible for a large portion of it. The sad thing is that this lock is taken even if you explicitly specify the value you're inserting - as soon as the table has an auto_increment column. We have yet to investigate why the regression is so serious, but in the meanwhile the advice would be to avoid using auto_increment if you do not really need it. The good part, however, is that this is not that typical a case. The "many connections" I'm speaking about is not really the number of connections but of concurrently running queries - if you have 1000 connections but they are idle most of the time, you will not see this effect.

2) The second surprise was "deleted index entries" - the fact is, an index in Innodb contains pointers to all rows, even those which are deleted, or whose column value changed so they no longer match the current key. This is done to allow using indexes for reading old versions; however, it means queries may slow down significantly if there is a long transaction running which does not allow purge to happen. Just to clarify - it is _any_ long transaction, even one which does not touch the table in question: Innodb needs to maintain the old versions, as there is no information about which tables an ongoing transaction will need to access next. So if you have some script which has hung for hours with a transaction open, you might be in trouble.

So when does this hurt a lot? When there are a lot of rows in the index which are no longer there. The typical case where I've seen it is various queue implementations. For example, you have a flag "needs_review" in the table which is set to 1 when an object needs to be reviewed. You may have a simple query to pull up the next one for review - "select * from tbl where needs_review=1 limit 1". Logically this query requires a single row read by index, but in practice it may be a pretty long index scan to find the first non-deleted row.
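A sketch of the queue pattern described above (the table and column layout are hypothetical):

```sql
-- Hypothetical queue table: rows flip needs_review from 1 to 0 as they
-- are processed, so the needs_review index accumulates entries that
-- point to old row versions until purge catches up.
CREATE TABLE review_queue (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    needs_review TINYINT NOT NULL,
    payload TEXT,
    KEY (needs_review)
) ENGINE=InnoDB;

-- Looks like a single index read, but with a long-running transaction
-- blocking purge it may scan many not-yet-purged "deleted" entries:
SELECT * FROM review_queue WHERE needs_review = 1 LIMIT 1;
```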

These tricky things that versioning adds are often forgotten about.
Friday, April 1st, 2005
9:45 am
Full Text Search with MySQL
During recent months I had some performance engagements, spoke to people and dealt with support cases which had one problem in common - MySQL full-text search does not perform well on large data sets.

Monday, March 21st, 2005
1:46 pm
Faking the sync
Someone could find it an old and boring story, but it keeps popping up and people are still losing data, so it is worth going over it once again.

MySQL/Innodb, like any other transactional database, relies on OS guarantees to provide its own. A transaction will be durable and the database intact after a crash only if the database can perform synchronous IO as truly synchronous - reported as done only when the data is physically on the disk.

Wednesday, February 16th, 2005
12:42 pm
LinuxWorld Boston Notes
Yesterday was the opening of the LinuxWorld conference in Boston. A lot of new stuff is being announced, even things not directly connected to Linux, which shows its growing importance.

Friday, February 11th, 2005
11:10 am
MySQL: Upgrading MySQL 4.0 to 4.1
Upgrading to the next MySQL version used to be quite safe. I remember quite painlessly upgrading from 3.22 to 3.23; upgrading 3.23 to 4.0 caused a bit of a headache as the behavior of subtraction was changed, but this was easily fixed by the --sql-mode=NO_UNSIGNED_SUBTRACTION option.

With the upgrade from 4.0 to 4.1 there are many more surprises, which I still see users and customers running into. We have an extensive list of upgrade notes, but practice shows not everyone has read it before upgrading: http://dev.mysql.com/doc/mysql/en/upgrading-from-4-0.html

So, a few highlights, ranging from minor to severe.

- There are various minor but annoying syntax changes. For example, you can't use "create table x(i int)" any more, as X became a function (writing "create table x (i int)" with a space works); you also now need parentheses in your UNION queries where you previously could have skipped them.

- SHOW CREATE TABLE and SHOW TABLE STATUS were changed to use "Engine" instead of "Type", which broke some of my scripts that detect the storage engine of a table to decide how to work with it.

- ENUM values are now case insensitive by default, which can give you some surprises.

- The TIMESTAMP output format has changed significantly, so if you parse it in your scripts, it is very likely they are broken.

- "Illegal mix of collations" can drive you nuts, for example previously you could compare string to binary string while now you can't

- The sorting order was changed for values ending in characters with codes less than 32, for example "\t" and "\n". This can corrupt your tables, and has even been reported to hang some applications, so it is better to run CHECK TABLE after the upgrade, or check whether you have such data in your tables beforehand. There are a few more cases where a data dump/reload may be needed for the upgrade.

- Authentication changes may break applications linked to the old client library; use the --old-passwords option on the server, or relink.

- The client library name (version part) changed, so dynamically linked applications may no longer work after the upgrade. The MySQL-Compat RPM package can be used to cure it. This is especially frequently reported with PHP.

- LIMIT no longer accepts -1 as the second parameter; use some large number instead (yes, find all your queries and fix them).

- I also heard some changes were made in the way metadata is returned (column types etc.) - this can affect PHP or JDBC applications making extensive use of it.

- Database and table names and metadata are treated as Unicode now, so if you used non-latin1 (e.g. Russian) table names, you'd better rename them for the upgrade; otherwise you will get garbage there.

I'm sure there are more surprises; these are just the ones I have seen happen most typically.

Also be careful upgrading/downgrading if you're using UTF8. If you configure MySQL 4.1 to use utf8 by default and just start it with MySQL 4.0 latin1 tables, you're likely to trash your data, as there is no charset information in the old tables and MySQL 4.1 will assume the data is in Unicode. So do not change the character set straight away: run 4.1 with the same charset as 4.0 first, and use ALTER TABLE to convert tables to the 4.1 format - this format includes character set information, so you should be safe.
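As a sketch, the per-table conversion might look like this (table name hypothetical; run it only once 4.1 is up and running with the old character set):

```sql
-- After running 4.1 with the same character set as 4.0, convert tables
-- one by one; the 4.1 format stores the character set with the table.
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8;
```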

And yes... downgrading from 4.1 to 4.0 is no longer safe with all these changes. Worst of all, it is not entirely clear when it is safe and when it is not (on my dev machine I bounce between 4.1 and 4.0 pretty frequently and it works fine, but I do not do anything advanced there), so take a backup before upgrading and test well before going to production (which is always a good idea anyway).
Monday, January 31st, 2005
9:05 pm
Preparing for Users Conference
Time to work on presentations for the Users Conference 2005. I'll have a tutorial and a presentation, both on MySQL performance.
The tutorial will be about general MySQL performance tuning, starting from pretty simple and going to rather advanced stuff. The presentation will be pretty advanced. If you heard my presentation last year - this time I'm going to go even deeper.

You can find more information about the conference at http://www.mysqluc.com
Pre-registration is now open, so if you're planning to attend, it is good to register now.
Monday, January 10th, 2005
4:43 pm
MySQL: Loading large tables with Unique Keys
Normally MySQL is rather fast at loading data into a MyISAM table, but there is an exception: when it can't rebuild indexes by sort but builds them row by row instead. This can happen due to wrong configuration (i.e. too small myisam_max_sort_file_size or myisam_max_extra_sort_file_size), or it could be just a lack of optimization, if you have large (do not fit in memory) PRIMARY or UNIQUE indexes.
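A sketch of the configuration side of this (an illustrative my.cnf fragment; the values are examples, not recommendations):

```ini
# Allow MyISAM index rebuilds to use "repair by sort" on large tables
# instead of falling back to row-by-row key insertion.
[mysqld]
myisam_max_sort_file_size = 10G
myisam_max_extra_sort_file_size = 10G
myisam_sort_buffer_size = 256M
```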

Below is an internal letter I sent out on this subject, which I figured would be good to share.

Thursday, December 30th, 2004
9:55 pm
MySQL: Making caching more efficient
In many applications the data is not accessed in a uniform way: some rows are accessed a lot while others barely are. So let's say you have 100,000,000 rows in a table while only 1,000,000 of them are regularly accessed. How much memory would you need to have all of your accesses served from memory? If you would answer "enough to hold 1,000,000 rows with indexes and overhead", you might be miscounting by quite a lot.
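Only the teaser survives here, but the flavor of the miscount can be sketched with a little arithmetic, assuming caching happens in fixed-size pages and the hot rows are scattered uniformly (my simplification, not necessarily the article's exact argument):

```python
# If caching works on whole pages, scattered hot rows force you to cache
# far more than "1,000,000 rows worth" of memory. Simplified model:
# hot rows uniformly scattered, pages holding rows_per_page rows each.

def fraction_of_pages_touched(total_rows: int, hot_rows: int,
                              rows_per_page: int) -> float:
    """Probability that a given page contains at least one hot row."""
    p_hot = hot_rows / total_rows
    return 1.0 - (1.0 - p_hot) ** rows_per_page

# 1M hot rows out of 100M, 100 rows per page:
frac = fraction_of_pages_touched(100_000_000, 1_000_000, 100)
print(f"{frac:.1%} of pages contain a hot row")
```

Under these assumptions, a 1% hot-row fraction touches well over half of all pages, so the memory needed is far closer to the whole table than to the hot rows alone.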

Thursday, December 23rd, 2004
11:00 am
MySQL: Make sure fsync() works
I've seen people many times lose their latest transaction(s) because of a broken OS fsync() implementation. The problem usually was not that bad, as no database corruption occurred: the issue was normally with flushing the drive cache, which seems to affect only sequential writes, so pages going via the Innodb doublewrite buffer made it to the disk.

It looks, however, like this is not always the case. We got a customer losing a lot of data on MacOS X, which is known to have a fake fsync(). I do not know whether the MacOS X fsync() was broken more than usual, or whether some hardware could lose data for a longer time frame; still, the recommendation is the same - if you're storing sensitive data in MySQL, make sure your platform is as safe as you need.

Power off your boxes a few times during some disk-intensive multi-user benchmark and see if you get database corruption - run CHECK TABLE to see if the tables are fine. Also try a simple sequential insert application to make sure the last inserted record is always there after the power-off (if you need such guarantees). If you're using external SCSI you can also try disconnecting SCSI cables and other tortures to emulate hardware failures you might have. Spend the time, or you will spend much more time recovering your data if the problem happens.
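A minimal sketch of the sequential-insert idea, using a plain file instead of MySQL (the names are mine; this only shows the write-then-fsync pattern - it cannot by itself prove your OS and drive honor fsync, which is exactly what the power-off test checks):

```python
# Append a record, fsync before acknowledging, and after a (real) crash
# verify that the last acknowledged record survived on disk.
import os

def append_record(path: str, record: str) -> None:
    """Append one record and force it to disk before acknowledging."""
    with open(path, "a") as f:
        f.write(record + "\n")
        f.flush()
        os.fsync(f.fileno())  # only protects you if the OS/drive is honest

def last_record(path: str) -> str:
    """Return the last complete record in the file, or "" if empty."""
    with open(path) as f:
        lines = f.read().splitlines()
    return lines[-1] if lines else ""

# After a real power-off, last_record() must return the last record the
# application was told was safe; with a fake fsync() it may not.
```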

By the way, this problem is not specific to MySQL at all; it should affect more or less all on-disk databases.
Wednesday, December 22nd, 2004
11:06 pm
MySQL: Innodb Mutexes
Looking at how Innodb's own mutexes are implemented, it is not as simple as I previously thought...
