peter_zaitsev ([info]peter_zaitsev) wrote,

MySQL: Using giant number of connections

Every so often I need to reply the question, what is the best way to use 10.000 connections with MySQL (you can select your favorite big number here)
Well. The good answer is your better not.

First let see where this requirement typically comes from ? Most commonly it is something like we have 40 apache boxes each having 200 children with persistent connections which gives us 8000. Well... This brings me to two questions - do you really need persistent connections and do you really need 200 apache children ?

Persistent connections will give you decent performance benefit only in one case - if there are few database queries to create the page and if database load is large response time contributor. In modern rather complex applications it is rarely the case. This is one of examples why persistent connections were disabled in "MySQLi" for PHP. Java applications are better in this sense as they typically use connection pooling rather than persistent connection which needs far less connections and much more efficient.

Now about 200 apache children on the box ? Why so many of them are required. In some cases it is good justified but in most it is simply poor configuration of Web part. 90% of these 200 children will be spoonfeeding slow clients, while keeping their large chunk of memory allocated by mod_php
mod_perl or whatever you use part. One thing you can do to solve this problem is to place lightweight proxy in front which can spoonfeed your clients, do keepalive and even cache if something can be cached. You can use apache module for this purpose while I'd like something more lightweight, ie lighttpd. Important requirement here is not using process/thread per connection and based on events rather than select() or pool() in handling network as these are slow with large number of connections. By using these you can have 10.000 connections or more per box with small number of apache children. The other option is to use lingerd, which I did not try however. Finally the most radical option is to use fast-cgi interface which allows you to use much wider selection of web servers.

So if 200 processes is a lot how many is enough ? Well it depends on the application. In certain cases, for example if application uses Web Services internally or some other remote network calls. In this case you simply need a lot of processes running to get the performance you need. In most common case however if load is CPU bound and does local database calls you can do better with much smaller value.

Each "device" have the optimal number of requests which it can handle at the time. for CPU signle running thread per core is optimal. For hard drives
it may be few requests per drive (to allow scheduling) but if you get too many you would get your sequential scans interrupted by random IO etc.
Also you should count memory and CPU cache - the more concurrent processes are running the smaller portion of CPU cache and OS disk cache each of them get, not to mention they use memory themselves which reduce OS cache and even could cause swapping. The optimal number of proessess depends on the load, if it is CPU bound you would watch number of running threads more, if it is IO bound you would watch number of outstanding requests.
As a quick and dirty I usually use (NumCPUs+NumDisks)*2 value. Depending on the application, OS, hardware and MySQL bugs it may be more or less than that.

The curious thing about this number is - it is kind of universal, even if it is not always 42. If you look at "localy running threads", meaning threads running on local CPUs or being blocked on local resources you can use this value - if it is Web server, application server or database server. If process is waits on network it is not local resource so you likely want to have many of these.

So how many ? I would start guessing from 20-40 is the optimal number of active threads on the Web/Application server. Then it comes to MySQL
I'd see even less than that in "threads_running" Most typically values below 10 are optimal for CPU bound workload and below 30 for IO bound.
It depends on hardware and workload of course.

What are the problems with many connections on MySQL side ?
- Waste of memory. Each connection requires certain resources and might not give away all of them even if it is "sleeping"
- Load On OS - Operation System rarely enjoys managing 1000s of threads TCP/IP connections
- Many Running threads - thrashing of CPU cache and So waste of CPU resources.
- Many Running threads - mutex ping-pong. Especailly bad for Innodb tables
- Small annoyances as hard to deal with "show processlist", broken "SHOW INNODB STATUS"
- Innodb can allocate up to 1024 undo log entries and will fail to handle more active transactions innodb_thread_concurrency however can be used to
fix it


UPDATE:

Wanted to add couple of more points to the article.

So what is about people which use 3.000-5.000 connections with MySQL successfully ? Well the point is not it can't work but what it is usually not the most optimal way. If you have some memory and resources to waste. Why not ? In some cases it is actually not that bad - especially if you get just few running threads, but there are much more bumps on this road.


So what if you really need to have a lot of connections and can't do anything about it ? Well then you better to trim down your per thread buffers to ensure MySQL does not run out of memory or start swapping. Finding the optimal values is tricky question by itself which I should write another article about. Also do not forget to use proper innodb_thread_concurrency.


What is about slow network requests during processing which increase number of active processess you need ?
If you want to keep number of connections down you may disconnect and reconnect. Especially do not hold open transaction or locked tables for the
length of operation unless you really know what you're doing. But this is needed in extreme case. Frequently such connection is fine to remain open
as it will be idle.

What one could to to trim down number of persistent connections if application can't be modified ? You can reduce wait_timeout to some short value so MySQL will close idle connection quickly. Watch however not to set it low enough so close connection in the middle of something.

What are dangers of frequent connects ?
It works well, with exception of some extreme cases. If you get hundreds of connects per second from the same box you may get into running out of local port numbers. The way to fix it could be - decrease "/proc/sys/net/ipv4/tcp_fin_timeout" on linux (this breaks TCP/IP standard but you might not care in your local network), increase "/proc/sys/net/ipv4/ip_local_port_range" on the _client_. Other OS have similar settings. You also may use more web boxes or multiple IP for your same database host to work around this problem. I've realy seen this in production.

Some background about this problem: TCP/IP connection is identified by localip:localport remoteip:remote port. We have MySQL IP and Port as well as client IP fixed in this case so we can only vary local port which has finite range. Note even after you close connection TCP/IP stack has to keep the port reserved for some time, this is where tcp_fin_timeout comes from.

  • Post a new comment

    Error

    Comments allowed for friends only

    Anonymous comments are disabled in this journal

  • 15 comments

[info]egorfine

April 12 2006, 07:53:34 UTC 6 years ago

Java applications are better in this sense as they typically use connection pooling rather than persistent connection which needs far less connections and much less efficient.

ehh... Connection pooling is much less efficient or persistent connections are much less efficient? :)

[info]peter_zaitsev

April 12 2006, 17:16:58 UTC 6 years ago

Oh Sorry. This should be More efficient of course. So Connection pool is more efficient. The extra reason for this is JDBC needs to do more things on connection creation to adapt itself to MySQL server version and configuration as well as set few things.

[info]egorfine

April 12 2006, 17:26:47 UTC 6 years ago

Generally, connection pooling is always a better idea. I wish I could have it in my environment in perl :)

Статья отличная, спасибо! Тем не менее, there is no straight answer for the straight "what is the best way to use 10.000 connections with MySQL" question :)

So, what is it, the best way? :)

[info]peter_zaitsev

April 12 2006, 17:42:18 UTC 6 years ago

The straight answer is - stop shooting yourself into the foot or stop complaining in hurts :)

The point of this article is you should not. Even in the case there is network is involved so there are many requests being processed at the same time it may be good to reconnect after slow operation is completed.

Some people would argue with me they have good success working with 2.000 or 5.000 connections. The reasons typically would be:

- They have enough memory and resources to waste
- The number of Active threads at MySQL side is usually low

[info]tristadireg

July 16 2008, 16:31:10 UTC 3 years ago

Well first lets look at what GMail can do. . . . The above image of GMail's toolbar covers it pretty well.

[info]andrew_www

April 12 2006, 08:05:44 UTC 6 years ago

а ngnix не пробовал?

[info]madm1ke

April 12 2006, 17:16:22 UTC 6 years ago

он же написал "Finally the most radical option is to use fast-cgi interface". А фронт-енд - по вкусу :)
Мы таки радикалы :)

[info]peter_zaitsev

April 12 2006, 17:20:32 UTC 6 years ago

Да на самом деле для раздачи динамики обычно не критично какой из легких серверов там будет.

Nginx знаю люди используют успешно... в россии. Последний раз я смотрел с английской докой было туговато.

[info]madm1ke

April 12 2006, 17:23:45 UTC 6 years ago

ну, по nginx-то док вообще почти нет... А аннглийская дока - вообще некритично ;)

[info]jeremycole

April 12 2006, 15:40:38 UTC 6 years ago

Hear, hear!

You are absolutely right, yet for some reason people continue to think that it's "easier" for MySQL to manage 10,000 connections than to connect/disconnect often. Ugh.

[info]peter_zaitsev

April 12 2006, 17:25:07 UTC 6 years ago

Well.. This is guess comes from experience with other databases which have more expensive connect, some old recommendations (You could find mysql_pconnect recommendation in almost any PHP book) as well as some gut feeling and trying to avoid active overhead "connecting/disconnecting" leaving out passive one.

[info]egorfine

April 12 2006, 17:27:39 UTC 6 years ago

afair, mysql connection takes less than a millisecond on a recent box, yeah?

[info]peter_zaitsev

April 12 2006, 17:43:23 UTC 6 years ago

Yes. Even over 1gbit network it is less than 1ms in most cases.

[info]charlenemaduf

July 16 2008, 01:03:04 UTC 3 years ago

Compiling C with the Clarion IDE, Part 1: It's Easier Than You Think (Clarion Magazine) 17 MAR 05 Ever find some C source on the Internet that does exactly what you need.

Anonymous

December 14 2006, 11:31:53 UTC 5 years ago

Keep up this great resource.

I have enjoyed visiting this site many times over. So many wonderful works of art to see !
Thanks))))

look.... http://lovmusic.theblogaddict.com
Create an Account
Forgot your login or password?
Facebook Twitter More login options
English • Español • Deutsch • Русский…