Skip to main content

Linux performance basics

I want to write about Cassandra performance tuning, but first I need to cover some basics: how to use vmstat, iostat, and top to understand what part of your system is the bottleneck -- not just for Cassandra but for any system.

vmstat
You will typically run vmstat with "vmstat sampling-period", e.g., "vmstat 5." The output looks like this:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
20  0 195540  32772   6952 576752    0    0    11    12   38   43  1  0 99  0
22  2 195536  35988   6680 575132    6    0  2952    14  959 16375 72 21  4  3
The first line is your total system average since boot; typically this will not be very useful, since you are interested in what is causing problems NOW. Then you will get one line per sample period; most of the output is self explanatory. The reason to start with vmstat is the "swap" section: si and so are swap in (memory read from disk) and swap out (memory written to disk). Remember that a little swapping is normal, particularly during application startup: by default, Linux will swap infrequently used pages of application memory to disk to free up more room for disk caching, even if there is enough ram to accommodate all applications.

iostat
To get more details of io, use iostat -x. Again, you want to give it a sampling interval, and ignore the first set of output. iostat also gives you some cpu information but top does that better; let's focus on the Device section:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               9.80     0.20   36.60    0.40  5326.40     4.80   144.09     0.06    1.62   1.41   5.20
There are 3 easy ways to tell if a disk is a probable bottleneck here, and none of them show up without the -x flag, so get in the habit of using that. "avgqu-sz" is the size of the io request queue; if it is large, there are lots of requests waiting in line. "await" is how long (in ms) the average request took to be satisfied (including time enqueued); recall that on non-SSDs, a single seek is between 5 and 10ms. Finally, "%util" is Linux's guess at how fully saturated the device is.

top
To learn more about per-process CPU and memory usage, use "top." I won't paste top output here because everyone is so familiar with it, but I will mention a few useful things to know:

  • "P" and "M" toggle between sorting by cpu usage and sorting by memory usage
  • "1" toggles breaking down the CPU summary by CPU core
  • SHR (shared memory) is included in RES (resident memory)
  • Amount of memory belonging to a process that has been swapped out is VIRT - RES
  • a state (S column) of D means the process (or thread, see below) is waiting for disk or network i/o
  • "steal" is how much CPU the hypervisor is giving to another VM in a virtual environment; as virtual provisioning becomes more common, avoiding noisy neighbors is increasingly important

"top -H" will split out individual threads into their own lines; both per-process and per-thread views are useful. The per-thread view is particularly useful when dealing with Java applications since you can easily correlate them with thread names from the JVM to see which threads are consuming your CPU. Briefly, you take the PID (thread ID) from top, convert it to hex -- e.g., "python -c 'print hex(12345)'" -- and match it with the corresponding thread ID from jstack.

Now you can troubleshoot with a process like: "Am I swapping? If so, what processes are using all the memory? If my application makes a lot of disk read requests, are my reads being cached or are they actually hitting the disk? If I am hitting the disk, is it saturated? How much 'hot data' can I have before I run out of cache room? Are any/all of my cpu cores maxed? Which threads are actually using the CPU? Which threads spend most of their time waiting for i/o?" Then if you go to ask for help tuning something, you can show that you've done your homework.

Comments

Ross Bates said…
If you are on debian sudo apt-get install htop - much better than top.
Anonymous said…
I also like dstat as a better vmstat and iostat.
Unknown said…
You could use "sar" also to get a bunch of information
Anonymous said…
On my system (Ubuntu 9.04 Jaunty) after running 'top' you switch to sorting via memory consumption using 'M' not 'm'. Lower case 'm' will toggle part of the upper display re total system memory (doesn't affect the lower process based columns and rows at all). Also 'c' will issue an 'unknown command' error ... you use 'P' to go back to sorting PIDs via CPU% (short for 'Process' maybe?).
Jonathan Ellis said…
Thanks for the corrections, Anonymous. I've updated the article to reflect the correct commands.
Anonymous said…
oops, that was "C" (upper case 'C') that's an unknown command: lower case 'c' will expand/contract the process 'command' to the full path; so "firefox" will toggle to "/usr/lib/firefox-3.0.1"

Which is spiffy since I had no idea that it would do that till just now : )
Marius Gedminas said…
Good article! I remember it took me quite a while to learn how to understand vmstat.

Nitpickery: VIRT also includes things like file-backed memory (e. g. code from the executable and shared libraries that hasn't been demand-loaded), so VIRT - RES is not precisely equal to the amount of swap used for the app. Things like the X server also include video memory and register space in their VIRT, and often seem to eat much more memory than they really do.
James said…
iotop will show just how much IO each thread is doing.
Pavel Odintsov said…
hi, try use atop!
Sumant said…
Excellent post!


Recently I just came across a good article on "100 Linux Tips and Tricks"
Here is its link.
Ersin Er said…
replaca top with htop.

Popular posts from this blog

Why schema definition belongs in the database

Earlier, I wrote about how ORM developers shouldn't try to re-invent SQL . It doesn't need to be done, and you're not likely to end up with an actual improvement. SQL may be designed by committee, but it's also been refined from thousands if not millions of man-years of database experience. The same applies to DDL. (Data Definition Langage -- the part of the SQL standard that deals with CREATE and ALTER.) Unfortunately, a number of Python ORMs are trying to replace DDL with a homegrown Python API. This is a Bad Thing. There are at least four reasons why: Standards compliance Completeness Maintainability Beauty Standards compliance SQL DDL is a standard. That means if you want something more sophisticated than Emacs, you can choose any of half a dozen modeling tools like ERwin or ER/Studio to generate and edit your DDL. The Python data definition APIs, by contrast, aren't even compatibile with other Python tools. You can't take a table definition

Python at Mozy.com

At my day job, I write code for a company called Berkeley Data Systems. (They found me through this blog, actually. It's been a good place to work.) Our first product is free online backup at mozy.com . Our second beta release was yesterday; the obvious problems have been fixed, so I feel reasonably good about blogging about it. Our back end, which is the most algorithmically complex part -- as opposed to fighting-Microsoft-APIs complex, as we have to in our desktop client -- is 90% in python with one C extension for speed. We (well, they, since I wasn't at the company at that point) initially chose Python for speed of development, and it's definitely fulfilled that expectation. (It's also lived up to its reputation for readability, in that the Python code has had 3 different developers -- in serial -- with very quick ramp-ups in each case. Python's succinctness and and one-obvious-way-to-do-it philosophy played a big part in this.) If you try it out, pleas

A review of 6 Python IDEs

(March 2006: you may also be interested the updated review I did for PyCon -- http://spyced.blogspot.com/2006/02/pycon-python-ide-review.html .) For September's meeting, the Utah Python User Group hosted an IDE shootout. 5 presenters reviewed 6 IDEs: PyDev 0.9.8.1 Eric3 3.7.1 Boa Constructor 0.4.4 BlackAdder 1.1 Komodo 3.1 Wing IDE 2.0.3 (The windows version was tested for all but Eric3, which was tested on Linux. Eric3 is based on Qt, which basically means you can't run it on Windows unless you've shelled out $$$ for a commerical Qt license, since there is no GPL version of Qt for Windows. Yes, there's Qt Free , but that's not exactly production-ready software.) Perhaps the most notable IDEs not included are SPE and DrPython. Alas, nobody had time to review these, but if you're looking for a free IDE perhaps you should include these in your search, because PyDev was the only one of the 3 free ones that we'd consider using. And if you aren