Disk I/O Monitoring
Linux Windows AIX Solaris
Linux
Both 'sar' and 'iostat' are available on Linux to monitor disk activities:
# sar -d 1 5
Linux 2.4.21-27.ELsmp (pw101) 12/04/2005
06:33:54 PM DEV tps rd_sec/s wr_sec/s
06:33:55 PM dev8-0 0.00 0.00 0.00
06:33:55 PM dev8-16 0.00 0.00 0.00
06:33:55 PM dev8-32 0.00 0.00 0.00
06:33:55 PM dev8-48 0.00 0.00 0.00
06:33:55 PM dev8-64 0.00 0.00 0.00
06:33:55 PM dev8-65 0.00 0.00 0.00
06:33:55 PM dev8-66 0.00 0.00 0.00
# iostat 1 3
Linux 2.4.21-27.ELsmp (pw101) 12/04/2005
avg-cpu: %user %nice %sys %iowait %idle
7.14 0.00 4.53 0.01 88.32
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.23 1066 2377572
sdb 0.01 0.39 0.00 3941054 118
sdc 0.00 0.23 0.00 2364386 16
sdd 0.00 0.00 0.00 24 0
sde 0.35 0.09 6.88 933722 70340208
sde1 0.35 0.09 6.88 933218 70340208
sde2 0.00 0.00 0.00 280 0
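If '%idle' goes below 20%, the system may be queuing up disk I/Os and response time suffers. Note that '%idle' is CPU idle time, so check '%iowait' as well to confirm that the CPU is actually waiting on disk.
The '1 5' and '1 3' parameters are the interval and the number of iterations (e.g., '1 5' means a 1-second interval for 5 iterations).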
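The 20% check described above can be scripted. What follows is a minimal sketch rather than a finished tool: the function names cpu_idle_from_iostat and idle_is_low are hypothetical, and the parsing assumes that '%idle' is the last column of the line following the 'avg-cpu:' header, as in the sample output shown earlier.

```shell
# cpu_idle_from_iostat: read iostat output on stdin and print the %idle
# value of every avg-cpu report (last column of the line after the header).
cpu_idle_from_iostat() {
    awk '/^avg-cpu:/ {
        if ((getline line) > 0) {
            n = split(line, f, " ")
            if (n > 0) print f[n]
        }
    }'
}

# idle_is_low IDLE [LIMIT]: succeed (exit status 0) when IDLE is below LIMIT.
# LIMIT defaults to 20, the threshold quoted in the text.
idle_is_low() {
    awk -v idle="$1" -v limit="${2:-20}" 'BEGIN { exit !((idle + 0) < (limit + 0)) }'
}
```

For example, piping 'iostat 1 3' into cpu_idle_from_iostat prints one '%idle' value per report, and each value can then be passed to idle_is_low to decide whether to raise a warning.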
Windows
'Perfmon' is a Windows tool that can be configured to monitor (and log to
file) system resources. Physical disk counters are one area that can be
added to the overall resources to be monitored. You can start 'perfmon'
simply by typing 'perfmon' in the "Start->Run..." window.
Once you have created a counter log (for example, one named 'perf'), you can add counters to it.
Double-click 'perf' to open it, click "Add Counters", and select
"PhysicalDisk" in the 'Performance object' dropdown list. Then select the counters needed.
Two indicators should be monitored for each hard disk in your system. Watch
the PhysicalDisk(instance)\Disk Transfers/sec counter for each physical
disk; if it goes above 25 disk I/Os per second, the disk has poor response
time. A disk bottleneck can significantly impact response time for
applications running on your system, so investigate further by tracking
PhysicalDisk(instance)\% Idle Time, which measures the percentage of time
the disk is idle during the measurement interval. If this counter falls
below 20%, read/write requests are likely queuing up for a disk that cannot
service them in a timely fashion. In that case, it is time to upgrade to
faster disks or scale out your application to better handle the load.
Next, start the 'perf' counter log by clicking the play button
(the icon like the one on a VCR). This starts monitoring and logging the
counters you added. You can stop the log later and play it back.
AIX
'sar -d' sends a snapshot of disk activities to STDOUT. You can
supply the interval and iteration parameters for the command to repeat:
# sar -d 1 1
AIX cmcs101 3 5 002FBF7D4C00 01/04/06
System configuration: lcpu=8 drives=22
14:25:18 device %busy avque r+w/s Kbs/s avwait avserv
14:25:19 hdisk6 0 0.0 0 0 0.0 0.0
hdisk7 0 0.0 0 0 0.0 0.0
hdisk4 0 0.0 0 0 0.0 0.0
hdisk5 0 0.0 0 0 0.0 0.0
hdisk8 0 0.0 0 0 0.0 0.0
hdisk11 0 0.0 0 0 0.0 0.0
hdisk9 0 0.0 0 0 0.0 0.0
hdisk10 0 0.0 0 0 0.0 0.0
hdisk12 0 0.0 0 0 0.0 0.0
hdisk13 0 0.0 0 0 0.0 0.0
hdisk14 0 0.0 0 0 0.0 0.0
hdisk0 0 0.0 0 0 0.0 0.0
hdisk15 0 0.0 0 0 0.0 0.0
hdisk2 0 0.0 0 0 0.0 0.0
hdisk3 0 0.0 0 0 0.0 0.0
hdisk1 0 0.0 0 0 0.0 0.0
dac0 0 0.0 0 0 0.0 0.0
dac0-utm 0 0.0 0 0 0.0 0.0
dac1 0 0.0 0 0 0.0 0.0
dac1-utm 0 0.0 0 0 0.0 0.0
hdisk16 0 0.0 0 0 0.0 0.0
hdisk17 0 0.0 0 0 0.0 0.0
** this example shows no disk activities.
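On a system with many hdisks, it can help to filter the 'sar -d' report down to the busy devices. The sketch below is one way to do that, under stated assumptions: the function name busy_disks and the default limit of 50 are hypothetical (the text gives no AIX threshold), and the parsing relies on the column layout shown in the sample above.

```shell
# busy_disks [LIMIT]: read AIX 'sar -d' output on stdin and print
# "device %busy" for every device whose %busy is at or above LIMIT.
# LIMIT defaults to 50, an arbitrary value chosen for illustration.
busy_disks() {
    awk -v limit="${1:-50}" '
        # Data rows end with: device %busy avque r+w/s Kbs/s avwait avserv.
        # Counting fields from the end copes with rows that lack a timestamp.
        NF >= 7 && $(NF-5) ~ /^[0-9.]+$/ && $(NF-5) + 0 >= limit + 0 {
            print $(NF-6), $(NF-5)
        }'
}
```

For example, 'sar -d 1 5 | busy_disks 80' would list only the devices that were at least 80% busy in some interval.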
'iostat aux' outputs similar information on disk I/O:
# iostat aux 1 1
System configuration: lcpu=8 drives=22
18:34:05 device %busy avque r+w/s Kbs/s avwait avserv
18:34:06 hdisk6 0 0.0 0 0 0.0 0.0
hdisk7 0 0.0 0 0 0.0 0.0
hdisk4 0 0.0 0 0 0.0 0.0
hdisk5 0 0.0 0 0 0.0 0.0
hdisk8 0 0.0 0 0 0.0 0.0
hdisk11 0 0.0 0 0 0.0 0.0
hdisk9 0 0.0 0 0 0.0 0.0
hdisk10 0 0.0 0 0 0.0 0.0
hdisk12 0 0.0 0 0 0.0 0.0
hdisk13 0 0.0 0 0 0.0 0.0
hdisk14 0 0.0 0 0 0.0 0.0
hdisk0 0 0.0 0 0 0.0 0.0
hdisk15 0 0.0 0 0 0.0 0.0
hdisk2 0 0.0 0 0 0.0 0.0
hdisk3 0 0.0 0 0 0.0 0.0
hdisk1 0 0.0 0 0 0.0 0.0
dac0 0 0.0 0 0 0.0 0.0
dac0-utm 0 0.0 0 0 0.0 0.0
dac1 0 0.0 0 0 0.0 0.0
dac1-utm 0 0.0 0 0 0.0 0.0
hdisk16 0 0.0 0 0 0.0 0.0
hdisk17 0 0.0 0 0 0.0 0.0
The '1 1' parameters are the interval and the number of iterations.
To measure disk Read throughput:
# time dd if=/tmp/f66mb.out of=/dev/null bs=1024k
63+1 records in.
63+1 records out.
real 0m2.04s
user 0m0.00s
sys 0m1.05s
The 'time' command shows how long the read took to complete. The read throughput
in this example is about 32.4 MB per second (66 MB / 2.04 seconds of real time).
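The throughput figure is simply the amount of data divided by the 'real' time. As a small sketch of that arithmetic (the helper name throughput_mb_s is hypothetical), the division can be done with awk, since plain shell arithmetic handles integers only:

```shell
# throughput_mb_s SIZE_MB SECONDS: print SIZE_MB / SECONDS to one decimal.
# Fails with exit status 1 when SECONDS is not a positive number.
throughput_mb_s() {
    awk -v mb="$1" -v secs="$2" 'BEGIN {
        if (secs + 0 <= 0) exit 1
        printf "%.1f\n", mb / secs
    }'
}
```

With the numbers from this example, 'throughput_mb_s 66 2.04' prints 32.4.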
To measure disk Write throughput:
# sync; date; dd if=/dev/zero of=/tmp/1000m bs=1024k count=1000; date; sync; date
Thu Jan 5 23:02:41 PST 2006
1000+0 records in.
1000+0 records out.
Thu Jan 5 23:02:59 PST 2006
Thu Jan 5 23:02:59 PST 2006
In this example, dd completed after 18 seconds (23:02:59 - 23:02:41), a write rate of about 55.6 MB
per second (1000 MB / 18 seconds).
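The elapsed time here is read off the two 'date' lines by hand. As a sketch of the same subtraction in shell (the function names hms_to_seconds and elapsed_seconds are hypothetical, and both timestamps are assumed to fall on the same day):

```shell
# hms_to_seconds HH:MM:SS: print the time of day as seconds since midnight.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# elapsed_seconds START END: print END - START in seconds.
# Assumes both times are on the same day (no midnight rollover).
elapsed_seconds() {
    echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}
```

Here 'elapsed_seconds 23:02:41 23:02:59' prints 18, and dividing the 1000 MB written by that figure gives the write rate.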
To pinpoint which file system, logical volume, or physical hdisk is the most active under a heavy workload, use 'filemon'.
Solaris
The 'iostat -xc' command can be used to view disk activity on a Solaris
machine. The -x argument reports extended statistics and the -c
option reports CPU utilization for the interval.
# iostat -xc
extended device statistics cpu
device r/s w/s kr/s kw/s wait actv svc_t %w %b us sy wt id
fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 8 64 27 0
sd0 51.1 0.2 6545.1 1.6 0.0 1.8 34.7 0 100
sd1 84.7 0.0 10615.1 0.0 0.0 1.6 19.0 1 98
sd4 27.6 6.8 220.5 51.6 0.0 2.9 83.0 0 98
sd6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
Disk 'sd0' looks really busy (100% busy!). The next step is to find out what is using it.
The fields have the following meanings:
device name of the disk
r/s reads per second
w/s writes per second
kr/s kilobytes read per second
kw/s kilobytes written per second
wait average number of transactions waiting for service (queue length)
actv average number of transactions actively being serviced (removed from the queue but not yet completed)
svc_t average service time, in milliseconds
%w percent of time there are transactions waiting for service (queue non-empty)
%b percent of time the disk is busy (transactions in progress)
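Spotting the busy devices by eye gets harder as the device list grows. The sketch below filters 'iostat -x' output on the '%b' column. It is an illustration under stated assumptions: the function name busy_devices and the default limit of 90 are hypothetical, and '%b' is taken to be the tenth column, as in the report shown above.

```shell
# busy_devices [LIMIT]: read Solaris 'iostat -x' output on stdin and print
# "device %b" for every device whose %b is at or above LIMIT.
# LIMIT defaults to 90, an arbitrary value chosen for illustration.
busy_devices() {
    awk -v limit="${1:-90}" '
        # Data rows have a numeric r/s in column 2; header rows do not.
        NF >= 10 && $2 ~ /^[0-9.]+$/ && $10 + 0 >= limit + 0 {
            print $1, $10
        }'
}
```

Run against the sample report above, 'busy_devices 99' would print only 'sd0 100', while the default limit also reports sd1 and sd4.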