IRQTUNE -- A Linux IRQ Priority Optimizer



Copyright 1996, 1997 by Craig Estey.
irqtune version 0.6
Last updated: Sun Oct 19 16:35:00 PDT 1997
See the Changes section at the bottom of this document.

irqtune changes the IRQ priority of devices to allow devices that require high priority and fast service (e.g. serial ports, modems) to have it.

With irqtune, a 3X speedup of serial/modem throughput is possible.


Where do I get irqtune?

irqtune is free software under the terms and conditions of the GNU General Public License. See the file COPYING, included in the distribution, for details.


How do I know if I need irqtune?

You are running Linux on an x86 PC. Other architectures are to be implemented later--sorry.

You probably need irqtune if you are experiencing any of the following:


What is actually happening to cause these problems?

When the PC boots Linux, the timer is given, by default, the highest IRQ priority in the system (it's IRQ 0 and thus, priority 0). On a standard configuration, the serial ports are priority 11 and 12!!! This means that 10 other devices have higher priority.

Q: So what does IRQ priority do?

When multiple devices are in contention to interrupt the CPU, their priority decides which interrupts will occur in what order.

Q: When does this contention occur?

After an arbitrary period of having interrupts disabled (e.g. after a cli), at the point where they're reenabled (sti). This can happen in several places:

Q: If there are multiple interrupts now pending, which one gets the service, the serial or some other?

In the default configuration, the serial ISR will usually lose as it's priority 11.


How does irqtune help this?

irqtune gives priority 0 to whatever device we specify. If we specify a serial device, irqtune guarantees that the serial ISR gets control whenever a contention occurs.
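
The contention rule can be sketched in a few lines of Python. This is only a model, not irqtune's code: the two order lists are assumptions based on the usual PC arrangement (slave IRQs 8-15 slotted in at IRQ 2), the "tuned" list simply assumes IRQ 3 has been moved to the top, and `next_serviced` is a hypothetical helper name.

```python
# Highest priority first. Assumed boot-time order on a standard PC.
DEFAULT_ORDER = [0, 1, 2, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]
# Assumed order after IRQ 3 is given priority 0 (slave order left alone).
TUNED_ORDER = [3, 4, 5, 6, 7, 0, 1, 2, 8, 9, 10, 11, 12, 13, 14, 15]


def next_serviced(pending, order):
    """Of the IRQs pending when interrupts are re-enabled, pick the winner."""
    return min(pending, key=order.index)


# Assume a disk on IRQ 14 and a serial port on IRQ 3 are both pending.
pending = {3, 14}
print("default:", next_serviced(pending, DEFAULT_ORDER))
print("tuned:  ", next_serviced(pending, TUNED_ORDER))
```

In this model the disk wins under the default order and the serial port wins once IRQ 3 is at the top.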


Why does the serial interrupt service require the highest priority?

Q: Why does the serial device merit such special treatment?

Serial devices are somewhat unique. Even though they have one of the slowest data rates (relative to a disk), they are the largest consumer of interrupts and are extremely sensitive to interrupt latency (the time from when a device yanks the IRQ line until its ISR is executed).
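
To put rough numbers on that latency sensitivity, the sketch below estimates how long a serial receiver can go unserviced before incoming characters start to be lost. It assumes 10 bits on the wire per character (start bit, 8 data bits, stop bit) and uses a hypothetical `buffer_depth` parameter for the number of characters the serial hardware can hold. Neither figure comes from this document.

```python
def latency_budget_us(baud, buffer_depth=1, bits_per_char=10):
    """Microseconds for `buffer_depth` characters to arrive at `baud`.

    Assumptions (not from the irqtune text): 10 bits per character and
    a receive buffer that holds `buffer_depth` characters.
    """
    char_time_us = bits_per_char * 1_000_000 / baud
    return char_time_us * buffer_depth


for baud in (9600, 38400, 57600, 115200):
    print(f"{baud:>6} baud: {latency_budget_us(baud):7.1f} us per character")
```

The faster the line, the less time the ISR has, which is why latency rather than raw data rate is the limiting factor.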

Q: Could you give a concrete example of this?

Q: In this example, how would boosting serial IRQ priority help?


Isn't that example very unlikely under Linux?

Unlikely doesn't mean never. There are places where the contention period must occur, no matter how we program the CPU.

Variations in CPU speed, RAM size, RAM speed, cache size, disk speed, disk rotational position, number and type of other devices, system workload, etc., all contribute to variations of order and timing of internal OS events.

Unlikely on one system may mean a 1/1000th % chance. On another system, it may mean it happens 50 times/second. Beyond a certain point, it all comes down to measurement.

Tight, accurate, repeatable measurement is the key to system tuning. If we can't measure it, we can't tune it. Once we can measure these things, we can then try various combinations until we achieve our desired results. It can often be pointless to guess at performance.

In fact, the interrupt disable windows themselves exist to prevent the data corruption that would occur if different parts of the kernel tried to update the same data structure simultaneously. Many of these windows are put there for the unlikely case that such corruption would occur. In this context, unlikely is never considered a reason to forgo interrupt locking. Eliminating a necessary interrupt lock window could result in kernel panics, RAM corruption, disk data corruption, etc.


Doesn't this hurt the performance of other devices?

Not really.

In actual practice, most devices don't even notice the difference. Most other devices (e.g. disks, tape, ethernet) are DMA devices. The DMA does most of the work, thus greatly reducing their need for interrupts. If the device allows a request queue, it may function autonomously on several requests, producing only one interrupt for the entire batch.

Furthermore, serial interrupt services are, themselves, very fast. They slam their data as quickly as possible and get out ASAP. No fancy calculations, just the minimum, mindless data transfer. Almost everything else is handled later, in the bottom-half with interrupts enabled. In fact, a serial ISR may have to re-interrupt its own bottom-half several times.

Those devices that do experience some slight slowdown are more likely to have long interrupt disable windows themselves. Having several smaller cli/sti windows is much better than one large cli/sti window--It's just harder to program.

Q: But suppose I want a balanced priority system?

Well, actually a prioritized system behaves like a balanced system--most of the time. This occurs when all devices have short interrupt lockout windows and short ISR execution times. The priority mechanism is like a safety valve--it only really matters when some device or combination of devices has held interrupts locked for an extended period of time.

Q: But doesn't that seem unfair to other devices?

When a disk ISR gets delayed, that's all that happens, a slight delay--disks and tapes are not real-time devices. When a serial ISR gets delayed, data is destroyed--serial devices are real-time devices that dictate the cadence of the entire system.

It may sound cruel, but you just can't be fair.

And speaking of sound, if the sound card gets delayed, we hear an annoying pop right in the middle of our favorite piece of music.

These are real-time devices that cannot tolerate excessive delays. If we must defer some less time-critical devices to meet the minimum real-time criteria of these devices, then that's a bargain.

Q: But suppose I really want both fast serial and fast disk?

Ultimately, it's a bit of a compromise. Which is better:

When paying an ISP for Internet access in $$$/hour, it's an easy decision :-).


Isn't this IRQ priority thing a bit of a new idea?

No. It's actually an old idea. I've been doing device drivers since 1977 and Unix kernel work since 1981. I've personally written 8 serial drivers and have used this many times commercially. Giving the serial device the highest priority is actually standard practice in many systems. With a 4 MHz CPU, these problems used to occur at 1200 baud :-)


How do I install irqtune?

Q: Where should I place irqtune files?

Q: How do I unpack the archive?

Note: If tar's z option has problems:

Q: How do I do a simple installation?

Q: Why are .o files being placed in the /sbin directory?

As a note to purists, the installation directory is completely arbitrary. /sbin is short, sweet, and easy to type.

The standard convention would be to install irqtune as /sbin/irqtune and the .o files as, say, /usr/lib/irqtune/*.o. irqtune uses a slightly different convention. It wants all installed files to be in the same directory. This works fine because irqtune can use argv[0] to locate the .o files. This also allows:

Q: What if I really don't want .o files in /sbin?

If placing .o files in /sbin is deemed to be an anathema, we have two options:

Q: Are there any special considerations for this alternate installation method?

Yes. If the shell/stub installation method is used:


How do I use irqtune? Don't I have to rebuild my kernel?

No, we do not have to rebuild the kernel. irqtune uses insmod and rmmod to dynamically load and unload a kernel module. But it is correct to sense that irqtune is a kernel patch.

Q: Ok, if it's a kernel patch, why not just issue a kernel patch like everybody else does (e.g. diff -u output)?

irqtune will work even if we don't have the kernel source loaded. It uses insmod to load the patch, invoke it, and then unload it. The IRQ priority changes will last so long as the kernel is booted.

Q: How do we invoke it?

irqtune takes two optional arguments: the high-priority IRQ for the master controller and the high-priority IRQ for the slave controller.

The default is 3 14 which will work for many standard configurations. See What about my non-standard hardware configuration? for details.

Q: Could we do this from my /etc/rc.d/rc.local file?

Yes. Just add a /sbin/irqtune line to this file and we're in business. We may also issue another irqtune command at any time.

Q: What if irqtune fails to load?

See the What about irqtune load failures or incompatibilities with kernel revisions? section.

Q: After irqtune sets a priority, how can we query the results later?

We can't.

Due to a limitation of the PC interrupt hardware, it is not possible to read back values set previously.

irqtune will attempt to place a message in the system log (/usr/spool/syslog/syslog) when it changes the configuration. Examining this log file, or simply invoking irqtune again, may be the best ways to work around the hardware limitations.

Note: Some users have tried to use the "-n" option, thinking it will act as a "query" mode. This option is used, primarily, to generate examples in this document and will not have the desired effect.


What about my non-standard hardware configuration?

irqtune's defaults assume a standard IRQ configuration. It assumes that the highest priority device should be on IRQ 3. This is normally the lowest-numbered serial port IRQ on standard configurations, which is what you want.

Q: How do I determine what my IRQ configuration is?

NOTE: For brevity, we've combined the non-sorted and sorted output in these examples.

NOTE: /proc/interrupts, and therefore irqtune, only reports on active devices. So to scope out the serial IRQ's, ideally, you'd have X Windows up with your serial mouse and be connected via PPP to the net.

Q: OK, we've got the output from /sbin/irqtune -n -o, what do we do with it?

The leftmost number is the IRQ number. The next number is the priority. The rightmost column is the internal device name (not to be confused with /dev names). In the above case, the two serial ports are on IRQ 3 and IRQ 4. Just use the lower number, in this case 3:

This sets IRQ 3 to the highest priority.

Q: BTW, What's the cascade device I saw in the output of irqtune?

Glad you asked. There are actually two interrupt controllers, a master and a slave. The slave is cascaded to the master via its IRQ 2. The master controls IRQ's 0-7 and the slave controls IRQ's 8-15.

You actually may select two high IRQ priorities, one for the master and one for the slave. irqtune defaults the slave to IRQ 14, which is normally the disk controller.

In fact, cascade is sort of a "zero width" device as it does not contribute to interrupt latency. Setting the cascade to top priority on the master has an interesting effect which we'll see shortly.

Q: But we've also got an Ethernet controller on IRQ 12. What about that?

In this case, we might want to use:

because we want our ethernet card to have a higher priority than the disk controller. Actually, if we did have this configuration, setting 3 14 (the default) would make the ethernet card the lowest priority device in the system.

Q: What about the serial multiplexer card on IRQ 11?

This is a bit tricky because now we've got a serial device on the slave controller. It would be much better to put all serial cards on the master controller. Things would stay much simpler.

In this case we would want to use:

Q: Wait a minute. Didn't we just specify a slave IRQ number as the master to irqtune?

Yes, this is a shorthand way of saying 2 11. We can make a slave device top priority, but we get no options for the master IRQ. It will always be 2, the cascade device. Remember, the cascade device contributes no latency delay by itself.
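
The effect of the master and slave arguments on the overall ordering can be modelled with a short Python sketch. It is a simplified model built from the description in this section (each controller's eight lines rotate so the chosen IRQ comes first, and the slave's lines sit at the cascade position), not irqtune's actual code. The function name `priority_order` and its parameters are hypothetical.

```python
CASCADE_IRQ = 2  # the slave controller hangs off the master's IRQ 2


def priority_order(master_top=0, slave_top=8):
    """Return all 16 IRQs from highest to lowest priority (model only)."""
    if master_top >= 8:
        # Shorthand from the text: a lone slave IRQ means "2 <irq>".
        master_top, slave_top = CASCADE_IRQ, master_top
    # Rotate each controller's lines so the chosen IRQ comes first.
    master = [(master_top + i) % 8 for i in range(8)]
    slave = [8 + (slave_top - 8 + i) % 8 for i in range(8)]
    order = []
    for irq in master:
        order.append(irq)
        if irq == CASCADE_IRQ:
            order.extend(slave)  # slave IRQs slot in at the cascade
    return order


def show(label, order):
    print(label, " ".join(map(str, order)))


show("boot default:", priority_order())
show("irqtune 3 14:", priority_order(3, 14))
show("irqtune 11:  ", priority_order(11))
```

In this model the boot default puts IRQ 3 and IRQ 4 at positions 11 and 12, and the single-argument form places IRQ 11 directly behind the cascade while pushing IRQ 3 and IRQ 4 well down the list.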

Q: So why is this configuration so bad?

Well, we boosted the priority of the serial multiplexer at the expense of the regular serial ports. The only way to allow all serial ports equally high priority is to group them on consecutive IRQ's and set the high priority for the lowest of those IRQ's.

Q: How can we fix this with software?

We can't.

We're limited by the architecture of the PC and its interrupt controllers. We must change the IRQ of a device by physical restrapping--we can't do it by reprogramming the priority alone.

We'll go back to the earlier example. We'll wave a magic wand and Poof!--assume we just restrapped the serial multiplexer to IRQ 5:

Q: Could we do more of this restrapping, say, with the ethernet controller?

Sure. Waving the wand again, we restrap the ethernet card to IRQ 6.

Q: So in order to get the best overall system, we may need to change IRQ priority and physically change the hardware IRQ configuration?

Exactly.

Different systems may have very different criteria for what is optimum. It is a choice that each system administrator must make based upon the specific requirements of the particular system in question. We can only provide tools to do the job; the final choice is made on a case-by-case basis. There is no one-size-fits-all solution.


What about irqtune load failures or incompatibilities with kernel revisions?

Q: What's the bottom line?

irqtune makes every attempt to load its kernel module. irqtune probes the kernel, /proc/ksyms, and insmod. It attempts to detect, correct, or work around any difficulties. If the problems are truly severe, irqtune will report this also. Normally, irqtune probes silently, only reporting the results of the local system configuration if there's a non-recoverable error.

No matter what happens with probes or loads, irqtune will report the final completion status as its last line: complete or error.

Q: What should be checked first?

Q: What if the current kernel revision is different from the kernel revision in the prebuilt modules?

This does not matter. irqtune is 99.44% kernel revision independent. It is almost never necessary to rebuild the prebuilt modules. If irqtune fails to load the modules, consider everything in this section carefully before rebuilding irqtune.

Notes to programmers:

Rebuilding the binaries before exploring the other options is a lot like Vonnegut's OFF switch--it's comforting but not connected to anything :-)

Q: Are there any kernel revisions that irqtune won't work with?

When irqtune was first released, some experimental changes were made to the kernel to solve the IRQ priority problem by use of a round-robin, balanced priority system that was incompatible with irqtune. These changes were ultimately removed but irqtune will not work with kernel revisions 2.0.15 to 2.0.18. irqtune will detect this condition and report an error. See the Where can I find additional documentation or downloads? section.

Q: Are there any insmod revisions that are incompatible?

irqtune first tries to load irqtune_mod.o and falls back to irqtune_npr.o if it detects a load error in insmod.

Some versions of insmod have severe difficulty loading modules when the kernel is using MODVERSIONS. There is a known bug in insmod:

Loading irqtune_mod.o will crash insmod and partially lock up irqtune's module. irqtune will detect this and skip the irqtune_mod.o loading entirely.

Generally, we should not use an older insmod with a newer kernel (e.g. using a 2.0.X insmod on a 2.1.X kernel). irqtune will detect and report this.

If insmod still has difficulties, we may want to upgrade it to 2.1.34 (or later). Newer versions of insmod are guaranteed to be backward compatible with older kernels. This will increase the probability that irqtune_mod.o will load and irqtune will not have to fall back to irqtune_npr.o. Note: insmod is actually part of the modutils package. modutils under 2.0.X is called modules. See the Where can I find additional documentation or downloads? section.

Q: What is the difference between irqtune_mod.o and irqtune_npr.o?

The only kernel symbol that irqtune's kernel module (irqtune_mod.o) uses is printk (to print a confirmation message to syslog). The irqtune_npr.o module is exactly the same as irqtune_mod.o except that it does not use printk. Since irqtune pre-checks all parameters before attempting to load the kernel module, the confirmation message is a nicety but not a necessity.

Q: What if we don't have ELF binary support?

Well, we should upgrade the kernel as ELF binaries are cool :-). But if that's not possible, we'll just have to recompile irqtune to create a.out binaries. This is, perhaps, the only justification for rebuilding irqtune. Just be sure that /usr/src/linux/include is installed. The exact procedure for building a.out binaries can vary with compiler revision, so it's important to check the documentation on this (a parameter or two may need to be added).

Q: What about IRQ sharing of serial ports?

Under the 2.0.X (and later) kernels, use of IRQ sharing will defeat IRQ priority because the serial port ISR's are installed as slow rather than fast interrupts (e.g. they don't use the SA_INTERRUPT flag).

Under earlier kernels this is not a problem because the serial ISR was always installed with SA_INTERRUPT.


What about hardware/config problems?

Q: What if the serial port doesn't work?

Q: What if PPP doesn't work?

See the PPP man page and PPP-Howto for the best information, but some recommended options:

AT-like modems have a special character to escape from data mode to command mode. To avoid confusion here, we'll call this modem escape character a guard character.

The default guard character is '+' (decimal 43, hex 2B). Normally, 3 such characters are required within a special timing sequence.

Although it is unlikely, it is still possible that some PPP packets could generate the guard sequence inadvertently. To prevent this, we may want to inhibit the generation of the guard character in a data sequence. To do this, we would add the additional PPP option:

Since '+' is a common ASCII character (PPP escaped characters generate two characters), we may wish to use a less common value for the guard character. For example, a less common value might be (decimal 200, hex C8). We would add an ATS2=200 command to our modem dialer script and change the PPP escape option to C8.

Q: Why does PPP consistently drop every second packet sent from Linux, resulting in a 50% packet loss?

Some braindamaged PPP implementations do not handle PPP flag optimization!

The PPP protocol uses a flag byte to separate packets. Each packet begins with a flag and ends with a second flag.

Although the PPP protocol requires implementations receiving packets to handle flag optimization, some broken PPP implementations do not understand it!

These implementations see the trailing flag, process the packet, then look for a fresh flag. They don't realize that the trailing flag of PACKET1 may perform double duty as the leading flag of PACKET2. They will ignore all data until they see a new flag (which, in this example, is the flag between PACKET2 and PACKET3). Thus, PACKET2 will be seen as noise data and be ignored. These implementations will see only the odd-numbered packets (e.g. PACKET1, PACKET3, PACKET5, etc.), resulting in a 50% packet loss!

Linux PPP implements flag optimization correctly and enables it by default. As a courtesy to others, Linux does allow flag optimization to be turned off, but currently this requires the kernel to be rebuilt.
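
The failure mode can be reproduced with a small simulation. This is an illustrative sketch, not code from any PPP implementation: 0x7E is the standard PPP flag value, the function names are hypothetical, and the payloads are assumed to contain no flag bytes.

```python
FLAG = 0x7E  # PPP frame delimiter


def send_with_shared_flags(packets):
    """Serialize packets so one flag ends a packet and starts the next."""
    stream = bytearray([FLAG])
    for payload in packets:
        stream += payload
        stream.append(FLAG)
    return bytes(stream)


def correct_receiver(stream):
    """Treat every flag as a boundary; data between two flags is a packet."""
    return [chunk for chunk in stream.split(bytes([FLAG])) if chunk]


def broken_receiver(stream):
    """Insist on a fresh leading flag after each trailing flag."""
    packets, current, in_packet = [], bytearray(), False
    for byte in stream:
        if byte != FLAG:
            if in_packet:
                current.append(byte)
            continue  # outside a packet, data is discarded as noise
        if in_packet:
            packets.append(bytes(current))  # trailing flag ends the packet
            current, in_packet = bytearray(), False
        else:
            in_packet = True  # leading flag starts a packet
    return packets


sent = [f"PACKET{n}".encode() for n in range(1, 6)]
wire = send_with_shared_flags(sent)
print("correct:", correct_receiver(wire))
print("broken: ", broken_receiver(wire))
```

The broken receiver recovers only PACKET1, PACKET3, and PACKET5 from the same byte stream.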

In Linux, to turn off flag optimization on transmit, do the following:

Note: A better solution is to return the defective PPP implementation to the vendor and demand a refund or replacement!

Q: How can we be certain that the PPP flag optimization loss is occurring?

By lowering the baud rate to something that is guaranteed not to drop data due to speed problems (e.g. 300 baud). If we get a consistent 50% loss at this low rate, this is almost certain proof of the flag optimization problem.


What other performance software remedies must be done?

Q: What about using hdparm -u to set the interrupt-unmask flag in the hard disk driver?

This shortens the IDE interrupt disable windows. It is only necessary for the IDE driver; the SCSI driver has short disable windows by default.

Beware: Without this option, IDE disk activity will almost certainly cause serial data dropouts. If we have an IDE disk, this is mandatory.

Q: What about disabling Van Jacobson header compression in PPP?

Disabling it reduces the amount of bottom-half processing the system has to do, at the expense of larger packets being sent. This may be helpful on slower CPU's or heavily loaded configurations.

Q: What about adjusting the MRU/MTU numbers in PPP?

Reducing the MRU/MTU to a minimum (296) reduces the bottom-half processing and flip-buffer latency at the expense of adding extra overhead bytes due to the reduced packet size. The optimal value will vary from configuration to configuration.

Beware: Start with 296, as the optimal value may not be 1500.

The flip-buffer is a double buffer mechanism in the serial/tty drivers through which all data must pass. It has a fixed size of only 512 bytes. MRU/MTU greater than the flip-buffer size may create an internal race condition that may cause dropouts on slower CPU's or heavily loaded configurations.
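
The trade-off between packet size, header overhead, and the flip-buffer can be estimated with a few lines of Python. The 512-byte figure is the flip-buffer size quoted above. The 40-byte header size is an assumption (uncompressed TCP/IP headers), PPP framing bytes are ignored, and `mtu_tradeoff` is a hypothetical helper name.

```python
FLIP_BUFFER_BYTES = 512  # flip-buffer size quoted in the text
HEADER_BYTES = 40        # assumption: uncompressed TCP/IP headers per packet


def mtu_tradeoff(mtu):
    """Return (header overhead as a fraction, fits in one flip buffer?)."""
    overhead = HEADER_BYTES / mtu
    return overhead, mtu <= FLIP_BUFFER_BYTES


for mtu in (296, 576, 1500):
    overhead, fits = mtu_tradeoff(mtu)
    print(f"MTU {mtu:>4}: {overhead:5.1%} header overhead, "
          f"fits flip buffer: {fits}")
```

Under these assumptions, a small MTU pays noticeably more in header bytes but stays inside the flip-buffer, which is the compromise described above.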

Q: What about going to newer kernel revisions?

Although irqtune will work surprisingly well with just about any kernel revision, the low level IRQ handlers and device drivers have been vastly improved in the 2.0.X kernels. This will only improve irqtune's effect.

Q: What about increasing the serial port baud rate?

The serial port baud rate should be high enough to support the maximum expected transfer rate--but no higher. Higher speed settings place extra strain on the CPU, increasing the likelihood of overruns.

For a 33.6 Kbps modem, the minimum baud rate would be 38400. However, with compression, the expected transfer rate can be as high as 6 KB/second. This would require a baud rate of 57600. This may strain the CPU, and since the transfer rate is nominally about 4 KB/second, a lower baud rate may be a good compromise.
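
As a rough cross-check of these figures, the peak byte rate of a serial port can be computed from its baud rate. The sketch assumes 10 bits on the wire per byte (start bit, 8 data bits, stop bit), which is an assumption rather than something stated in this document, and `bytes_per_second` is a hypothetical helper name.

```python
def bytes_per_second(baud, bits_per_char=10):
    """Peak byte rate of an async serial line (assumes 10 bits per byte)."""
    return baud / bits_per_char


for baud in (38400, 57600, 115200):
    print(f"{baud:>6} baud -> {bytes_per_second(baud):7.0f} bytes/s")
```

These are upper bounds on the port itself; actual throughput also depends on the modem link and on how compressible the data is.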

The best way is to try several rates, then benchmark them to see which provides the best overall performance.

Note: Because of backward compatibility to older systems, we can't just set 57600 directly with stty, kermit, pppd, etc. Specify 38400 to these programs, and use the setserial program with an option of spd_hi. For ISDN speeds, use spd_vhi. Other options are possible so be sure to consult the manpage.


How can I tell if irqtune actually did anything for me?

Well, first off, if PPP/SLIP was dying mysteriously, it will probably be more reliable.

Q: How can we benchmark irqtune?

Run without it and get a feel for the transfer rate:

Repeat this using irqtune and note the transfer times again.

NOTE: IRQTUNE just won't quit--if you want to test in the original mode again, reboot the system first.

Q: What if we still don't see any real improvement?

It's a matter of probability. Performance measurement is as much art as science.


Where can I find additional documentation or downloads?


Changes

Revision 0.2 Changes:

Revision 0.3 Changes:

Revision 0.4 Changes:

Revision 0.5 Changes:

Revision 0.6 Changes:

