hdc: lost interrupt

When upgrading to Linux 2.4.17 on a rather ancient and oddly configured 486, I got the following message at boot time when the kernel probed for IDE devices:

hdc: lost interrupt

This appeared a number of times, and the whole booting process was incredibly slow as a result. It didn't occur on the 2.2 kernel image (which had actually been configured by a previous operator of the same machine).

Googling for "lost interrupt" wasn't very informative. Looking at the kernel source revealed only that this message appears when the kernel gets bored waiting for an interrupt from the IDE controller (which takes some time, hence the long delays). Why weren't the interrupts getting through under 2.4 when they were fine under 2.2?

Looking at /proc/interrupts, it turned out that this machine's second IDE bus was using IRQ 14 instead of the conventional IRQ 15:

curator:~# cat /proc/interrupts 
           CPU0       
  0:     404273          XT-PIC  timer
  1:       1213          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:      21515          XT-PIC  NE2000
 10:      18723          XT-PIC  NE2000
 14:      32643          XT-PIC  ide1
NMI:          0 
ERR:          0
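If you would rather check this programmatically than by eye, the following Python sketch scans the text of /proc/interrupts for a device name and returns its IRQ. `irq_for_device` is a hypothetical helper, not part of any kernel tool, and it assumes the 2.4-era layout shown above: an IRQ number, one count per CPU, the controller type, then the device name(s). On a live system you would pass it `open("/proc/interrupts").read()`.

```python
from typing import Optional


def irq_for_device(interrupts_text: str, device: str) -> Optional[int]:
    """Return the IRQ number whose /proc/interrupts row names `device`, or None."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Skip the "CPU0" header (no colon) and non-numeric rows such as NMI and ERR.
        if not sep or not label.isdigit():
            continue
        # Shared IRQs list their devices comma-separated at the end of the row,
        # so split on commas as well as whitespace and look for an exact token.
        tokens = rest.replace(",", " ").split()
        if device in tokens:
            return int(label)
    return None


# The listing from the 486 above.
SAMPLE = """\
           CPU0
  0:     404273          XT-PIC  timer
  1:       1213          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:      21515          XT-PIC  NE2000
 10:      18723          XT-PIC  NE2000
 14:      32643          XT-PIC  ide1
NMI:          0
ERR:          0
"""

if __name__ == "__main__":
    irq = irq_for_device(SAMPLE, "ide1")
    print(f"ide1 is on IRQ {irq}")  # prints "ide1 is on IRQ 14"; convention says 15
```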

/proc/ioports, though, showed that it used the conventional I/O port addresses for ide1 (i.e. 0170-0177 and 0376):

curator:~# cat /proc/ioports 
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
02f8-02ff : serial(auto)
0300-031f : eth0
0340-035f : eth1
0376-0376 : ide1
03c0-03df : vga+
03f8-03ff : serial(auto)
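The same kind of check works for the port addresses. This Python sketch, using a hypothetical helper `port_ranges`, pulls out every range that /proc/ioports assigns to a given device. It assumes the "start-end : name" format shown above (nested entries may be indented) and treats a bare port with no dash as a one-port range.

```python
from typing import List, Tuple


def port_ranges(ioports_text: str, device: str) -> List[Tuple[int, int]]:
    """Return the (start, end) port ranges that /proc/ioports assigns to `device`."""
    ranges = []
    for line in ioports_text.splitlines():
        # Each line looks like "0170-0177 : ide1"; nested entries are indented.
        span, sep, name = line.partition(" : ")
        if not sep or name.strip() != device:
            continue
        start, _, end = span.strip().partition("-")
        # A missing end means a single port, so reuse the start address.
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges


# An excerpt of the listing above.
SAMPLE = """\
0170-0177 : ide1
02f8-02ff : serial(auto)
0300-031f : eth0
0340-035f : eth1
0376-0376 : ide1
03f8-03ff : serial(auto)
"""

if __name__ == "__main__":
    for start, end in port_ranges(SAMPLE, "ide1"):
        print(f"{start:04x}-{end:04x}")  # 0170-0177, then 0376-0376
```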

The fix in this case was to add "ide1=0x170,0x376,14" to the kernel command line (e.g. via Lilo's "append" option). Presumably the previously installed 2.2 kernel had been bodged in some other way, or used a different method of probing for discs.
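If you need to write this parameter out for more than one machine, a tiny helper keeps the field order straight. `ide_option` is a hypothetical name; the field order (command-block base port, control port, IRQ) follows the option above. In /etc/lilo.conf the result goes inside an `append="..."` line, and /sbin/lilo has to be rerun for the change to take effect.

```python
def ide_option(bus: int, base: int, ctl: int, irq: int) -> str:
    """Format an ideN=base,ctl,irq boot parameter."""
    # base: first port of the 8-port command block (0x170 for ide1 here)
    # ctl:  the device control / alternate status port (0x376 for ide1 here)
    # irq:  the interrupt line the bus actually uses (14 on this machine)
    return f"ide{bus}={base:#x},{ctl:#x},{irq}"


if __name__ == "__main__":
    option = ide_option(1, 0x170, 0x376, 14)
    print(option)                # ide1=0x170,0x376,14
    print(f'append="{option}"')  # the corresponding line for /etc/lilo.conf
```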

This web page exists merely to produce a hit in search engines, so that anyone else who is as confused as I was by this error has a hint as to one possible cause. Note that the problem described here might not be the only reason you get this message!

Kernel 2.4.20

I had to patch kernel 2.4.20 to make the above option work: rjk-ide-fix.diff. (Clarification: the patch won't make the 'lost interrupt' problem go away by itself; it just makes the ideN= command-line options, which are broken in 2.4.20, work again.)

I don't understand! What's an interrupt?

There's a lot of simplification in this explanation, but hopefully not in the important places:

An interrupt is a signal from one part of your computer to another. Hard discs send an interrupt to the kernel to indicate that whatever operation they were performing has completed.

Interrupts have numbers: on a normal PC, interrupt number 14 is used by the first IDE bus (/dev/hda and /dev/hdb) and interrupt 15 by the second IDE bus (/dev/hdc and /dev/hdd). So when the kernel receives interrupt 14 it only checks the first IDE bus; when it receives interrupt 15 it only checks the second.
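To make the routing idea concrete, here is a toy Python model (not kernel code) of the kernel's table of which IDE bus to check for each IRQ. The names `DEFAULT_ROUTING`, `completion_noticed` and `move_bus` are hypothetical; the per-IRQ list simply lets one IRQ map to more than one bus, and `move_bus` stands in for what an option like ide1=...,14 tells the kernel.

```python
from typing import Dict, List

# Conventional PC assumption: which IDE buses the kernel checks for each IRQ.
DEFAULT_ROUTING: Dict[int, List[str]] = {14: ["ide0"], 15: ["ide1"]}


def completion_noticed(bus: str, raised_irq: int,
                       routing: Dict[int, List[str]]) -> bool:
    """True if an interrupt that `bus` raises on line `raised_irq` reaches its handler."""
    return bus in routing.get(raised_irq, [])


def move_bus(routing: Dict[int, List[str]], bus: str,
             irq: int) -> Dict[int, List[str]]:
    """Return a copy of `routing` with `bus` attached to `irq` only."""
    # Rebuild every list so the original table is left untouched.
    fixed = {n: [b for b in buses if b != bus] for n, buses in routing.items()}
    fixed.setdefault(irq, []).append(bus)
    return fixed


if __name__ == "__main__":
    # On the 486 above, the second bus actually signals on IRQ 14.
    print(completion_noticed("ide1", 14, DEFAULT_ROUTING))  # False
    fixed = move_bus(DEFAULT_ROUTING, "ide1", 14)
    print(completion_noticed("ide1", 14, fixed))            # True
```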

This means that if the kernel's idea of what interrupts mean is different from what they really mean, the kernel might never find out that operations on one or more of the disks completed. When this happens, eventually the kernel decides that the operation has taken too long, and, depending on the state of the drive, pretends it received the interrupt anyway (and prints the "lost interrupt" message), retries the operation, or issues an error message.
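The timeout behaviour can be sketched the same way. This is a deliberately simplified Python model of the three outcomes described above, not the kernel's actual decision logic: `wait_for_completion` is a hypothetical function, the `threading.Event` stands in for the interrupt arriving, `drive_busy` stands in for reading the drive's status, and `timeout_s` is an arbitrary illustrative value rather than the kernel's real timeout.

```python
import threading
import time
from typing import Callable


def wait_for_completion(irq_seen: threading.Event,
                        drive_busy: Callable[[], bool],
                        retries_left: int,
                        timeout_s: float) -> str:
    """Toy model of waiting for an IDE completion interrupt, with a timeout."""
    if irq_seen.wait(timeout_s):
        return "ok"  # the interrupt arrived and was routed to this bus
    # Timed out: what happens next depends on the drive's state (simplified).
    if not drive_busy():
        # The drive already finished; only the interrupt went missing,
        # so carry on as if it had arrived and complain about it.
        return "lost interrupt"
    if retries_left > 0:
        return "retry"
    return "error"


if __name__ == "__main__":
    never_delivered = threading.Event()  # the IRQ went to the other bus's handler
    start = time.monotonic()
    result = wait_for_completion(never_delivered, lambda: False, 3, 0.5)
    print(result, f"after {time.monotonic() - start:.1f}s")
```

Every operation pays the full timeout before the "lost interrupt" path is taken, which is why the slowdown affects every single disk access rather than just the occasional one.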

But the timeout is long enough that all this makes your computer run really slowly, since every single hard disk operation takes several seconds.

The cure, of course, is to tell the kernel which interrupt corresponds to which IDE bus (i.e. which set of disks), and that's what the ideN= option described above does.

RJK