Debugging the mouse driver

We now have an almost perfectly usable mouse driver. If you were to actually try and use it however you would eventually find a couple of problems with it. A few programs will also not work with as it does not yet support asynchronous I/O.

First let us look at the bugs. The most obvious one isn't really a driver bug but a failure to consider the consequences. Imagine you bumped the mouse hard by accident and sent it skittering across the desk. The mouse interrupt routine will add up all that movement and report it in steps of 127 until it has reported all of it. Clearly there is a point beyond which mouse movement isn't worth reporting. We need to add this as a limit to the interrupt handler:

static void ourmouse_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
        char delta_x;
        char delta_y;
        unsigned char new_buttons;

        delta_x = inb(OURMOUSE_BASE);
        delta_y = inb(OURMOUSE_BASE+1);
        new_buttons = inb(OURMOUSE_BASE+2);

        if(delta_x || delta_y || new_buttons != mouse_buttons)
        {
                /* Something happened */

                spin_lock(&mouse_lock);
                mouse_event = 1;
                mouse_dx += delta_x;
                mouse_dy += delta_y;

                if(mouse_dx < -4096)
                        mouse_dx = -4096;
                if(mouse_dx > 4096)
                        mouse_dx = 4096;

                if(mouse_dy < -4096)
                        mouse_dy = -4096;
                if(mouse_dy > 4096)
                        mouse_dy = 4096;

                mouse_buttons = new_buttons;
                spin_unlock(&mouse_lock);
                
                wake_up_interruptible(&mouse_wait);
        }
}
  

By adding these checks we limit the range of accumulated movement to something sensible.

The second bug is a bit more subtle, and that is perhaps why this is such a common mistake. Remember, I said the waiting loop for the read handler had a bug in it. Think about what happens when we execute:

        while(!mouse_event)
        {
  

and an interrupt occurs at this point here. This causes a mouse movement and wakes up the queue.

                interruptible_sleep_on(&mouse_wait);
  

Now we sleep on the queue. We missed the wake up and the application will not see an event until the next mouse event occurs. This will lead to just the odd instance when a mouse button gets delayed. The consequences to the user will probably be almost undetectable with a mouse driver. With other drivers this bug could be a lot more severe.

There are two ways to solve this. The first is to disable interrupts during the testing and the sleep. This works because when a task sleeps it ceases to disable interrupts, and when it resumes it disables them again. Our code thus becomes:

        save_flags(flags);
        cli();

        while(!mouse_event)
        {
                if(file->f_flags&O_NDELAY)
                {
                        restore_flags(flags);
                        return -EAGAIN;
                }
                interruptible_sleep_on(&mouse_wait);
                if(signal_pending(current))
                {
                        restore_flags(flags);
                        return -ERESTARTSYS;
                }
        }
        restore_flags(flags);
  

This is the sledgehammer approach. It works but it means we spend a lot more time turning interrupts on and off. It also affects interrupts globally and has bad properties on multiprocessor machines where turning interrupts off globally is not a simple operation, but instead involves kicking each processor, waiting for them to disable interrupts and reply.

The real problem is the race between the event testing and the sleeping. We can avoid that by using the scheduling functions more directly. Indeed this is the way they generally should be used for an interrupt.

        struct wait_queue wait = { current, NULL };

        add_wait_queue(&mouse_wait, &wait);
        set_current_state(TASK_INTERRUPTIBLE);
        
        while(!mouse_event)
        {
                if(file->f_flags&O_NDELAY)
                {
                        remove_wait_queue(&mouse_wait, &wait);
                        set_current_state(TASK_RUNNING);
                        return -EWOULDBLOCK;
                }
                if(signal_pending(current))
                {
                        remove_wait_queue(&mouse_wait, &wait);
                        current->state = TASK_RUNNING;
                        return -ERESTARTSYS;
                }
                schedule();
                set_current_state(TASK_INTERRUPTIBLE);
        }
        
        remove_wait_wait(&mouse_wait, &wait);
        set_current_state(TASK_RUNNING);
  

At first sight this probably looks like deep magic. To understand how this works you need to understand how scheduling and events work on Linux. Having a good grasp of this is one of the keys to writing clean efficient device drivers.

add_wait_queue does what its name suggests. It adds an entry to the mouse_wait list. The entry in this case is the entry for our current process (current is the current task pointer).

So we start by adding an entry for ourself onto the mouse_wait list. This does not put us to sleep however. We are merely tagged onto the list.

Next we set our status to TASK_INTERRUPTIBLE. Again this does not mean we are now asleep. This flag says what should happen next time the process sleeps. TASK_INTERRUPTIBLE says that the process should not be rescheduled. It will run from now until it sleeps and then will need to be woken up.

The wakeup_interruptible call in the interrupt handler can now be explained in more detail. This function is also very simple. It goes along the list of processes on the queue it is given and any that are marked as TASK_INTERRUPTIBLE it changes to TASK_RUNNING and tells the kernel that new processes are runnable.

Behind all the wrappers in the original code what is happening is this

  1. We add ourself to the mouse wait queue

  2. We mark ourself as sleeping

  3. We ask the kernel to schedule tasks again

  4. The kernel sees we are asleep and schedules someone else.

  5. The mouse interrupt sets our state to TASK_RUNNING and makes a note that the kernel should reschedule tasks

  6. The kernel sees we are running again and continues our execution

This is why the apparent magic works. Because we mark ourself as TASK_INTERRUPTIBLE and as we add ourselves to the queue before we check if there are events pending, the race condition is removed.

Now if an interrupt occurs after we check the queue status and before we call the schedule function in order to sleep, things work out. Instead of missing an event, we are set back to TASK_RUNNING by the mouse interrupt. We still call schedule but it will continue running our task. We go back around the loop and this time there may be an event.

There will not always be an event. Thus we set ourselves back to TASK_INTERRUPTIBLE before resuming the loop. Another process doing a read may already have cleared the event flag, and if so we will need to go back to sleep again. Eventually we will get our event and escape.

Finally when we exit the loop we remove ourselves from the mouse_wait queue as we are no longer interested in mouse events, and we set ourself back to TASK_RUNNABLE as we do not wish to go to sleep again just yet.

NoteNote
 

This isn't an easy topic. Don't be afraid to reread the description a few times and also look at other device drivers to see how it works. Finally if you can't grasp it just yet, you can use the code as boilerplate to write other drivers and trust me instead.