Tuesday, June 5, 2007

Semaphores and read/write Semaphores

Sometimes, while accessing a shared data structure, one must perform operations that can block, for example copying data to userspace. The locking primitive available for such scenarios under Linux is called a semaphore. There are two types of semaphores: basic and read-write semaphores. Depending on its initial value, a basic semaphore can be used either for mutual exclusion (initial value of 1) or to provide a more sophisticated type of access (an initial value of N greater than 1 admits up to N holders at once).
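
To make the role of the initial value concrete, here is a minimal userspace sketch. It is an analogy only: it uses POSIX sem_wait()/sem_post() in place of the kernel's down()/up(), and every name in it (counter_sem, run_mutex_demo, count_admitted) is made up for illustration.

```c
#define _XOPEN_SOURCE 700   /* ask libc for the POSIX thread/semaphore APIs */
#include <pthread.h>
#include <semaphore.h>

/* Userspace analogy: sem_wait() ~ down(), sem_post() ~ up().
 * All names below are hypothetical. */

#define NTHREADS 4
#define NITERS   10000

static sem_t counter_sem;       /* initialised to 1: mutual exclusion */
static long shared_counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        sem_wait(&counter_sem);     /* may sleep, like down() */
        shared_counter++;           /* critical section */
        sem_post(&counter_sem);     /* like up() */
    }
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
long run_mutex_demo(void)
{
    pthread_t tid[NTHREADS];

    shared_counter = 0;
    sem_init(&counter_sem, 0, 1);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    sem_destroy(&counter_sem);
    return shared_counter;
}

/* With an initial value above 1 the semaphore admits that many holders.
 * Returns how many non-blocking acquisitions succeed before it runs out. */
int count_admitted(unsigned int initial)
{
    sem_t pool;
    int admitted = 0;

    sem_init(&pool, 0, initial);
    while (sem_trywait(&pool) == 0)
        admitted++;
    sem_destroy(&pool);
    return admitted;
}
```

Because every increment happens with the semaphore held, no updates are lost, and count_admitted(3) shows that an initial value of 3 lets three holders in before the next one would have to wait. On older toolchains this needs to be built with -pthread.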

Read-write semaphores differ from basic semaphores in the same way as read-write spinlocks differ from basic spinlocks: one can have multiple readers at a time but only one writer, and there can be no readers while there is a writer. In other words, a writer blocks all readers, and new readers block while a writer is waiting.
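
The reader/writer rules can be probed from userspace with a small sketch. This is an analogy under stated assumptions: pthread_rwlock_t stands in for the kernel's read-write semaphore (rdlock for down_read(), wrlock for down_write()), the names rw_probe and probe_rwlock are hypothetical, and the non-blocking try variants are used from a single thread so the result is deterministic. It does not show the "new readers wait behind a waiting writer" policy, which depends on the implementation.

```c
#define _XOPEN_SOURCE 700   /* ask libc for the POSIX rwlock API */
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogy: pthread_rwlock_t ~ read-write semaphore.
 * All names below are hypothetical. */

struct rw_probe {
    bool two_readers;        /* second reader admitted alongside the first? */
    bool writer_vs_readers;  /* writer admitted while readers hold the lock? */
    bool writer_when_idle;   /* writer admitted when nobody holds the lock? */
    bool reader_vs_writer;   /* reader admitted while a writer holds it? */
};

struct rw_probe probe_rwlock(void)
{
    pthread_rwlock_t lock;
    struct rw_probe r;

    pthread_rwlock_init(&lock, NULL);

    /* Phase 1: hold the lock for reading and see who else gets in. */
    pthread_rwlock_rdlock(&lock);
    r.two_readers = (pthread_rwlock_tryrdlock(&lock) == 0);
    r.writer_vs_readers = (pthread_rwlock_trywrlock(&lock) == 0);
    if (r.two_readers)
        pthread_rwlock_unlock(&lock);   /* drop the second read lock */
    pthread_rwlock_unlock(&lock);       /* drop the first read lock */

    /* Phase 2: hold the lock for writing and see whether a reader gets in. */
    r.writer_when_idle = (pthread_rwlock_trywrlock(&lock) == 0);
    r.reader_vs_writer = (pthread_rwlock_tryrdlock(&lock) == 0);
    if (r.writer_when_idle)
        pthread_rwlock_unlock(&lock);   /* drop the write lock */

    pthread_rwlock_destroy(&lock);
    return r;
}
```

The expected pattern is that readers share the lock with each other, while a writer is admitted only when the lock is idle and then keeps readers out.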

Also, basic semaphores can be interruptible: use down_interruptible() instead of plain down() and check its return value, which will be nonzero if the operation was interrupted. The release side is unchanged: there is no up_interruptible(), so the semaphore is still released with plain up().
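
The "check the return value" pattern looks roughly like the sketch below. Again this is a userspace analogy, not kernel code: POSIX sem_wait() fails with EINTR when a signal handler interrupts it, which plays the role of down_interruptible() returning nonzero. The names acquire_interruptible and demo_interrupted_wait, and the use of a repeating SIGALRM timer to provoke the interruption, are assumptions made for the demonstration.

```c
#define _XOPEN_SOURCE 700   /* ask libc for sigaction(), setitimer(), sem_*() */
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Userspace analogy for down_interruptible(). All names are hypothetical. */

static void on_alarm(int signo)
{
    (void)signo;    /* the handler only needs to exist to interrupt the wait */
}

/* Kernel-style convention: 0 on success, negative errno if interrupted. */
int acquire_interruptible(sem_t *sem)
{
    if (sem_wait(sem) == 0)
        return 0;
    return -errno;  /* -EINTR when a signal handler interrupted the wait */
}

/* Waits on a semaphore nobody will post and lets a timer signal interrupt it. */
int demo_interrupted_wait(void)
{
    sem_t sem;
    struct sigaction sa, old_sa;
    struct itimerval tick, off;
    int ret;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    /* Repeating 20 ms timer, so a signal arrives even if the first one
     * fires before the thread has started waiting. */
    memset(&tick, 0, sizeof(tick));
    tick.it_value.tv_usec = 20000;
    tick.it_interval.tv_usec = 20000;
    memset(&off, 0, sizeof(off));

    sem_init(&sem, 0, 0);               /* 0: the wait can never succeed */
    setitimer(ITIMER_REAL, &tick, NULL);
    ret = acquire_interruptible(&sem);
    setitimer(ITIMER_REAL, &off, NULL); /* disarm the timer */

    sigaction(SIGALRM, &old_sa, NULL);  /* restore the previous handler */
    sem_destroy(&sem);
    return ret;
}
```

A caller that sees a nonzero result must not enter the critical section and must not release the semaphore, since it never acquired it.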

Using semaphores for mutual exclusion is ideal in situations where a critical section may call, through function pointers, unknown functions registered by other subsystems or modules, i.e. the caller cannot know a priori whether the function blocks or not.

A simple example of semaphore usage is in kernel/sys.c, in the implementation of the gethostname(2)/sethostname(2) system calls.

asmlinkage long sys_sethostname(char *name, int len)
{
    int errno;

    if (!capable(CAP_SYS_ADMIN))
        return -EPERM;
    if (len < 0 || len > __NEW_UTS_LEN)
        return -EINVAL;
    down_write(&uts_sem);
    errno = -EFAULT;
    if (!copy_from_user(system_utsname.nodename, name, len)) {
        system_utsname.nodename[len] = 0;
        errno = 0;
    }
    up_write(&uts_sem);
    return errno;
}

asmlinkage long sys_gethostname(char *name, int len)
{
    int i, errno;

    if (len < 0)
        return -EINVAL;
    down_read(&uts_sem);
    i = 1 + strlen(system_utsname.nodename);
    if (i > len)
        i = len;
    errno = 0;
    if (copy_to_user(name, system_utsname.nodename, i))
        errno = -EFAULT;
    up_read(&uts_sem);
    return errno;
}

The points to note about this example are:

1. Both functions may block while copying data from/to userspace in copy_from_user()/copy_to_user(), so they cannot use any form of spinlock here.
2. The semaphore type chosen is read-write rather than basic because there may be many concurrent gethostname(2) requests, which need not be mutually exclusive.
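
For readers who want to experiment with the same shape outside the kernel, here is a rough userspace sketch of the two paths. It is an analogy, not the kernel implementation: a pthread_rwlock_t replaces uts_sem, memcpy() replaces copy_from_user()/copy_to_user() (so the -EFAULT path is gone), and DEMO_NAME_LEN, node_name, set_name and get_name are hypothetical names.

```c
#define _XOPEN_SOURCE 700   /* ask libc for the POSIX rwlock API */
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define DEMO_NAME_LEN 64    /* hypothetical stand-in for __NEW_UTS_LEN */

static pthread_rwlock_t name_lock = PTHREAD_RWLOCK_INITIALIZER;
static char node_name[DEMO_NAME_LEN + 1] = "localhost";

/* Writer path, shaped like sys_sethostname(): exclusive access. */
int set_name(const char *name, int len)
{
    if (len < 0 || len > DEMO_NAME_LEN)
        return -EINVAL;
    pthread_rwlock_wrlock(&name_lock);      /* ~ down_write() */
    memcpy(node_name, name, (size_t)len);
    node_name[len] = '\0';
    pthread_rwlock_unlock(&name_lock);      /* ~ up_write() */
    return 0;
}

/* Reader path, shaped like sys_gethostname(): many readers may run at once.
 * Like the original, it copies at most len bytes, so the result is not
 * NUL-terminated when the caller's buffer is too small. */
int get_name(char *buf, int len)
{
    size_t n;

    if (len < 0)
        return -EINVAL;
    pthread_rwlock_rdlock(&name_lock);      /* ~ down_read() */
    n = strlen(node_name) + 1;
    if (n > (size_t)len)
        n = (size_t)len;
    memcpy(buf, node_name, n);
    pthread_rwlock_unlock(&name_lock);      /* ~ up_read() */
    return 0;
}
```

The point of the split is the same as in the kernel code: the rare writer takes the lock exclusively, while any number of readers can copy the name out concurrently.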

Although the Linux implementation of semaphores and read-write semaphores is very sophisticated, one can think of scenarios which are not yet implemented; for example, there is no concept of interruptible read-write semaphores. This is presumably because no real-world situation has so far required these exotic flavours of the primitives.
