Articles/Essays/Papers Log: Spinlocks on x86

Spinlocks are the most basic form of synchronization. They are mainly useful on SMP kernels to protect different CPUs from each other. Almost never want to use them in user-space. Most of the time you should also disable interrupts before taking a spinlock, and only enabled then once you've released the lock.

When we want to protect some data structure from concurrent access by multiple CPUs we take a lock. When we are done, we release it. If we try to take a lock that is held by another CPU, we busy-wait until that CPU releases the lock.

To take the lock safely we use atomic instructions provided by the CPU to atomically update a memory space. Atomic instructions are guaranteed to be seen as single operations by all other CPUs. The atomic instructions involve utilizing a lock prefix on the instruction and having the destination operand assigned to a memory address. The following instructions can run atomically with a lock prefix on current x86 processors: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. On most instructions a lock prefix must be explicitly used except for the xchg and cmpxchg instruction where the lock prefix is implied if the instruction involves a memory address.

In the days of Intel 486 processors, the lock prefix used to assert a lock on the bus along with a large hit in performance. Starting with the Intel Pentium Pro architecture, the bus lock is transformed into a cache lock. A lock will still be asserted on the bus in the most modern architectures if the lock resides in uncacheable memory or if the lock extends beyond a cache line boundary splitting cache lines. Both of these scenarios are unlikely, so most lock prefixes will be transformed into a cache lock which is much less expensive.

1. A straight-forward spinlock

C source

typedef struct {
volatile unsigned long lock;
} spinlock_t;

#define spinlock_init(x) do { *(x) = (spinlock_t){ 1 }; } while(0)

/*
* Busy-wait until the lock can be taken by setting to 0.
*/
void spinlock_lock(spinlock_t *lock)
{
unsigned long dummy = 0;
asm volatile("1:"                /* label to which we loop */
"  xchg %0, %1;"    /* xchg doesn't need lock prefix */
"  cmp  $0, %1;"    /* check if it was locked already */
"  je 1b;"          /* if so, try again */
: "=m" (lock->lock) /* output in memory */
: "r" (dummy)       /* input in some register */
: "memory");        /* memory barrier */
}

void spinlock_unlock(spinlock_t *lock)
{
lock->lock = 1;
}

x86_64 assembly

.globl spinlock_lock
.type   spinlock_lock, @function
spinlock_lock:
pushq   %rbp
movq    %rsp, %rbp
movq    %rdi, -24(%rbp)
movq    $0, -8(%rbp)
movq    -24(%rbp), %rdx
movq    -8(%rbp), %rax
#APP
1:  xchg (%rdx), %rax;  cmp  $0, %rax;  je 1b;
#NO_APP
leave
ret

.globl spinlock_unlock
.type   spinlock_unlock, @function
spinlock_unlock:
pushq   %rbp
movq    %rsp, %rbp
movq    %rdi, -8(%rbp)
movq    -8(%rbp), %rax
movq    $1, (%rax)
leave
ret

Tuesday, June 10, 2008

Spinlocks on x86