Lock
Basic Concept
A lock is just a variable, declared like any other:
```c
lock_t mutex;
```
Since a lock is a variable, it must be initialized before it can be used.
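A minimal sketch of how such a lock is used, following the pseudo-API in these notes (lock_t, lock(), unlock()); the init call and the shared counter are illustrative assumptions:

```c
int counter = 0;            // shared data the lock protects
lock_t mutex;               // assume lock_init(&mutex) marks it free before use

void increment(void) {
    lock(&mutex);           // acquire: only one thread gets past this point at a time
    counter = counter + 1;  // critical section
    unlock(&mutex);         // release: a waiting thread may now enter
}
```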
A lock is always in one of two states:
- Available (unlocked/free): no thread holds the lock.
- Acquired: exactly one thread (the owner) holds the lock, presumably inside the critical section.
Locks give the programmer a minimal degree of control over scheduling.
Normally the programmer creates threads and the OS schedules them as it sees fit; a lock hands a small piece of that control back to the programmer.
Lock Implementation
Key question: how do we build an efficient lock? What hardware support does it need? What OS support?
Lock Evaluation
- Mutual exclusion: does it actually provide it?
  - The basic job of a lock: prevent multiple threads from entering the critical section at the same time.
- Performance: especially the overhead added by the lock.
  - No contention: how much overhead when a single thread acquires and releases the lock?
  - Contention: how does it behave when multiple threads compete for the lock on a single CPU?
Interrupt Control
Mutual exclusion can be implemented by disabling interrupts around the critical section.
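A minimal sketch of this approach; DisableInterrupts() and EnableInterrupts() are placeholder names for the privileged instructions that mask and unmask interrupts on the current CPU:

```c
void lock(void) {
    DisableInterrupts();   // no interrupt (and thus no preemption) can occur now
}

void unlock(void) {
    EnableInterrupts();    // interrupts, and preemption, are possible again
}
```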
- Pro: very simple. Code inside the critical section cannot be interrupted, so it executes as if atomic.
- Cons:
  - Requires trusting the thread with a privileged operation (enabling/disabling interrupts).
    - We must trust that the thread uses this privilege only for its intended purpose.
    - e.g., a greedy program could grab the "lock" and monopolize the processor, or sit in an infinite loop.
  - Does not work on multiprocessors.
    - When threads run on multiple CPUs, each may try to enter the same critical section.
    - Disabling interrupts on one processor has no effect on the others, so mutual exclusion cannot be guaranteed.
  - Important interrupts may be lost.
    - e.g., what if the CPU misses the interrupt indicating that a read from storage has completed? When would we wake the process waiting for that data?
  - Inefficient: on modern CPUs, enabling and disabling interrupts tends to be slower than ordinary instructions.
- Usage:
  - Therefore, disabling interrupts should be used only in limited situations.
  - e.g., the OS itself disables interrupts to operate on its internal data structures atomically.
  - Inside the kernel there is no trust issue (the OS trusts itself), so letting the OS use this technique is fine.
Spin Lock
Test-And-Set (Atomic Exchange)
As discussed, disabling interrupts is pointless on multiprocessors, so multiprocessor systems rely on hardware support instead.
The simplest form of hardware support is the test-and-set instruction (also called atomic exchange).
Without hardware support
```c
typedef struct __lock_t { int flag; } lock_t;
```
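Only the first line of the original block survives above; the rest presumably follows the standard flag-based pattern. A sketch of the missing init/lock/unlock routines:

```c
void init(lock_t *mutex) {
    mutex->flag = 0;            // 0: lock available, 1: lock held
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1)    // TEST the flag...
        ;                       // ...and spin-wait while it is held
    mutex->flag = 1;            // now SET it (note: not atomic with the test!)
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
```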
The code above has two problems:
- Correctness: with unlucky timing (an interrupt between the test and the set), both threads can see the flag as 0 and then set it to 1 -> mutual exclusion fails.
- Performance: spin-waiting checks the flag endlessly, wasting CPU time until the other thread releases the lock -> very wasteful, especially on a single processor.
With hardware support
```c
// TestAndSet is atomic
```
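Only the leading comment of the original block survives above; a sketch of what it presumably showed, i.e. the C-style semantics of TestAndSet (which the hardware performs atomically) and the spin lock built on it:

```c
int TestAndSet(int *old_ptr, int new) {
    int old = *old_ptr;   // fetch the old value
    *old_ptr = new;       // store the new value
    return old;           // return what was there before
}

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *lock) {
    lock->flag = 0;
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ;   // spin: succeed only when the old value was 0 (lock was free)
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}
```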
This is a spin lock: it burns CPU cycles until it gets the lock -> on a single CPU it only works under a preemptive scheduler that can interrupt the spinning thread via a timer.
Spin Lock Evaluation
- Mutual exclusion: works.
- Fairness: does it guarantee that waiting threads eventually get their chance?
  - No. A spinning thread may spin forever under contention; simple spin locks (as discussed so far) are not fair and may lead to starvation.
- Performance
  - Single CPU: spin-lock overhead can be significant; a spinning thread wastes its entire time slice while the lock holder cannot run.
  - Multiple CPUs: works reasonably well, since the lock holder runs on another CPU and is likely to release the lock soon.
Compare-And-Swap
```c
int CompareAndSwap(int *ptr, int expected, int new);   // atomic instruction provided by hardware
```
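The body of the original block is cut off; its usual C-style description (which the hardware executes atomically) is roughly:

```c
int CompareAndSwap(int *ptr, int expected, int new) {
    int actual = *ptr;     // read the current value
    if (actual == expected)
        *ptr = new;        // update only if it matches what we expected
    return actual;         // always return the value we observed
}
```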
CompareAndSwap is a more powerful primitive than TestAndSet; for example, it can be used to build wait-free synchronization.
When used just to build a spin lock, however, it behaves exactly like TestAndSet.
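For example, plugging it into the same spin lock used with TestAndSet (reusing lock_t and unlock() from the sketch above):

```c
void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ;   // spin: succeed only when the flag was 0 and we swapped in a 1
}
```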
Load-Linked & Store-Conditional
```c
int LoadLinked(int *ptr) { return *ptr; }   // hardware also remembers the loaded address
```
LoadLinked behaves like an ordinary load instruction, except that the hardware tracks the loaded address. The matching StoreConditional(ptr, value) stores the value and returns 1 only if nothing has been stored to that address since the LoadLinked; otherwise it stores nothing and returns 0. A lock is built by spinning with LoadLinked and attempting the acquire with StoreConditional, as sketched below.
As one might expect, a StoreConditional can fail (e.g., an interrupt lets another thread update the flag in between), but the "no update since the LoadLinked" condition guarantees that only one of the competing threads succeeds, so only one thread acquires the lock at a time.
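A sketch of that lock; LoadLinked and StoreConditional are the hardware primitives described above (declared here for completeness), and lock_t is the same one-flag struct as before:

```c
int LoadLinked(int *ptr);                   // hardware: load and remember the address
int StoreConditional(int *ptr, int value);  // hardware: 1 on success, 0 on failure

typedef struct __lock_t { int flag; } lock_t;

void lock(lock_t *lock) {
    while (1) {
        while (LoadLinked(&lock->flag) == 1)
            ;   // spin while the lock looks held
        if (StoreConditional(&lock->flag, 1) == 1)
            return;   // success: nothing stored to flag since our LoadLinked
        // failure: another thread beat us to it; try again
    }
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}
```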
Fetch-And-Add
```c
// FetchAndAdd is atomic: it returns the old value of *ptr and increments it by one.
int FetchAndAdd(int *ptr) { return (*ptr)++; }
```
This enables the ticket/turn lock. Since FetchAndAdd is atomic, each thread atomically grabs a unique ticket number, and the lock's turn value decides whose turn it is to hold the lock.
When a thread is done with the lock, it increments the turn number, which lets exactly one waiting thread (the holder of that ticket) exit its while loop and acquire the lock (see the sketch below).
- The important difference from the previous locks is that threads acquire the lock in ticket order, so there is no starvation.
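A sketch of the ticket lock described above, using the FetchAndAdd from the block above:

```c
typedef struct __lock_t {
    int ticket;   // next ticket number to hand out
    int turn;     // whose turn it currently is
} lock_t;

void lock_init(lock_t *lock) {
    lock->ticket = 0;
    lock->turn   = 0;
}

void lock(lock_t *lock) {
    int myturn = FetchAndAdd(&lock->ticket);  // atomically take a ticket
    while (lock->turn != myturn)
        ;   // spin until it is this thread's turn
}

void unlock(lock_t *lock) {
    lock->turn = lock->turn + 1;   // pass the lock to the next ticket holder
}
```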
How can we avoid wasting CPU time spinning?
First Approach: Yield
The idea is simple: while waiting for the lock, instead of spinning, yield the CPU to another thread, as sketched below.
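A sketch, assuming a yield() call that moves the caller from running back to ready, and reusing the TestAndSet-based flag lock from before:

```c
void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        yield();   // instead of spinning, give the CPU to someone else
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}
```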
- It works fine when there are few threads.
- If many threads exist:
  - Say the scheduling policy is round-robin: a waiting thread may go through the run-and-yield cycle up to n-1 times before the lock holder runs again.
  - This is better than wasting n-1 whole time slices spinning, but it still pays repeated context-switch overhead (problematic when context switches are expensive).
  - Starvation is still possible: a thread may keep yielding while other threads repeatedly take the lock, so it never gets its turn.
Second Approach: Queue - sleep instead of spinning
The fundamental problem with the methods so far is that they leave too much to luck: whether spinning or yielding, the scheduler decides who runs next -> wasteful, and starvation cannot be prevented.
Hence we need to explicitly control which thread gets the lock next -> OS support plus a queue of waiting threads.
Implement it with two pieces of state: a guard (a tiny spin lock protecting the lock's internals) and the lock flag itself. Cases once a thread holds the guard:
- Holds the guard, lock already taken: add itself to the wait queue, release the guard, and sleep (this is before the critical section).
- Holds the guard, lock free: take the lock, release the guard, and enter the critical section.
- Unlocking (holds the guard while owning the lock): wake up the next thread in the queue, then release the guard.
The guard is still acquired by spinning, but this is far more efficient: it protects only a few instructions around the flag and queue, never the user's critical section itself.
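A sketch of such a queue-based lock. park(), unpark(tid), and gettid() are assumed OS primitives (sleep, wake a given thread, and get the caller's ID), the queue_* helpers are assumed as well, and TestAndSet is as defined earlier; a real implementation must also handle the race where the wakeup arrives just before park():

```c
typedef struct __queue_t queue_t;   // FIFO of thread IDs (assumed helper type)
void queue_init(queue_t *q);
void queue_add(queue_t *q, int tid);
int  queue_remove(queue_t *q);
int  queue_empty(queue_t *q);
int  gettid(void);                  // ID of the calling thread (assumed)
void park(void);                    // put the calling thread to sleep (assumed)
void unpark(int tid);               // wake the given thread (assumed)

typedef struct __lock_t {
    int flag;       // is the lock logically held?
    int guard;      // tiny spin lock protecting flag and q
    queue_t *q;     // threads waiting for the lock
} lock_t;

void lock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ;                            // spin briefly just to acquire the guard
    if (m->flag == 0) {
        m->flag = 1;                 // lock was free: take it
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());   // lock is held: join the queue...
        m->guard = 0;
        park();                      // ...and sleep until woken
    }
}

void unlock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ;                            // acquire the guard again
    if (queue_empty(m->q))
        m->flag = 0;                 // nobody waiting: the lock becomes free
    else
        unpark(queue_remove(m->q));  // hand the lock directly to the next waiter
    m->guard = 0;                    // note: flag stays 1, so ownership passes on
}
```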
Linux Lock - Futex
Futex:
- Associated with a particular physical memory address.
- Each futex has its own wait queue inside the kernel.
- Callers invoke futex calls to sleep on, or wake waiters of, that address as needed.
- futex_wait(address, expected): put the caller to sleep if the value at address still equals expected (otherwise return immediately); futex_wake(address): wake one thread waiting on address.
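A deliberately simplified sketch of a sleeping lock built on these two calls; futex_wait/futex_wake here are thin wrappers over the raw futex(2) system call (FUTEX_WAIT/FUTEX_WAKE), CompareAndSwap is as defined earlier, and real implementations such as glibc's add optimizations this sketch omits:

```c
void futex_wait(int *address, int expected);  // sleep only if *address == expected (assumed wrapper)
void futex_wake(int *address);                // wake one waiter on address (assumed wrapper)

typedef struct { int value; } futex_lock_t;   // 0 = free, 1 = held

void lock(futex_lock_t *m) {
    while (CompareAndSwap(&m->value, 0, 1) != 0)  // try to grab the lock
        futex_wait(&m->value, 1);                 // still held: sleep on the kernel queue
}

void unlock(futex_lock_t *m) {
    m->value = 0;              // release the lock
    futex_wake(&m->value);     // wake one sleeping waiter, if any
}
```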
Two-Phase Lock
- First phase: spin for a while, hoping the lock will be released soon.
- Second phase: if the lock still has not been acquired, the caller sleeps and is woken after the lock is released.
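A rough sketch of the idea, reusing the futex-based lock above; SPIN_LIMIT is an arbitrary illustrative bound:

```c
#define SPIN_LIMIT 1000   // how long to spin before giving up and sleeping (illustrative)

void lock_two_phase(futex_lock_t *m) {
    for (int i = 0; i < SPIN_LIMIT; i++)           // phase 1: spin, hoping for a quick release
        if (CompareAndSwap(&m->value, 0, 1) == 0)
            return;                                // got the lock without sleeping
    while (CompareAndSwap(&m->value, 0, 1) != 0)   // phase 2: sleep until woken, then retry
        futex_wait(&m->value, 1);
}
```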