Sunday, April 13, 2008

A Guide to Threading on Leopard

Author's Note: This is a companion article to the talk I gave at CocoaHeads on Thursday, April 10, 2008. You can download a copy of the talk from http://www.1729.us/cocoasamurai/Leopard%20Threads.pdf.

Intro to Threading

It's becoming abundantly clear that one big way to increase application performance is multithreading. Rising processor clock speeds are no longer a reliable route to faster applications, although they still help. Multithreading means splitting your application into multiple threads that execute concurrently, with some threads having access to the same data structures that your main thread has access to.
[Figure: thread memory layout]
A benefit of this technique is that your application may get a whole core to itself, or your app's main thread can reside on one core while a thread you spawn runs on your 2nd core, or 3rd, etc.

Why is this becoming an Issue?

This is becoming more and more of an issue because all the Macs Apple sells are at least dual-core machines, with Intel Core 2 Duos or, on the Mac Pros, 2x quad-core Intel Xeons. In other words, Apple doesn't sell any single-core Macs anymore (leaving out the iPhone in this instance). Now that we have all this computational wealth, it's foolish not to take advantage of multithreading if it can benefit your application.

Why & When you Should & Shouldn't Thread

Now with all this talk I suppose you may be tempted to jump on this bandwagon and put a bunch of threads in your app right now. But I should give you some info on why you should and shouldn't thread. Multithreading is not a tool that should be used just because you can. One reason it's coming into use more and more is that when your app starts out it is given 1 thread and a run loop that receives events. If you perform an incredibly long calculation on that thread, it freezes your user interface, leaving your users incredibly frustrated that they cannot do anything while the calculation is going on. Because of this, multithreading became a topic of interest beyond just operating system designers and became an issue for us as app designers. The solution is to spawn these long operations onto their own threads, which run in the background while your user interface stays responsive to the user's interactions. So what are some good candidates for threading?
  • File / IO / Networking Operations
  • APIs that explicitly state they will lock the calling thread and create their own thread
  • Any compartmental/modular task that's at least 10ms
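To get a rough feel for the 10ms threshold on your own machine, you can time a task directly. The sketch below is only an illustration: elapsed_ms, simulated_work, and worth_threading are hypothetical names (not from Apple's documentation), and it assumes the POSIX gettimeofday call is good enough for coarse wall-clock timing. For real measurements, a profiler will tell you much more.

```c
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: runs task(arg) once and returns the
   wall-clock time it took, in milliseconds. */
double elapsed_ms(void (*task)(void *), void *arg)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    task(arg);
    gettimeofday(&end, NULL);

    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_usec - start.tv_usec) / 1000.0;
}

/* Stand-in for a long calculation: sleeps for about 20 ms. */
void simulated_work(void *arg)
{
    struct timespec pause = { 0, 20 * 1000 * 1000 };
    (void)arg;
    nanosleep(&pause, NULL);
}

/* Returns 1 if the task took at least 10 ms on this machine,
   which makes it a candidate for its own thread under the
   rule of thumb in the list above. */
int worth_threading(void (*task)(void *), void *arg)
{
    return elapsed_ms(task, arg) >= 10.0;
}
```

Calling worth_threading(simulated_work, NULL) times one run of the stand-in task. Swap in a wrapper around your own method to test real code, and run it several times, since a single measurement can be noisy.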
10 milliseconds is a bit ambiguous though. Does 10ms mean 10ms on a Core 2 Duo MacBook Pro with 4GB RAM, or 10ms on an Intel Xeon with 16GB RAM, or what? This 10ms suggestion is repeated many times over in Apple's documentation. The only suggestion I can give you is that if you gauge that your task takes at least 10ms on your machine, it's a good candidate for threading, assuming your machine is a Mac that your target audience might be using. In the end you need to test to see whether your methods take at least 10ms. If you need to, you can use Instruments or Shark to gauge your application's performance. I made a quick screencast showing how to do so with Instruments. If you want to download it, all you need is a free Viddler account: go to http://www.viddler.com/explore/Machx/videos/2/, open the download tab, and download the original upload file. This is my very first screencast, so if you have any comments or suggestions feel free to let me know.
When shouldn't you thread, then?
  • If many threads are trying to access/modify shared data structures
  • If the task doesn't take at least 10ms
  • If the end result is that your app is taking unnecessary memory due to threading
  • If threads slow down your app more than they speed it up
Having threads access a shared data structure isn't an absolute no-no. Ideally you would never spawn threads that do this, but in the real world that isn't always possible. When you spawn threads that all vie for access to a shared data structure without locks, you get inconsistent data across threads, which can leave your app acting oddly or even crashing if it depends on certain states in the data structure. And even if you are a good programmer and put locks in all the proper places, you can run into lock contention: because many threads are trying to access the same thing, you get a backlog of threads waiting for access to the same data structure. Apple repeatedly quotes the 10ms rule (see the discussion above); their suggestion is that if your task is under 10ms, it's not worth spawning a thread for it. It's also important to remember that every time you spawn a thread, memory is allocated for kernel data structures and for the stack space that holds the thread's variables. If you spawn a lot of threads, they consume memory on your system and could end up slowing your app down more than they benefit it.
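To make the shared-data problem concrete, here is a minimal sketch at the POSIX level. The names shared_counter, add_many, and run_counter_demo are made up for this illustration. Two threads increment the same counter, and the mutex makes each increment happen as one unit so no updates are lost. Without the lock and unlock calls the two threads would race, and the final count could come up short.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS_PER_THREAD 100000

static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread entry point: bumps the shared counter under the lock. */
static void *add_many(void *unused)
{
    int i;
    (void)unused;
    for (i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;        /* only one thread at a time gets here */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical driver: runs two threads and returns the final count,
   or -1 if a thread could not be created. */
long run_counter_demo(void)
{
    pthread_t a, b;

    shared_counter = 0;
    if (pthread_create(&a, NULL, add_many, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, add_many, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

Note that this example is also a small picture of lock contention: both threads spend most of their time waiting on the same mutex, so it is correct but not fast.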

Thread Costs

Every time you create a thread, there is an inherent cost to that creation in terms of CPU time and memory allocation. According to Apple's documentation, thread creation has the following costs, as measured on an iMac Core 2 Duo with 1GB RAM:
Item                      Approximate cost
Kernel Data Structures    1 KB
Stack Space               512 KB (2nd+ threads)* / 8 MB (main thread)
Creation Time             90 microseconds
Mutex Acquisition Time    0.2 microseconds
Atomic Compare & Swap     0.05 microseconds
* = 16 KB stack minimum, and the stack size has to be a multiple of 4 KB.
Thus if you're spawning many threads in your application you are consuming a great deal of resources, and if you spawn way too many, your threads could end up fighting each other for CPU resources.
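If a secondary thread only does a small job, you can ask for less stack than the default when you create it. Here is a hedged sketch using POSIX thread attributes. The helper name run_with_small_stack and the 64 KB figure are my own choices for illustration (64 KB is above the 16 KB minimum and a multiple of 4 KB); pick a size that actually fits what your thread does.

```c
#include <pthread.h>
#include <stddef.h>

/* 64 KB: above the 16 KB minimum and a multiple of 4 KB. */
#define SMALL_STACK_BYTES (64 * 1024)

/* Tiny task that needs very little stack: just sets a flag. */
static void *small_task(void *arg)
{
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/* Hypothetical helper: runs small_task on a thread with a reduced
   stack. Returns 0 on success, or a pthread error code. */
int run_with_small_stack(int *ran)
{
    pthread_attr_t attr;
    pthread_t thread;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setstacksize(&attr, SMALL_STACK_BYTES);
    if (err == 0)
        err = pthread_create(&thread, &attr, small_task, ran);
    pthread_attr_destroy(&attr);   /* safe once the thread has been created */
    if (err != 0)
        return err;

    return pthread_join(thread, NULL);
}
```

Be careful going small: a thread that overruns its stack will crash, so this is only worth doing when you spawn many threads and know their stack needs.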

Warning!

It's also important to note that although threads are a tool just like anything else in the Cocoa/Foundation frameworks, if you misuse them you can screw things up really badly. My advice is that you should never use threads simply because you can, but rather because you've gauged the performance of your application, looked at several methods, and decided that they could benefit from the concurrency threads offer. That means going into Shark and/or Instruments, developing some performance metric for your application, and watching how it does threaded and non-threaded. My last piece of advice is to pay attention to what is and isn't thread safe in Apple's documentation. In the Multithreaded Programming Guide Apple lists some of the classes that are and aren't thread safe. A general rule is that mutable objects are generally not thread safe while immutable objects generally are. If Apple's documentation on the class you are using doesn't explicitly say the class is thread safe, you should assume that it is not.

How Threads are Implemented on Mac OS X

[Figure: thread_imp.png]
Mach Threads
At the lowest level on Mac OS X, threads are implemented as Mach threads; however, you probably won't ever deal with them directly. According to Tech Note 2028, user space programs should not create Mach threads directly. If you want to know more about Mach threads, I would suggest buying Amit Singh's Mac OS X Internals book, where he describes them in great detail.
POSIX Threads
POSIX threads are the next type of threads, and they are layered on top of Mach threads. POSIX stands for Portable Operating System Interface and is a common set of APIs for performing many tasks, among them creating and managing threads. You can use these threads in your application simply by including the pthread.h header file in your source file, which gives you access to the POSIX APIs. This is the level you will probably go down to if you need finer-grained control over your threads than the higher level APIs offer.
High Level API's ( NSObject / NSThread / NSOperation )
Cocoa offers an array of threading APIs in Leopard that make it convenient and easy to thread.
NSObject
Starting with Mac OS X 10.5 Leopard, NSObject gained a new method: - (void)performSelectorInBackground:(SEL)aSelector withObject:(id)arg. This method makes it easy to dispatch a new thread with the selector and argument provided. It's interesting to note that this is essentially the same as NSThread's class method + (void)detachNewThreadSelector:(SEL)aSelector toTarget:(id)aTarget withObject:(id)anArgument, with the benefit that with the NSObject method you no longer have to specify a target; instead you call the method on the intended target. When you call performSelectorInBackground:withObject: you are in essence spawning a new thread that immediately starts executing the specified selector on the receiving object. So take this example:
[CWObject performSelectorInBackground:@selector(threadMethod:) withObject:nil];
This call is essentially the same as spawning a new thread that immediately starts executing the method threadMethod: on CWObject. It's a convenience method, and it puts your application into multithreaded mode.
NSThread
If you've ever dealt with threading pre-Leopard, chances are you came into contact with NSThread's class method detachNewThreadSelector:toTarget:withObject:. However, NSThread has gained a couple of new tricks. For starters, you can now instantiate NSThread objects and start them when you want to, and you can create your own NSThread subclasses and override the - (void)main method, which makes NSThread look similar to NSOperation, minus many of the benefits NSOperation brings to threading, including KVO compliance.
NSOperation / NSOperationQueue
NSOperation is the sexy new kid on the block and brings a lot to the table when it comes to threading. It's an abstract class: you subclass it and override its - (void)main method just like with NSThread, and that is all that is required before you can instantiate your subclass and put it into action. Additionally, NSOperation is fully KVO compliant. NSOperationQueue brings easy thread management to the table. It can poll your application's performance and the system's performance/load and automatically spawn as many threads as it thinks your system can handle, making it future proof with regard to new hardware coming down the line from Apple, though I should note that the documentation says it will most likely spawn as many threads as your system has cores.

Thread Locks

I mentioned thread safety earlier on, and an important part of that is making sure that at critical points in your code only 1 thread has access to a portion of code at a time, so that the value a thread retrieves is still correct and 2 threads aren't changing the same value at the same time. There are several methods you can use to put locks in your application.
@synchronized Objective-C Directive
Objective-C has a built-in mechanism for thread locking. The @synchronized directive will take any Objective-C object, including self. Whatever you pass to @synchronized acts as a unique identifier for the lock, and while one thread is inside the block of code between the braces that follow the directive, other threads are blocked from entering it. In addition to providing a locking mechanism, @synchronized also sets up Objective-C exception handling, which means exception handling has to be enabled in your application in order to use it. When an exception is thrown, the lock is released and the exception is passed on to the next exception handler. Some examples of how to use @synchronized are below:
- (void)veryCriticalThreadMethod
{
    @synchronized(self)
    {
        /* very critical code here */
    }
}
Apple's docs also mention a way to lock down just a single method by identifying it with its own selector:
- (void)veryCriticalThreadMethod
{
    @synchronized(NSStringFromSelector(_cmd))
    {
        /* very critical code here */
    }
}
In this case we are passing @synchronized a string created by the NSStringFromSelector function from _cmd, which is just a reference to the method's own selector. Additionally, if you are trying to ensure that only 1 thread can change the value of an object at a time, you should simply pass in that object.
NSLocks
@synchronized might be the most convenient way to implement thread locking, but it is not the only one by far, and in some situations it is not the right one to apply. Here are some of the other methods for thread locking.
NSLock
NSLock is the most basic form of locking. It uses POSIX thread locking to implement its locking.
NSLock *myLock = [[NSLock alloc] init];
[myLock lock];
/* critical code here */
[myLock unlock];
To use it in its basic form, all you need to do is create an instance of NSLock and call lock on it, put your critical code after the lock call, and call unlock when you are done. I should note that if you call lock on it again from the thread that already holds it, you will get a deadlock, because the lock is already locked and can't be locked again. This brings me to NSRecursiveLock.
NSRecursiveLock
NSRecursiveLock is just like NSLock in that it provides locking for your thread, but it allows the thread that holds the lock to call lock on it again, which is what you need when the lock is used in a recursive manner, as its name suggests.
/* myLock is an instance variable, created once (for example in -init) */
myLock = [[NSRecursiveLock alloc] init];

- (void)veryCriticalThreadMethod
{
    [myLock lock];
    [self incrementCounterBy:2]; /* takes myLock again on the same thread */
    [myLock unlock];
}

- (void)incrementCounterBy:(NSInteger)aNum
{
    [myLock lock];
    self.counter += aNum;
    [myLock unlock];
}
NSConditionLock
NSConditionLock does as its name suggests and gives you a set of APIs for developing your own system for when it should and shouldn't lock. Its conditions are all NSIntegers that you pass in, optionally along with a date that limits how long to wait for the lock (as in lockWhenCondition:beforeDate:).
POSIX Thread Locking
It is possible to use POSIX thread lock functions in your code, since your thread resides on top of a POSIX thread anyway. To use a POSIX lock in your code you need to include the pthread.h header and use it like so:
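#include <pthread.h>
#include <stdlib.h> /* for exit() */

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

/* superVitalFunction is just a placeholder name for your own function */
void superVitalFunction(void)
{
    if (pthread_mutex_lock(&mtx))
        exit(-1); /* your lock has failed */

    /* super vital secret code here */

    if (pthread_mutex_unlock(&mtx))
        exit(-1); /* your unlock has failed */
}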
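Going one step further at the POSIX level, a mutex paired with a condition variable gives you roughly the behavior described for NSConditionLock above: a thread waits until an integer condition reaches the value it cares about. This is only a sketch of that idea under my own assumptions, not how NSConditionLock is actually implemented, and condition_value, producer, and wait_for_data are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

enum { NO_DATA = 0, HAS_DATA = 1 };

static pthread_mutex_t cond_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_changed = PTHREAD_COND_INITIALIZER;
static int condition_value = NO_DATA;
static int shared_data = 0;

/* Producer thread: stores a value, then flips the condition. */
static void *producer(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&cond_mtx);
    shared_data = 42;
    condition_value = HAS_DATA;
    pthread_cond_signal(&cond_changed);
    pthread_mutex_unlock(&cond_mtx);
    return NULL;
}

/* Starts the producer, blocks until the condition is HAS_DATA, then
   consumes the value. Returns the value, or -1 if the producer
   thread could not be started. */
int wait_for_data(void)
{
    pthread_t thread;
    int value;

    if (pthread_create(&thread, NULL, producer, NULL) != 0)
        return -1;

    pthread_mutex_lock(&cond_mtx);
    while (condition_value != HAS_DATA)     /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond_changed, &cond_mtx);
    value = shared_data;
    condition_value = NO_DATA;              /* reset for the next round */
    pthread_mutex_unlock(&cond_mtx);

    pthread_join(thread, NULL);
    return value;
}
```

The while loop matters: pthread_cond_wait can return without the condition being true, so the waiter always rechecks the integer before going on.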
It should be noted that the Google Mac team has blogged about the performance of @synchronized. They found the @synchronized directive's setup to be bloated for what they needed: it not only locked the thread, but the exception handling setup kept their app from performing as well as they wanted. Their solution in the end was to use the pthread lock directly.
Proper Locking with KVO Notifications
As noted by Chris Kane on the Cocoa-Dev mailing list, KVO is thread safe, but the way you use willChangeValueForKey: and didChangeValueForKey: can screw things up if you get it wrong. Thus if you send KVO notifications yourself, you should acquire your lock, send the willChangeValueForKey: notification, modify your object, send the didChangeValueForKey: notification, and only then release the lock, like this:
NSLock *myLock; /* assumed to be created elsewhere with [[NSLock alloc] init] */

[myLock lock];

[self willChangeValueForKey:@"someKey"];

someKey = somethingElse;

[self didChangeValueForKey:@"someKey"];

[myLock unlock];
He also discusses a receptionist pattern that can be used to perform KVO updates on the main thread from a 2nd+ thread.
Locking with Threaded Drawing
Another topic of interest for people is how to get threads to draw to an NSView. Although I won't dive too deep into it here, the basic steps for threaded drawing are locking the NSView, performing your drawing, and then unlocking the view, like so:
/* important that we acquire a lock for the view first */
if([ourView lockFocusIfCanDraw]) {
    
    /* do drawing code here */
        
    /* flushes the contents of the offscreen buffer to the screen if flushing & buffered is enabled */
    [[ourView window] flushWindow];
    /* relinquish our lock on the view */
    [ourView unlockFocus];
}
NSObject
In Leopard NSObject gained a method called performSelectorInBackground:withObject:, which as I mentioned earlier is essentially the same as NSThread's class method for dispatching new threads, minus the argument for which target object should perform the operation.
/* in AppController.m */

[self performSelectorInBackground:@selector(doThreadingMethod:)
                       withObject:nil];
                    
/* is roughly equivalent to */

[NSThread detachNewThreadSelector:@selector(doThreadingMethod:)
                         toTarget:self
                        withObject:nil];
NSThread
If you've ever done any threading, or looked at any Cocoa project that did threading prior to Leopard, you probably ran into NSThread and its class method + (void)detachNewThreadSelector:(SEL)aSelector toTarget:(id)aTarget withObject:(id)anArgument, which is shown in the source code example just before this. NSThread is an easy way to get a thread going. In Leopard it has gained the ability to create NSThread instances up front; the thread itself is only created later, when you call start on the instance.
NSThread *myThread;

myThread = [[NSThread alloc] initWithTarget:self
                 selector:@selector(doSomething:)
                   object:nil];

/* some code */

[myThread start]; //start thread
All of this can be more convenient depending on how you like to use NSThread. However, an even better new ability, in my opinion, is that you can subclass NSThread and override its - (void)main method like so:
@interface MyThread : NSThread {
    NSString *aString;
}
@property(copy) NSString *aString;
@end

@implementation MyThread

@synthesize aString;

- (void)main
{
    self.aString = @"Super Big String that needed a thread";
}

@end
Just as with a generic NSThread object, you simply create an instance of your NSThread subclass and call start on it when you need it to dispatch a new thread. It's also very important to note that in non-garbage-collected applications, whatever method you run on a new thread must set up an NSAutoreleasePool at the start and release it at the end of the method. If your application is garbage collected you don't have to worry about this, as it is done for you automatically.
NSOperation
I've made it no secret that I have a particular fondness for NSOperation and have been evangelizing it for a while now, and for good reason. NSOperation brings the sexy to threading and makes it easy not only to create operations, but, with KVO and NSOperationQueue, to take a great deal of control over them. In particular I like the trend Apple is evangelizing of breaking your code up into specialized chunks that, in combination with your custom initializer methods, can be an effective force inside your app. In my app RedFlag (a Del.icio.us client in development) I use a subclass called RFNetworkThread. All I have to assume is that a credential has been stored in the user's keychain beforehand; I dispatch these objects with different URLs that call out to del.icio.us, and each one can easily store and retrieve the data for a different query. NSOperation is implemented no differently from a subclassed NSThread object, except that you specify that you are subclassing NSOperation instead:
@interface MyThread : NSOperation {
    NSString *aString;
}
@property(copy) NSString *aString;
@end

@implementation MyThread

@synthesize aString;

- (void)main
{
    self.aString = @"Super Big String that needed a thread";
}

@end
However, with NSOperation you gain some big benefits. Most notable among them is that you can use NSOperation with NSOperationQueue to have some sort of flow of control. NSOperation is also KVO compliant, so you can receive notifications about when it is executing and when it is done executing. Its bindable properties are:
  • isCancelled
  • isConcurrent
  • isExecuting
  • isFinished
  • isReady
  • dependencies (the operation's dependencies array)
  • queuePriority (only writable property)
Apple recommends that if you override any of these properties you maintain KVO compliance in your operation objects. Additionally, if you add any further properties to your operation objects, you should make them KVO compliant as well. If you want to observe when your operation objects have completed execution, you could do it in a manner like this:
    1 NSOperationQueue *que = [[NSOperationQueue alloc] init];
    2     
    3 loginThread = [[RFNetworkThread alloc] init];
    4     
    5 [loginThread addObserver:self
    6               forKeyPath:@"isFinished" 
    7                options:0
    8                   context:nil];
    9     
   10 [que addOperation:loginThread];
Here's what we did: (1: Line 1) created an NSOperationQueue to run our network operation on; (2: Line 3) created our NSOperation subclass object; (3: Line 5) added the controller class as an observer of the operation object's key path "isFinished" (because isFinished will always be NO until the operation actually completes and changes it to YES, we only need to be notified when that key path changes); and (4: Line 10) added the operation object to the queue. It's important to note that if you want to modify anything about a particular NSOperation object you need to do it before placing it onto the queue; otherwise it's just like playing Russian roulette... you just don't know what will happen. Now, to observe the change in the operation object you only need to implement the standard KVO notification method in whatever class added itself as an observer, like so:
    1 - (void)observeValueForKeyPath:(NSString *)keyPath 
    2                            ofObject:(id)object 
    3                               change:(NSDictionary *)change 
    4                             context:(void *)context
    5 {
    6 if([keyPath isEqual:@"isFinished"] && loginThread == object){
    7         NSLog(@"Our Thread Finished!");
    8         
    9         [loginThread removeObserver:self
   10                          forKeyPath:@"isFinished"];
   11                         
   12         [self processDeliciousData:loginThread.returnData];
   13     } else {
   14         [super observeValueForKeyPath:keyPath
   15                              ofObject:object 
   16                                change:change
   17                               context:context];
   18     }
   19 }
The method observeValueForKeyPath:ofObject:change:context: allows us to be notified through KVO when our object has completed. In this instance I am only concerned with making sure that the key path is "isFinished" and that the loginThread I created earlier is the object in question (line 6). Once I've gotten this notification, I only need to remove myself as an observer and do what I intended to do with the data I got back from the operation. Beyond that, on line 14 I only need to make sure that if the notification is for a different key path or isn't about our object, I pass it on up the chain of interested objects. The other alternative is to do as Marcus Zarra has demoed on Cocoa Is My Girlfriend and create a shared instance, so as to provide a reference that you can call performSelectorOnMainThread on and return control back to the main thread. Neither option is the wrong one; it just depends on which method you like better and how many operation objects you are tracking. What's better, in my opinion, is when you create your own dependency tree and use KVO notifications to be notified when the root dependency has completed.
This brings me to my next point: NSOperation dependencies. NSThread has no built-in mechanism for adding dependencies as of right now. However, NSOperation has the - (void)addDependency:(NSOperation *)operation method, which provides an easy mechanism (when used with NSOperationQueue) for dependency management. So with that, let's get into NSOperationQueue...
NSOperationQueue
Unless you intend to manage NSOperation objects yourself, NSOperationQueue will be your path to managing threads. It has several APIs for controlling how much concurrency you can tolerate in your app, blocking a thread until all operation objects have finished, etc. When you place NSOperation objects onto the queue, it goes through them and looks for ones that have no dependencies or whose dependencies have already completed. In this way you can build your own custom dependency tree, toss all the objects onto the queue, and NSOperationQueue will follow it while respecting how much concurrency you can handle. As noted earlier, you only need to use - (void)addDependency:(NSOperation *)operation to indicate this to the operation queue.
By default NSOperationQueue will spawn as many threads as your system has cores. However, if you need more or fewer, or would just like to specify the number programmatically, you can use - (void)setMaxConcurrentOperationCount:(NSInteger)count to set the concurrency count yourself, although Apple recommends that you use the value NSOperationQueueDefaultMaxConcurrentOperationCount, which dynamically adjusts the number of operation objects that execute as your system load goes up and down.
Another useful API lets you put a bunch of operation objects onto the queue and halt the method you are in until they have all finished. The - (void)waitUntilAllOperationsAreFinished method was designed for this. I would recommend that you call it on a 2nd+ thread so as not to block the main thread while you wait for the operation objects to complete.
Since NSOperation is KVO compliant, it only makes sense that NSOperationQueue is KVO compliant as well. The most useful binding NSOperationQueue has is its bindable operations property, which gives you all the NSOperation objects on the queue. You could use this to build something analogous to Mail's Activity viewer (Command + 0). It enables you to show your users what is going on in your application. Personally I would recommend that you do provide such an interface, but expose it as an optional one, in the same way that Mail doesn't show the Activity viewer by default but lets you hit a shortcut to bring it up.
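To give a feel for the bookkeeping NSOperationQueue saves you, here is a rough POSIX-level analogy of "one worker per core, then wait until everything has finished". It is not how NSOperationQueue is implemented. The names worker_count, square_job, and run_all_and_wait are hypothetical, and it assumes sysconf(_SC_NPROCESSORS_ONLN) is available, which is a widely supported extension rather than strict POSIX.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* Each worker squares the number in its own slot. */
static void *square_job(void *arg)
{
    long *slot = arg;
    *slot = (*slot) * (*slot);    /* no sharing, so no lock is needed */
    return NULL;
}

/* Returns how many workers to use: one per online core, capped. */
int worker_count(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1)
        cores = 1;
    if (cores > MAX_WORKERS)
        cores = MAX_WORKERS;
    return (int)cores;
}

/* Starts one worker per core, waits for all of them, and returns
   the sum of their results, or -1 if a thread failed to start. */
long run_all_and_wait(void)
{
    pthread_t threads[MAX_WORKERS];
    long slots[MAX_WORKERS];
    int n = worker_count();
    int started = 0, i;
    long sum = 0;

    for (i = 0; i < n; i++) {
        slots[i] = i + 1;
        if (pthread_create(&threads[i], NULL, square_job, &slots[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)     /* the "wait until all finished" step */
        pthread_join(threads[i], NULL);

    if (started != n)
        return -1;
    for (i = 0; i < n; i++)
        sum += slots[i];
    return sum;
}
```

Everything in this sketch (counting cores, tracking thread handles, joining them) is what the queue does for you, and the queue adds dependencies and priorities on top.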

Conclusion & References

I've touched on a lot of things in this article, but even so I am barely scratching the surface of multithreading. It's a very expansive topic indeed. But I hope I gave you a starting point here. In this article I've covered:
  • What threads are
  • When you should use threading
  • When you shouldn't use threading
  • What the Costs of Creating Threads are
  • A warning about Threading and its implications
  • How Threads are implemented on Mac OS X
  • Mach Threads
  • POSIX Threads
  • Thread Locks
  • NSThread
  • NSOperation
  • NSOperationQueue
And yet there is so much still left to talk about. Below you'll find a simple, concise list of links to the APIs and classes mentioned throughout the article.
  • NSRunLoop
  • Technical Note TN2028: Threading Architectures (About Mach Threads)
  • Thread Costs
  • Thread Safety
  • Thread Safe and Unsafe Classes
  • Apple's Threading Programming Guide
  • Mac OS X Internals (Book)
  • POSIX Threads (Wikipedia)
  • POSIX Thread Header File ( pthread.h )
  • NSObject Threading Method
  • NSThread Class Reference
  • NSStringFromSelector Method
  • NSLock
  • NSRecursiveLock
  • NSConditionLock
  • POSIX Thread Lock
  • Google Mac Blog on @synchronized performance slowdown
  • Chris Kane @ Apple on KVO Thread Safety and KVO updates
  • NSOperation
  • NSOperationQueue
  • Marcus Zarra Tutorial on NSOperation with shared instance reference

Credits

Thread memory layout graph based on a graph from CocoaDevCentral by H. Lally Singh, licensed under a Creative Commons License @ http://cocoadevcentral.com/articles/000061.php

6 comments:

Anonymous said...

Is 90ms a misprint? Does it really take that long, or should it be microseconds?

Colin Wheeler said...

no that is a misprint, sorry. It is 90 microseconds to create a thread. It's been corrected.

tarasis said...

At the beginning campanion should be companion.

Otherwise thanks for a great and helpful article.

Anonymous said...

Worth mentioning as well is the OSSpinLock{Lock, Unlock} combination; these busy-wait, but are by far the cheapest to acquire/release. In the case of many resources and lightweight access (very few threads, and/or little work being done per access), they may pay off significantly.

Consider the set/get methods for 10000 objects. Creating an OSSpinLock for each object is cheap (it's just an int), and if the likelihood of access to an given object is small (ie, there's not 1 object constantly being accessed and 9999 very rarely), the OSSpinLock approach (implemented by memory barriers, IIRC) can really pay off, because the cross-section of thread conflict is minute.

Simon

Anonymous said...

Thanks Colin, that's a great introduction to the new threading features that are available in Leopard.

What might come in handy if you're to mix Cocoa threads with pthreads is this little section in Apple's documentation.

Happy threading.

Anonymous said...

Thanks for the article. It was very helpful. Can I ask two simple questions about implementing a class that inherits NSThread?
(1) Is it ok (or recommended) to create the NSAutoreleasePool at the start of the over-ridden main method, and release it at the end of that method?
(2) If you release an instance of a class that inherits NSThread after the thread has been started, it seems (from a few experiments) to wait until the thread has finished to actually release the object, is this really the case?
Thanks,
Charlie.

 
...