Added thread library documentation to thrift whitepaper

Reviewed By: To be reviewed by slee and aditya

Test Plan: N.A.


git-svn-id: https://svn.apache.org/repos/asf/incubator/thrift/trunk@665075 13f79535-47bb-0310-9956-ffa450edef68
Marc Slemko 2007-04-01 09:14:05 +00:00
parent 3f234dad0e
commit 10b3bdbb85


@ -16,6 +16,7 @@
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{url}
\begin{document} \begin{document}
@ -769,9 +770,159 @@ reap the benefit of being able to easily debug corrupt or misunderstood data by
looking for string contents.
\subsection{Servers and Multithreading}
MARC TO WRITE THIS SECTION ON THE C++ concurrency PACKAGE AND
BASIC TThreadPoolServer PERFORMANCE ETC. (ie. 140K req/second, that kind of
thing)
Thrift services require basic multithreading support to handle simultaneous
requests from multiple clients. For the Python and Java implementations of
Thrift server logic, the multithreading support provided by those runtimes was
more than adequate. For the C++ implementation, no standard multithreading
runtime library support exists. Specifically, no robust, lightweight, and
portable thread manager or timer class implementation exists. We investigated
existing implementations, namely {\tt boost::thread},
{\tt boost::threadpool}, {\tt ACE\_Thread\_Manager} and {\tt ACE\_Timer}.
While {\tt boost::thread}~\cite{boost.threads} provides clean, lightweight, and
robust implementations of multithreading primitives (mutexes, conditions,
threads), it does not provide a thread manager or timer implementation.
{\tt boost::threadpool}~\cite{boost.threadpool} also looked promising but was not
far enough along for our purposes. We wanted to limit the dependency on
third-party libraries as much as possible. Because {\tt boost::threadpool} is not
a pure template library and requires runtime libraries, and because it is not yet
part of the official Boost distribution, we felt it was not ready for use in Thrift.
As {\tt boost::threadpool} evolves, and especially if it is added to the Boost
distribution, we may reconsider our decision not to use it.
ACE has both a thread manager and timer class in addition to multithreading
primitives. The biggest problem with ACE is that it is ACE. Unlike Boost, the
quality of the ACE API is poor. Everything in ACE has large numbers of
dependencies on everything else in ACE, forcing developers to throw out standard
classes, such as the STL collections, in favor of ACE's homebrewed
implementations. In addition, unlike Boost, ACE implementations demonstrate
little understanding of the power and pitfalls of C++ programming and take no
advantage of modern templating techniques to ensure compile-time safety and
reasonable compiler error messages. For all these reasons, ACE was rejected.
\subsection{Thread Primitives}
The Thrift thread libraries have three components:
\begin{itemize}
\item \texttt{primitives}
\item \texttt{thread pool manager}
\item \texttt{timer manager}
\end{itemize}
As mentioned above, we were hesitant to introduce any additional dependencies
in Thrift. We decided to use {\tt boost::shared\_ptr} because it is so useful for
multithreaded applications, because it requires no link-time or runtime libraries
(i.e., it is a pure template library), and because it is due to become part of
the C++0x standard.
We implement standard {\tt Mutex} and {\tt Condition} classes, and a
{\tt Monitor} class. The latter is simply a combination of a mutex and
condition variable and is analogous to the monitor implementation provided for
all objects in Java. This is also sometimes referred to as a barrier. We
provide a {\tt Synchronized} guard class to allow Java-like synchronized blocks.
This is just a bit of syntactic sugar, but, like its Java counterpart, it clearly
delimits critical sections of code. Unlike its Java counterpart, we still have
the ability to programmatically lock, unlock, block, and signal monitors.
\begin{verbatim}
void run() {
  {Synchronized s(manager->monitor);
    if (manager->state == TimerManager::STARTING) {
      manager->state = TimerManager::STARTED;
      manager->monitor.notifyAll();
    }
  }
}
\end{verbatim}
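The relationship between {\tt Monitor} and {\tt Synchronized} in the fragment
above can be sketched roughly as follows. This is an illustration only, not the
Thrift source: it assumes C++11 {\tt std::mutex} and
{\tt std::condition\_variable} in place of the underlying platform primitives,
and the {\tt State} enum and {\tt markStarted} helper are hypothetical names
introduced for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a Monitor: one mutex plus one condition
// variable, with explicit lock/unlock/wait/notify operations.
class Monitor {
 public:
  void lock() { mutex_.lock(); }
  void unlock() { mutex_.unlock(); }

  // The caller must already hold the lock. The lock is released while
  // waiting and re-acquired before wait() returns. Wakeups can be
  // spurious, so callers should re-check their condition in a loop.
  void wait() {
    std::unique_lock<std::mutex> held(mutex_, std::adopt_lock);
    cond_.wait(held);
    held.release();  // leave the mutex locked for the caller
  }

  void notifyAll() { cond_.notify_all(); }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
};

// Scope guard analogous to a Java synchronized block: locks the monitor
// on construction and unlocks it when the enclosing scope exits.
class Synchronized {
 public:
  explicit Synchronized(Monitor& m) : monitor_(m) { monitor_.lock(); }
  ~Synchronized() { monitor_.unlock(); }
  Synchronized(const Synchronized&) = delete;
  Synchronized& operator=(const Synchronized&) = delete;

 private:
  Monitor& monitor_;
};

enum class State { STARTING, STARTED };

// Mirrors the run() fragment: flip the state under the monitor and wake
// any threads waiting for the transition.
inline State markStarted(Monitor& monitor, State& state) {
  Synchronized s(monitor);
  if (state == State::STARTING) {
    state = State::STARTED;
    monitor.notifyAll();
  }
  return state;
}
```

Because {\tt Monitor} still exposes {\tt lock}, {\tt unlock}, {\tt wait}, and
{\tt notifyAll} directly, code that does not fit a single lexical scope can
bypass the guard and drive the monitor by hand.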
We again borrowed from Java the distinction between a thread and a runnable
class. A {\tt facebook::thread::Thread} is the actual schedulable object. The
{\tt facebook::thread::Runnable} is the logic to execute within the thread.
The {\tt Thread} implementation deals with all the platform-specific thread
creation and destruction issues, while the {\tt Runnable} implementation deals
with the application-specific per-thread logic. The benefit of this approach
is that developers can easily subclass the {\tt Runnable} class without pulling
in platform-specific superclasses.
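A minimal sketch of this split is shown below. It is not the Thrift
implementation: {\tt std::thread} and {\tt std::shared\_ptr} stand in for the
POSIX calls and {\tt boost::shared\_ptr}, and {\tt CountingTask} is a
hypothetical application class used only to exercise the interface.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Application-side logic: no platform-specific base classes involved.
class Runnable {
 public:
  virtual ~Runnable() = default;
  virtual void run() = 0;
};

// Platform-side object: owns the OS thread and hosts one Runnable.
// std::thread stands in for the platform calls a real port would make.
class Thread {
 public:
  explicit Thread(std::shared_ptr<Runnable> runnable)
      : runnable_(std::move(runnable)) {}
  ~Thread() { join(); }

  // Starts the hosted Runnable on a new thread; a second call is ignored
  // while the first thread has not been joined.
  void start() {
    if (worker_.joinable()) return;
    std::shared_ptr<Runnable> r = runnable_;  // keep the task alive
    worker_ = std::thread([r] { r->run(); });
  }

  void join() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::shared_ptr<Runnable> runnable_;
  std::thread worker_;
};

// Hypothetical application task: counts how many times it has run.
class CountingTask : public Runnable {
 public:
  void run() override { ++runs; }
  std::atomic<int> runs{0};
};
```

Only {\tt Thread} would need to change when porting to another threading API;
{\tt CountingTask} depends on nothing but the {\tt Runnable} interface.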
\subsection{Thread, Runnable, and shared\_ptr}
We use {\tt boost::shared\_ptr} throughout the {\tt ThreadManager} and
{\tt TimerManager} implementations to guarantee cleanup of dead objects that can
be accessed by multiple threads. For {\tt Thread} class implementations,
{\tt boost::shared\_ptr} usage requires particular attention to make sure
{\tt Thread} objects are neither leaked nor dereferenced prematurely while
creating and shutting down threads.
Thread creation requires calling into a C library (in our case the POSIX
thread library, {\tt libpthread}, but the same would be true for WIN32 threads).
Typically, the OS makes few, if any, guarantees about when a C thread's
entry-point function, {\tt ThreadMain}, will be called. Therefore, it is
possible that our thread create call,
{\tt facebook::thread::ThreadFactory::newThread()}, could return to the caller
well before that time. To guard against the returned {\tt Thread} object being
cleaned up before {\tt ThreadMain} runs, should the caller give up its reference
in the meantime, the {\tt Thread} object makes a weak reference to
itself in its {\tt start} method.
With the weak reference in hand, the {\tt ThreadMain} function can attempt to get
a strong reference before entering the {\tt Runnable::run} method of the
{\tt Runnable} object bound to the {\tt Thread}. If no strong references to the
thread are held between exiting {\tt Thread::start} and entering the C helper
function, {\tt ThreadMain}, the weak reference returns null and the function
exits immediately.
The need for the {\tt Thread} to make a weak reference to itself has a
significant impact on the API. Since references are managed through the
{\tt boost::shared\_ptr} templates, the {\tt Thread} object must have a reference
to itself wrapped by the same {\tt boost::shared\_ptr} envelope that is returned
to the caller. This necessitated use of the factory pattern.
{\tt ThreadFactory} creates the raw {\tt Thread} object and
{\tt boost::shared\_ptr} wrapper, and calls a private helper method of the class
implementing the {\tt Thread} interface (in this case, {\tt PosixThread::weakRef})
to allow it to make a weak reference to itself through the
{\tt boost::shared\_ptr} envelope.
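The weak self-reference and factory arrangement described above can be
sketched as follows. Several things here are assumptions made for
illustration: {\tt std::shared\_ptr} and {\tt std::weak\_ptr} replace their
Boost counterparts, {\tt threadMain} is an ordinary static function called
directly rather than an entry point invoked by the OS, and the {\tt self}
accessor and {\tt Flag} class are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class Runnable {
 public:
  virtual ~Runnable() = default;
  virtual void run() = 0;
};

class Thread {
 public:
  explicit Thread(std::shared_ptr<Runnable> r) : runnable_(std::move(r)) {}

  // Stand-in for the C entry point that the OS calls at some later time.
  // Returns true if the Runnable was executed.
  static bool threadMain(const std::weak_ptr<Thread>& weak) {
    std::shared_ptr<Thread> self = weak.lock();  // try to pin the Thread
    if (!self) return false;  // every strong reference is gone: exit
    self->runnable_->run();
    return true;
  }

  // Hypothetical accessor so the example can hand the weak reference on.
  std::weak_ptr<Thread> self() const { return self_; }

 private:
  friend class ThreadFactory;

  // Private helper called by the factory once the shared_ptr exists.
  void weakRef(const std::shared_ptr<Thread>& me) { self_ = me; }

  std::shared_ptr<Runnable> runnable_;
  std::weak_ptr<Thread> self_;
};

class ThreadFactory {
 public:
  // Creates the raw object and its shared_ptr envelope, then lets the
  // object record a weak reference through that same envelope.
  std::shared_ptr<Thread> newThread(std::shared_ptr<Runnable> r) {
    std::shared_ptr<Thread> t(new Thread(std::move(r)));
    t->weakRef(t);
    return t;
  }
};

// Hypothetical Runnable that records whether it ran.
struct Flag : Runnable {
  bool ran = false;
  void run() override { ran = true; }
};
```

In current C++, {\tt std::enable\_shared\_from\_this} offers a comparable way
for an object to obtain a reference to itself that shares ownership with the
caller's pointer; the explicit factory above is kept because it follows the
description in the text.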
{\tt Thread} and {\tt Runnable} objects reference each other. A {\tt Runnable}
object may need to know which thread it is executing in, and a {\tt Thread},
obviously, needs to know what {\tt Runnable} object it is hosting. This
interdependency is further complicated because the lifecycle of each object is
independent of the other. An application may create a set of {\tt Runnable}
objects to be reused over and over in different threads, or it may create and
forget a {\tt Runnable} object once a thread has been created and started for it.
The {\tt Thread} class takes a {\tt boost::shared\_ptr} reference to the hosted
{\tt Runnable} object in its constructor, while the {\tt Runnable} class has an
explicit {\tt thread} method to allow explicit binding of the hosted thread.
{\tt ThreadFactory::newThread} binds the two objects to each other.
\subsection{ThreadManager}
{\tt facebook::thread::ThreadManager} creates a pool of worker threads and
allows applications to schedule tasks for execution as free worker threads
become available. The {\tt ThreadManager} does not implement dynamic
thread pool resizing, but provides primitives so that applications can add
and remove threads based on load. This approach was chosen because
implementing load metrics and choosing a thread pool size is very
application-specific. For example, some applications may want to adjust pool
size based on a running average of work arrival rates that are measured via
polled samples. Others may simply wish to react immediately to work-queue
depth high- and low-water marks. Rather than trying to create a complex
API that is abstract enough to capture these different approaches, we
simply leave it up to the particular application and provide the
primitives to enact the desired policy and sample current status.
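The division of labor described above (the pool supplies primitives, the
application supplies policy) might look like the following sketch. It is not
the {\tt ThreadManager} API: the class name {\tt SimpleThreadManager}, its
method names, and the {\tt applyWatermarks} policy function are all
hypothetical, and {\tt std::thread} is assumed in place of the Thrift thread
classes.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class SimpleThreadManager {
 public:
  ~SimpleThreadManager() { stop(); }

  // Primitive: grow the pool by n workers.
  void addWorker(std::size_t n) {
    std::lock_guard<std::mutex> guard(mutex_);
    target_ += n;
    for (std::size_t i = 0; i < n; ++i)
      workers_.emplace_back([this] { workerLoop(); });
  }

  // Primitive: ask up to n workers to retire once they are idle.
  void removeWorker(std::size_t n) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      n = std::min(n, target_);
      target_ -= n;
      retire_ += n;
    }
    cond_.notify_all();
  }

  void add(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      tasks_.push_back(std::move(task));
    }
    cond_.notify_one();
  }

  // Status samples an application policy can read.
  std::size_t pendingTaskCount() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return tasks_.size();
  }
  std::size_t workerCount() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return target_;
  }

  // Lets the remaining workers drain the queue, then joins every thread.
  void stop() {
    std::vector<std::thread> joining;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stopping_ = true;
      joining.swap(workers_);
    }
    cond_.notify_all();
    for (std::thread& t : joining) t.join();
  }

 private:
  void workerLoop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cond_.wait(lock, [this] {
        return retire_ > 0 || stopping_ || !tasks_.empty();
      });
      if (retire_ > 0) { --retire_; return; }  // honor a removal request
      if (tasks_.empty()) return;              // stopping and drained
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      lock.unlock();
      task();  // run outside the lock so other workers can proceed
      lock.lock();
    }
  }

  mutable std::mutex mutex_;
  std::condition_variable cond_;
  std::deque<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  std::size_t target_ = 0;  // workers the application currently wants
  std::size_t retire_ = 0;  // removal requests not yet honored
  bool stopping_ = false;
};

// Hypothetical application policy: react to queue-depth water marks.
inline void applyWatermarks(SimpleThreadManager& pool, std::size_t low,
                            std::size_t high) {
  std::size_t depth = pool.pendingTaskCount();
  if (depth > high) pool.addWorker(1);
  else if (depth < low && pool.workerCount() > 1) pool.removeWorker(1);
}
```

A running-average policy would replace {\tt applyWatermarks} without touching
the pool itself, which is the point of keeping the policy outside the class.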
\subsection{TimerManager}
{\tt facebook::thread::TimerManager} allows applications to schedule
{\tt Runnable} object execution at some point in the future. Its specific task
is to allow applications to sample {\tt ThreadManager} load at regular
intervals and make changes to the thread pool size based on application policy.
Of course, it can be used to generate any number of timer or alarm events.
The default implementation of {\tt TimerManager} uses a single thread to
execute expired {\tt Runnable} objects. Thus, if a timer operation needs to
do a large amount of work, and especially if it needs to do blocking I/O,
that work should be done in a separate thread.
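The single-dispatcher design can be illustrated with the sketch below. It is
an assumption-laden stand-in rather than the {\tt TimerManager} source: the
class name {\tt SimpleTimerManager} and its methods are hypothetical, tasks are
{\tt std::function} objects instead of {\tt Runnable} instances, and
{\tt std::thread} with {\tt std::chrono::steady\_clock} is assumed for
threading and timekeeping.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

class SimpleTimerManager {
 public:
  using Clock = std::chrono::steady_clock;

  SimpleTimerManager() : dispatcher_([this] { dispatchLoop(); }) {}
  ~SimpleTimerManager() { stop(); }

  // Schedule task to run once the given deadline has passed.
  void addAt(Clock::time_point when, std::function<void()> task) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      timers_.emplace(when, std::move(task));
    }
    cond_.notify_one();  // the new deadline may be the earliest one
  }

  void add(std::chrono::milliseconds delay, std::function<void()> task) {
    addAt(Clock::now() + delay, std::move(task));
  }

  // Number of tasks that have not yet been handed to the dispatcher.
  std::size_t pendingCount() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return timers_.size();
  }

  // Stops the dispatcher; tasks that have not expired are not run.
  void stop() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stopping_ = true;
    }
    cond_.notify_one();
    if (dispatcher_.joinable()) dispatcher_.join();
  }

 private:
  void dispatchLoop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stopping_) {
      if (timers_.empty()) {
        cond_.wait(lock);
        continue;
      }
      auto first = timers_.begin();
      if (Clock::now() < first->first) {
        Clock::time_point deadline = first->first;
        cond_.wait_until(lock, deadline);  // or until a new timer arrives
        continue;
      }
      std::function<void()> task = std::move(first->second);
      timers_.erase(first);
      lock.unlock();
      task();  // expired tasks run one at a time on this single thread
      lock.lock();
    }
  }

  mutable std::mutex mutex_;
  std::condition_variable cond_;
  std::multimap<Clock::time_point, std::function<void()>> timers_;
  bool stopping_ = false;
  std::thread dispatcher_;  // declared last so it starts after the rest
};
```

Because every expired task runs on the one dispatcher thread, a slow task
delays all later timers, which is why long or blocking work should be handed
off to another thread. A periodic load sampler could be built on top of this
by having the task reschedule itself each time it runs.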
\subsection{Nonblocking Operation}
Though the Thrift transport interfaces map more directly to a blocking I/O
@ -879,11 +1030,18 @@ Thrift is a successor to Pillar, a similar system developed
by Adam D'Angelo, first while at Caltech and continued later at Facebook.
Thrift simply would not have happened without Adam's insights.
%\begin{thebibliography}{}
%\bibitem{smith02}
%Smith, P. Q. reference text
%\end{thebibliography}
\begin{thebibliography}{}
\bibitem{boost.threads}
Kempf, William,
``Boost.Threads'',
\url{http://www.boost.org/doc/html/threads.html}
\bibitem{boost.threadpool}
Henkel, Philipp,
``threadpool'',
\url{http://threadpool.sourceforge.net}
\end{thebibliography}
\end{document} \end{document}