| Class | Summary |
|---|---|
| Main | The main class of the RADTools project. |
The main package of the RADTools project; it contains all the primary documentation.
Editor's Note: We apologize for the state of the diagrams in non-IE browsers. We are still working through debugging SVG files in many of these browsers, and would appreciate any help our readers can offer.
However, for a research project this situation is quite untenable. First, managing even a relatively small web-service to generate useful research data can consume an unreasonable amount of expensive graduate student, professor and staff time. Second, and far worse, modern projects involving power consumption, statistical machine learning and other ideas founded on automated configuration changes become incredibly difficult, if not impossible.
The key problem is that even those few applications which provide a good management interface rarely support automation or dynamic configuration changes, and even fewer provide any kind of uniform configuration and control interface.
This problem was incredibly evident during the labs for CS294-1 RADS, Fall 2006, wherein we, as graduate students with a simple assignment and detailed instructions, still had a fair amount of difficulty sorting out basic management tasks for the first time.
As an effort to make management of these systems slightly more tenable, and more importantly to make research using them far easier, we have developed this project, dubbed RADTools.
To this end, every service which can be managed by RADTools must be represented by an object which implements at minimum the RADService interface. This interface includes state management, structure and a uniform abstraction for expanding this interface, both statically and dynamically. In the sections below we describe the state and structure abstractions; however, the javadocs at the above links are far better references for any code based on this project.
State Management
Every RADService provides the property "RADService.State". This enables some of the most important benefits of RADTools: namely failure management, including both causality and failures of the ability to manage services.
In the current implementation state management has been restricted to positive feedback only. That is to say, we never make assumptions about the state of a service, but instead rely on positive feedback to determine whether a service is running, stopped, failed, etc. This is vital, as erroneous assumptions about service state which trigger corrective actions may in fact exacerbate the situation. In the future, when adding assumptions about service state (e.g. timeouts and other such tricks), the programmer must ensure either that the assumptions are acted on in such a way that the situation cannot be exacerbated, or that the assumption is enforced first.
For example, if the liveness check for a service times out, the current implementation will mark the service state as unknown. If instead the service is to be reported as failed, the check should, upon timeout, first crash or stop the service, thereby making the assumption of failure true. This ensures that the preconditions for any corrective actions are properly met.
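The positive-feedback rule and the timeout example above can be sketched in Java as follows. Everything here is hypothetical: the LivenessCheck class, its State enum and the Service interface are illustrative stand-ins, not the RADTools API. The point is only the ordering: on timeout the check either reports an unknown state, or stops the service before reporting it as failed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch of a positive-feedback liveness check; all names are hypothetical. */
class LivenessCheck {
    // Hypothetical states; not the actual values of "RADService.State".
    enum State { RUNNING, STOPPED, FAILED, UNKNOWN }

    // Hypothetical minimal view of a managed service.
    interface Service {
        State report() throws Exception; // state as positively reported by the service
        void stop();                     // forcibly stops the service
    }

    /**
     * Asks the service for its state, waiting at most timeoutMs.
     * On timeout: report UNKNOWN, or, if enforceFailure is set,
     * stop the service first so that reporting FAILED is actually true.
     */
    static State check(Service s, long timeoutMs, boolean enforceFailure) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        Callable<State> probe = s::report;
        Future<State> f = ex.submit(probe);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);
            if (enforceFailure) {
                s.stop();             // enforce the assumption before acting on it
                return State.FAILED;
            }
            return State.UNKNOWN;     // no positive feedback, so assume nothing
        } catch (ExecutionException e) {
            return State.UNKNOWN;     // the probe itself failed: still no positive feedback
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return State.UNKNOWN;
        } finally {
            ex.shutdownNow();
        }
    }
}
```

With enforceFailure left false, the sketch mirrors the current behaviour of marking the state unknown.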
Three of the RADService structures are built on the RCF tree ADT: the composition, dependency and management trees, all of which are shown in the diagrams below.
The fourth structure is the communication graph (shown below), which is meant to capture the path of data in a distributed system, in particular requests or RPCs in a web-service. In most web-services a tree model would be appropriate, as all components are RPC-based; however, in more general distributed systems this will often not be the case, so we prepared for a graph abstraction.
However, the RCF graph ADT is not yet complete, and until the CS294-1 projects were over we had little access to path-based analysis tools, meaning we had no good means or reason to capture communication paths. As such this structure is currently unimplemented, though at the time of this writing we are already beginning to remedy this.
The key use of these structures, in particular the fully implemented trees, is to quickly propagate service state changes and manage causality. In the dependency and management trees, failures propagate down, as the children of a node are those which depend on it or are managed through it. In contrast, in the composition tree failures propagate up, as larger services are built out of smaller ones. This accurately models the fact that, for example, the failure of a physical machine will result in the failure of a virtual machine, and that the failure of a database could result in the failure of the entire web-service.
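A minimal sketch of the two propagation directions, assuming a hypothetical Node class in place of the RCF tree ADT. It propagates pessimistically: every descendant in a dependency tree and every ancestor in a composition tree is marked failed, whereas a real policy might, for instance, tolerate the loss of one replica.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of failure propagation in the tree structures; not the RCF tree ADT. */
class StructureTree {
    // Hypothetical node: one RADService in one of the structures.
    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();
        boolean failed;

        Node(String name) { this.name = name; }

        // Attaches child below this node and returns it, for chaining.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    /** Dependency and management trees: children depend on (or are managed through) the parent. */
    static void failDown(Node n) {
        n.failed = true;
        for (Node c : n.children) failDown(c);
    }

    /** Composition tree: a failed part fails every service composed from it. */
    static void failUp(Node n) {
        for (Node cur = n; cur != null; cur = cur.parent) cur.failed = true;
    }
}
```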
State propagation in the communication graph is slightly more complicated. If the graph can be reduced to an RPC communication tree, failures clearly propagate up, and this is another kind of dependency tree. However, in a general graph failures must be propagated in the direction of the data flow. Combined with some vertex-local information, this could generate full failure causality information, something clearly missing from existing distributed systems tools. A simple example of failure propagation is shown below.
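For the general communication graph, one simple approach (our assumption; RADTools does not implement this, since the structure is still unimplemented) is a breadth-first traversal along data-flow edges that records, for each affected vertex, the vertex it inherited the failure from. That record is a basic form of the causality information mentioned above. The adjacency-map representation and vertex names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of failure propagation along data-flow edges of a general graph. */
class FlowPropagation {
    /**
     * edges.get(u) lists the vertices that receive data from u.
     * Returns, for every vertex reached from origin, the upstream vertex it
     * inherited the failure from; the origin maps to itself.
     */
    static Map<String, String> propagate(Map<String, List<String>> edges, String origin) {
        Map<String, String> cause = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cause.put(origin, origin);
        queue.add(origin);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : edges.getOrDefault(u, Collections.emptyList())) {
                if (!cause.containsKey(v)) { // visited check also terminates on cycles
                    cause.put(v, u);
                    queue.add(v);
                }
            }
        }
        return cause;
    }
}
```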
The biggest benefit of these structures to the casual user of RADTools is their display in the main window, and the resulting ability to start all the component services of a web-service in a single click, not to mention the visualization of failures.
Events are widely used in RADTools to model
causality, and thereby implement policy. For example, most suggested policies
for power savings in a web-service datacenter are based on starting and
stopping servers based on the current service load. Shown below is a possible
diagram of the event sources and
sinks which could implement a policy like
this.
Of course more general examples can be manufactured, but RADTools aims to provide the event framework, rather than implement any specific policy.
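The power-saving policy described above might be wired together roughly as follows. The sketch assumes hypothetical Source and LoadPolicy classes built on java.util.function.Consumer rather than the RCF events package: load samples flow from a source into a policy sink, which is itself a source of "start" and "stop" commands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of a load-driven power policy built from event sources and sinks (hypothetical names). */
class PowerPolicySketch {
    // Event source: forwards each event to every connected sink.
    static final class Source<T> {
        private final List<Consumer<T>> sinks = new ArrayList<>();
        void connect(Consumer<T> sink) { sinks.add(sink); }
        void emit(T event) { for (Consumer<T> s : sinks) s.accept(event); }
    }

    // Sink for load samples (e.g. requests/s) that is also a source of "start"/"stop" commands.
    static final class LoadPolicy implements Consumer<Double> {
        final Source<String> commands = new Source<>();
        private final double perServer; // load one server can handle
        private final double low;       // shrink only if load fits in one fewer server at this fraction
        int running;

        LoadPolicy(int running, double perServer, double low) {
            this.running = running;
            this.perServer = perServer;
            this.low = low;
        }

        @Override
        public void accept(Double load) {
            if (load > running * perServer) {
                running++;
                commands.emit("start");
            } else if (running > 1 && load < (running - 1) * perServer * low) {
                running--;
                commands.emit("stop");
            }
        }
    }
}
```

The gap between the start threshold (current capacity) and the stop threshold (a fraction of one fewer server's capacity) is a common hysteresis choice that keeps such a policy from flapping when load hovers near a boundary.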
Our choice of the Java language was driven by the availability of the JSCH and RCF libraries, in addition to the cross-platform compatibility and high level of abstraction provided by Java. Compared with a collection of shell scripts, this means that RADTools provides a far more useful (and robust) abstraction. Compared with other mainstream programming languages, this gives us access to a richer set of libraries.
RADTools currently includes implementations of the RADService interface for LigHTTPD, HAProxy, Ruby on Rails, MySQL and Memcached. There are also RADServices for VMware, Linux and Fedora Core. In fact the MySQL and Memcached services are currently restricted to Fedora, primarily because that is what we had to test with on the Millennium cluster at Berkeley.
The Linux system RADService includes support for querying Nagios, if it is running, and reporting the useful Nagios statistics as properties of the Linux system: "Nagios.CurrentLoad", "Nagios.NumUsers", "Nagios.NumProcs", "Nagios.PercentDiskFree", "Nagios.PercentMemUsed". This is a primitive form of service discovery, but it is a good example of how discovery might be accomplished.
RADTools is the base service, and represents
the RADTools application in the various structures
(RADService.management() in particular). It includes
the code to generate the main window, including tree views of the
service structures. Furthermore, as the current
implementation of RADTools is focused entirely on the management of web-services,
the RADTools object includes references to the
datacenter, upon which all
physical machines are assumed to depend, and the
website or web service, the
ultimate composite service which RADTools is meant to manage.
Finally, RADTools includes a queue, which is used to provide scheduling of long-running tasks. In addition to allowing a more controlled model of execution, this central queue of tasks is shown in the GUI, providing positive feedback to the user. A current deficiency of the skiplist used to keep tasks in the order they should be run (timer tasks may specify a date after which they are to run) means that duplicates are not currently eliminated; however, this will be fixed very shortly.
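One way the missing duplicate elimination might work, assuming each task carries a key that identifies duplicates, is sketched below. This is hypothetical and not the RADTools queue: a java.util.TreeSet ordered by run-after time and a sequence number stands in for the RCF skiplist (java.util.concurrent.ConcurrentSkipListSet would be the closer skiplist analogue), and a separate set of pending keys rejects duplicates.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of a time-ordered task queue with duplicate elimination (hypothetical names). */
class TaskQueueSketch {
    static final class Task {
        final String key;     // tasks with equal keys are duplicates, e.g. "restart:haproxy"
        final long runAfter;  // earliest time (ms) at which the task may run
        final long seq;       // tie-breaker: FIFO among tasks with equal runAfter

        Task(String key, long runAfter, long seq) {
            this.key = key;
            this.runAfter = runAfter;
            this.seq = seq;
        }
    }

    private final TreeSet<Task> queue = new TreeSet<>(
            Comparator.comparingLong((Task t) -> t.runAfter).thenComparingLong(t -> t.seq));
    private final Set<String> pending = new HashSet<>();
    private long nextSeq = 0;

    /** Queues a task unless an equivalent one is already pending; returns whether it was queued. */
    synchronized boolean schedule(String key, long runAfter) {
        if (!pending.add(key)) return false; // duplicate: keep the task already queued
        queue.add(new Task(key, runAfter, nextSeq++));
        return true;
    }

    /** Removes and returns the earliest task whose time has come, or null if none is due. */
    synchronized Task poll(long now) {
        if (queue.isEmpty() || queue.first().runAfter > now) return null;
        Task t = queue.pollFirst();
        pending.remove(t.key);
        return t;
    }
}
```

Keeping the task that is already queued, rather than replacing it, is a design choice; a queue that should honour the earliest requested time would instead remove and re-insert the existing entry.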
We downloaded the JSCH library, and its attached compression library, from JCraft. While the library itself contains no major documentation, the examples were enough to jumpstart our development, despite their quirks.
The largest, and really only, drawback to our use of this library is its design: JSCH includes multi-threaded code without clear documentation of why or when thread safety may be an issue.
The transactional data structures are the
basis of nearly all of the RADTools code, and provide some vital functionality:
the ability of any implementation of
Collection to generate an
event in
response to a mutation. This is what allows us to write the below code, which
configures HAProxyLinux to add a new
proxy pool, and a
new server to
that pool.
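// proxy is assumed to be an existing HAProxyLinux instance (its construction is not shown)
// Create a new proxy pool for 0.0.0.0:10000 and add it to the proxy as "apool"
HAProxyLinux.HAProxyPool pool = new HAProxyLinux.HAProxyPool(new HostPort.Default("0.0.0.0", 10000));
proxy.pools.add(pool, "apool");
// Add a new server for localhost:25 to that pool as "aserver"
pool.servers.add(new HAProxyLinux.HAProxyServer(new HostPort.Default("localhost", 25), 22, 3000, 1, 2), "aserver");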
The above code is concise, easy to understand and similar to what would appear in the HAProxy configuration file, making it easy to learn for those familiar with HAProxy, and easy to automate even for those who are not. However, what really makes that three-line code snippet interesting is that, because of the transactional data structures, it will actually cause a new HAProxy configuration file to be generated and uploaded over SSH to the server, and HAProxy to be gracefully restarted to use the new configuration.
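The mechanism behind that snippet, a collection that raises an event on every mutation, can be sketched as follows. NamedCollection, onChange and render are hypothetical names, not the RCF transactional data structures; in RADTools the listener would write the configuration file, upload it over SSH and restart HAProxy, whereas here it only regenerates config-like text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch of a collection that raises an event on every mutation (hypothetical, not RCF's API). */
class ObservableCollectionSketch {
    static final class NamedCollection<T> {
        private final Map<String, T> items = new LinkedHashMap<>();
        private final List<Consumer<NamedCollection<T>>> listeners = new ArrayList<>();

        void onChange(Consumer<NamedCollection<T>> listener) { listeners.add(listener); }

        void add(T item, String name) {
            items.put(name, item);
            fire();
        }

        boolean remove(String name) {
            boolean removed = items.remove(name) != null;
            if (removed) fire(); // only real mutations raise events
            return removed;
        }

        Map<String, T> view() { return Collections.unmodifiableMap(items); }

        private void fire() {
            for (Consumer<NamedCollection<T>> l : listeners) l.accept(this);
        }
    }

    /** Example listener body: regenerate config-like text from the current contents. */
    static String render(NamedCollection<String> pools) {
        StringBuilder sb = new StringBuilder();
        pools.view().forEach((name, addr) ->
                sb.append("listen ").append(name).append(' ').append(addr).append('\n'));
        return sb.toString();
    }
}
```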
The second main component of RCF used in RADTools is the event model. As noted above, the transactional data structures rely on the events package to provide a set of standard interfaces for sourcing, syndicating and sinking events. We omit further discussion, as it would merely duplicate the Events & Continuations section above.
The third and final main component of RCF used by RADTools is the
component framework. While RADTools
relies extensively on this, it was also the catalyst for its final development.
The component framework provides an abstraction of reflection with extensions
for the dynamic addition of
operations (methods)
and properties (fields)
on components
(objects). The ability to dynamically add properties (fields) to a component
is the basis of our integration with nagios, as seen in
LinuxSystem.
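A minimal sketch of run-time properties follows, using a hypothetical DynamicComponentSketch class backed by a map rather than the RCF component framework. A LinuxSystem-like component could call addProperty for the "Nagios.*" names only after detecting that Nagios is running.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch of a component whose properties can be added at run time (hypothetical, not RCF's API). */
class DynamicComponentSketch {
    private final Map<String, Object> properties = new TreeMap<>();

    /** Adds a new property; fails if the name is already taken. */
    void addProperty(String name, Object initial) {
        if (properties.containsKey(name)) {
            throw new IllegalArgumentException("duplicate property: " + name);
        }
        properties.put(name, initial);
    }

    Object get(String name) {
        if (!properties.containsKey(name)) throw new IllegalArgumentException("no such property: " + name);
        return properties.get(name);
    }

    void set(String name, Object value) {
        if (!properties.containsKey(name)) throw new IllegalArgumentException("no such property: " + name);
        properties.put(name, value);
    }

    /** Property names currently defined, in sorted order. */
    Set<String> names() { return Collections.unmodifiableSet(properties.keySet()); }
}
```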
Furthermore the component framework includes support for generating
property change events
in response to property changes. This allows the GUI to be kept in sync with
the properties, and the configurations to be kept in sync with the GUI.
Please see the GUI package and
AbstractDynamicProperty.gui(rcf.core.framework.component.DynamicBound.GUIType)
for details about the automatic GUI generation, and property synchronization
code.
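Property change events of this kind are also available in the standard java.beans package. The sketch below uses java.beans.PropertyChangeSupport as a stand-in for the RCF events, and the numDispatchers property is a hypothetical example. Because firePropertyChange does nothing when the old and new values are equal, a GUI field and a configuration writer can listen to each other without looping forever.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

/** Sketch of a bound property using java.beans (hypothetical property name). */
class BoundPropertySketch {
    private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
    private int numDispatchers = 1;

    void addListener(PropertyChangeListener l) { pcs.addPropertyChangeListener(l); }

    int getNumDispatchers() { return numDispatchers; }

    void setNumDispatchers(int n) {
        int old = numDispatchers;
        numDispatchers = n;
        // Fires only when the value actually changes, so two-way sync terminates.
        pcs.firePropertyChange("numDispatchers", old, n);
    }
}
```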
The second role is easiest to imagine in the case where an automated (perhaps SML-based) management system is written in Java and linked against RADTools. In this section we document some of our development difficulties in the hope that projects seeking to integrate with RADTools this way will be able to avoid the pain we suffered.
This bug causes incremental compilation of the RCF libraries to fail after edits of some files, particularly those in the rcf.core.util.map package or the rcf.core.util.collection.Skiplist class. The result is that random compiler errors will appear in possibly only vaguely related files (including this one, if there is even a link to the skiplist file), often with an error on the first line of the file (always a comment in this project). The solution is to perform a clean build using the Project->Clean menu to fully rebuild the project.
The bug has already been fixed in the next versions of Eclipse (we submitted the bug report a month or so ago) which will be released in the next month or two.
A javadoc bug currently makes the documentation for the rcf.core.util package impossible to generate. The problem lies in the ability of javadoc (and perhaps javac as well) to trace the class hierarchy of certain inner classes, causing it to emit spurious errors and warnings and finally to throw an exception and terminate. We have yet to fully isolate this bug, despite considerable time spent trying, and have therefore simply omitted that documentation from this website. We hope to find a workaround or a solution soon.
This problem is unfortunate, as javadoc is quite possibly one of the best code documentation tools in widespread use. However, the code in question is quite complicated and uses complex features added in Java 1.5, so we fully expect that many of these bugs will disappear when Java 1.6 goes to full production release.
Both JSCH and Swing use Java threads without the consent or intervention of the client programmer. The fact that Java threads are ubiquitous, cross-platform and standardized is wonderful, as this makes writing multi-threaded code easy. However, both Swing and JSCH sometimes lack appropriate documentation describing the threading requirements of using them. (For Swing, the threading reference is the Concurrency in Swing article.) As a result we spent a fair amount of time debugging threading problems, only two of which could be traced to our own code or to our lack of understanding of these libraries.
Given how powerful both JSCH and Swing are, we find that even with these problems, using them allowed us to produce a significantly better project in a much shorter time. However the clear lesson here is that any library which introduces threads to a program must document how it does so, why it does so and what restrictions the library imposes on the user to enforce thread safety. The one escape clause in this requirement, which we must invoke in places for this project, is that such documentation may only be missing if the threading is provided by a base library which is missing this documentation itself.
An unfortunate consequence of this is that anyone using RADTools as a codebase may currently encounter some concurrency bugs. We have not encountered any, and we will be more than happy to debug them should they arise, but this remains a possible issue.
In general, RADTools follows a simple Swing threading model: long running
tasks should be scheduled through
RADTools.schedule(rcf.core.concurrent.schedule.TimerTask),
and GUI operations should be scheduled using SwingUtilities.
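The threading rule above can be sketched with standard Java APIs. Here a single daemon ExecutorService is a hypothetical stand-in for RADTools.schedule(...), and SwingUtilities.invokeLater hands the outcome back to the Swing event dispatch thread; the method name runInBackground is illustrative.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import javax.swing.SwingUtilities;

/** Sketch of the threading model: slow work off the EDT, GUI updates on it. */
class SwingThreadingSketch {
    // Hypothetical stand-in for the RADTools task queue: one daemon worker thread.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "worker-sketch");
        t.setDaemon(true);
        return t;
    });

    /** Runs work on the worker thread, then delivers the outcome on the Swing event dispatch thread. */
    static <T> void runInBackground(Callable<T> work, Consumer<T> onResult, Consumer<Exception> onError) {
        WORKER.execute(() -> {
            try {
                T result = work.call(); // e.g. an SSH command or a configuration upload
                SwingUtilities.invokeLater(() -> onResult.accept(result));
            } catch (Exception e) {
                SwingUtilities.invokeLater(() -> onError.accept(e));
            }
        });
    }
}
```

javax.swing.SwingWorker packages the same pattern for tasks that also need to report progress.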
As a final note, the RCF library provides no thread safety or synchronization, with the exception of the GUI service, which will maintain thread safety between a worker thread and the Swing event dispatcher. Users may also wish to investigate the
AdapterHelpers.cast(Object, Class, rcf.core.concurrent.schedule.Runner, rcf.core.util.groups.ImmutableTriple[], rcf.core.util.adapter.TypeAdapter[])
method, which can be used to add synchronization to nearly any object or method.
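We do not reproduce AdapterHelpers.cast here. As a rough illustration of the same idea using only the JDK, a java.lang.reflect.Proxy can wrap any interface so that every call holds a single lock. This is a different technique from the RCF helper and only covers methods declared on an interface; Counter and PlainCounter are hypothetical example types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Sketch: adding synchronization to any interface via a JDK dynamic proxy. */
class SynchronizedProxySketch {
    /** Returns a view of target on which every call through iface holds one shared lock. */
    static <T> T synchronize(T target, Class<T> iface) {
        final Object lock = new Object();
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (lock) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception unwrapped
                }
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    // Example types for trying the sketch out.
    interface Counter {
        void increment();
        int get();
    }

    static final class PlainCounter implements Counter { // deliberately not thread-safe
        private int n;
        public void increment() { n++; }
        public int get() { return n; }
    }
}
```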
AdvancedResearchIndexLoadLinux
and ARIL).
Of course a real benchmark takes somewhat longer than 2 minutes to run, but the point is that the setup for this style of benchmark took us two weeks toward the beginning of the semester (during Lab3, for example), without RADTools. This is clearly a win: we can now easily do research that was difficult and unreliable in the past.
In addition to making life easier, this means that more complex and realistic web services can now be researched, and that projects can more easily experiment with a variety of system configurations without learning all of the text file formats and dealing with logins on 13 different machines, as was literally the case during CS294-1 RADS, Fall 2006.
Aside from the short-term benefits described in this section, we believe that RADTools opens up opportunities previously closed because of their difficulty, which we discuss below.
While the RCF event model and component framework were both well planned out and partially complete, there was a fair amount of work to finish them off. At the end of this project it has turned out that code based on these libraries is significantly easier both to write and to understand. Furthermore, without them the event and continuation programming of website policy required for SML would be impossible.
However, as with any library, there is still work to be done: everything from better concurrency support in the rcf.core.concurrent.primitives package, to simplify the problems outlined in section 4.3, to a more complete AutoGUI in the gui package, to a minor rewrite of the rcf.core.util.collection.Skiplist data structure to allow elimination of duplicate tasks in the RADTools task queue.
However, going forward with this project, it is clear to us that managing a large number of machines from a single point will result in a fairly large load. Currently the management traffic is restricted to simple state updates and occasional configuration uploads; however, future access to logs and a larger set of continuous performance data suggests that the management of a distributed system must itself be managed and distributed.
Because of the way the RCF event model and component framework have been
designed, it would be a simple matter to extend them to include RMI (Remote
Method Invocation), as in JMX, upon which the
component model is loosely based. This should enable two major features:
first and foremost it would easily allow distribution of the management system
without breaking the abstraction in any way, and second it would allow
non-java code easy access to the management system, by tapping into the RMI
mechanism.
Currently a new system is described by writing Java code, as in Main.inner(). Given the separate class compilation model of Java this is not an onerous requirement, and yet it would clearly be nice to simplify the process of describing a new system, as this is a painful task which must be completed before RADTools can be used to manage a system. Adding a simple system description language would go a long way toward decreasing the perceived cost of describing a new system, even if it would make no real difference, since the Java is quite concise and self-documenting. Far more interesting, however, would be integration with some automatic service discovery system.
There are currently two usage models in mind for RADTools, first, the management of a pre-existing system and second, setting up a new system. Given that RADTools includes the vast majority of the configuration options for the various RADServices it supports, the second model is clearly both preferable and tenable, as the initial setup of a distributed web service is often the most painful part.
However, in both cases there is information a user should not have to enter. Clearly some things, like the DNS name or IP address of at least one server involved, must be entered. However, information such as which component services each server has installed or can run could be discovered by simple inspection of installed programs.
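As a minimal sketch of discovery by inspection, assuming a hypothetical table from executable names to RADService kinds, one could check a list of directories for each executable. This runs against the local filesystem only; RADTools would presumably need to perform the equivalent check on each remote machine over SSH.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of discovery by inspecting installed programs (hypothetical names; local filesystem only). */
class DiscoverySketch {
    // Hypothetical mapping from executable name to the kind of RADService it suggests.
    static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("lighttpd", "LigHTTPD");
        KNOWN.put("haproxy", "HAProxy");
        KNOWN.put("mysqld", "MySQL");
        KNOWN.put("memcached", "Memcached");
    }

    /** Returns the services whose executables appear in any of the given directories. */
    static List<String> discover(List<File> searchPath) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, String> e : KNOWN.entrySet()) {
            for (File dir : searchPath) {
                if (new File(dir, e.getKey()).isFile()) {
                    found.add(e.getValue());
                    break; // one hit per service is enough
                }
            }
        }
        return found;
    }
}
```

A typical call might pass directories such as /usr/sbin and /usr/bin; which directories to search is itself an assumption about the target systems.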
Furthermore, path-based analysis could be used to discover relationships between component services, which could then be reflected by the communication structure. At the time of this writing we are already beginning to work with another group from CS294-1 to do just this.
Our contribution with this project is an abstraction and codebase which we hope will spare future classes and research the drudgery we felt working with Ruby on Rails administration during the class labs. Given the responses of some of our fellow students, we feel we have already gone a long way toward this goal, but time and further projects will tell.
Contributing to this research in a very real way was a major influence on the design of RADTools, primarily in the decision to use the RCF library in order to simplify further coding. For example, we use the RCF event model to capture RADService state changes, which are propagated by service state proxies through the various RADService structures (most notably management). This event model was specifically designed to be generalizable to any kind of event, including periodic performance data gathering, from "Nagios.CurrentLoad" to radtools.services.researchindex_load. Specifically, we have planned that any SML or other "policy" manager should be designed as a series of event sinks which implement DSP or SML algorithms over time series data to produce service control calls, e.g. to set a RADService.radServiceState(), as shown below.
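The planned structure of a policy manager, a chain of event sinks over time series data, might look like the following. Both stages are hypothetical: an exponentially weighted moving average (one simple DSP filter) feeds a threshold controller, and the string passed to control stands in for a real service control call such as setting a RADService's state.

```java
import java.util.function.Consumer;
import java.util.function.DoubleConsumer;

/** Sketch of a policy as a chain of event sinks over a time series (hypothetical names). */
class EwmaSinkSketch {
    /** DSP stage: exponentially weighted moving average, s_t = a*x_t + (1-a)*s_{t-1}. */
    static final class Ewma implements DoubleConsumer {
        private final double alpha;
        private final DoubleConsumer downstream; // next sink in the chain
        private double smoothed = Double.NaN;    // NaN until the first sample arrives

        Ewma(double alpha, DoubleConsumer downstream) {
            if (!(alpha > 0 && alpha <= 1)) throw new IllegalArgumentException("alpha must be in (0, 1]");
            this.alpha = alpha;
            this.downstream = downstream;
        }

        @Override
        public void accept(double x) {
            smoothed = Double.isNaN(smoothed) ? x : alpha * x + (1 - alpha) * smoothed;
            downstream.accept(smoothed);
        }
    }

    /** Control stage: issues a call only when the smoothed value crosses the threshold. */
    static final class ThresholdController implements DoubleConsumer {
        private final double threshold;
        private final Consumer<String> control; // stand-in for a real service control call
        private boolean above = false;

        ThresholdController(double threshold, Consumer<String> control) {
            this.threshold = threshold;
            this.control = control;
        }

        @Override
        public void accept(double value) {
            boolean nowAbove = value > threshold;
            if (nowAbove != above) {
                above = nowAbove;
                control.accept(nowAbove ? "start" : "stop");
            }
        }
    }
}
```

With alpha = 0.5 and a threshold of 50, a single jump in load from 0 to 100 only raises the smoothed value to 50, so one spike does not by itself trigger a control call.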
As a result we have already begun investigating how RADTools could be used to manage other distributed systems. In particular, because RADTools relies on the RCF library, which is a key part of the RDL Compiler v3 (RDLC3, see the RAMP website for more information), we believe that it will be both easy and very fruitful to adapt RADTools to manage a running RDL host or target system. RDL provides support for cross-platform system design and emulation, which implies that there are a number of heterogeneous platforms which must all run components of the same system at once, working in concert: exactly the scenario RADTools is meant to handle.
[Diagram: structure of the example web service described in Main.inner(). A top-level ComposedRADService is composed of a WebserverProxyPool, LigHTTPDLinux instances, a DispatcherProxyPool, RoRServerLinux instances, MySQLFedora instances and MemcachedFedora instances; requests flow to the WebserverProxyPool, and from the LigHTTPDLinux instances to the DispatcherProxyPool. Each component depends on a VMWareLinux VM, each VM on a LinuxSystem, and each LinuxSystem on the Datacenter. Every node exposes "RADService.State" through DynamicComponent.properties(), and the VM and Linux nodes also expose Nagios properties such as "Nagios.CurrentLoad".]