PageBox story and future
Story
The beginning
The original idea
The author had to work on the design of a large intranet.
The offices had LANs, but between the offices and the central systems there were only slow frame relay links.
We used a thin client/intermediate server solution:
- Agent PCs running browsers
- Office servers running a Web server with only three tasks:
  - Handle presentation
  - Maintain reference data
  - Invoke central system applications
We were using HTTP on the LANs and more efficient protocols on the WAN to save bandwidth.
The solution had only one drawback: we had to manage a large number of office servers.
PageBox was designed to address this drawback.
- PageBox leverages the existing environment: to run Web applications you need Web servers.
Web servers are a proven and scalable way to handle requests and file transfers.
They use a firewall-friendly protocol, HTTP, which can easily be secured with SSL (HTTPS).
- PageBox is a Web application whose function is to install other Web applications
- PageBox uses a Publish and Subscribe protocol.
The PageBox administrator subscribes to a Central Repository to get Web applications.
Developers publish their Web Applications on the Repository.
Then the Repository, which is also a Web application, deploys the published application on the subscribers' PageBoxes.
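The publish and subscribe flow above can be sketched in Java. This is a minimal in-memory illustration, not the PageBox implementation: the names PageBoxSubscriber, Repository.subscribe, Repository.publish and RecordingPageBox are hypothetical, and a real Repository deploys archives over HTTP rather than through method calls. Failed deployments are returned so that a monitoring process could retry them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract for a PageBox that can install a Web application archive.
interface PageBoxSubscriber {
    String name();

    void install(String archiveName, byte[] archive);
}

// In-memory model of a Repository: publishers publish, subscribed PageBoxes receive.
class Repository {
    private final List<PageBoxSubscriber> subscribers = new ArrayList<>();

    // Called when a PageBox administrator subscribes to this Repository.
    void subscribe(PageBoxSubscriber box) {
        subscribers.add(box);
    }

    // Called when a developer publishes an archive: deploy it on every subscriber.
    // Returns the names of the PageBoxes where the deployment failed, for a later retry.
    List<String> publish(String archiveName, byte[] archive) {
        List<String> failed = new ArrayList<>();
        for (PageBoxSubscriber box : subscribers) {
            try {
                box.install(archiveName, archive);
            } catch (RuntimeException e) {
                failed.add(box.name());
            }
        }
        return failed;
    }
}

// Trivial PageBox that records the archives it installed.
class RecordingPageBox implements PageBoxSubscriber {
    private final String name;
    final List<String> installed = new ArrayList<>();

    RecordingPageBox(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public void install(String archiveName, byte[] archive) {
        installed.add(archiveName);
    }
}
```

The Repository only knows its subscribers; publishers never address PageBoxes directly, which is what lets PageBox hosts, Repository hosts and publishers be different organizations.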
Implications and issues
The model can have strong implications, especially when combined with Web services.
Let's assume that we have 10,000 or more ASP sites hosting PageBoxes and hundreds of repositories.
Let's assume that you develop a Web application that has its own database and invokes Web services.
Some of these Web services are delivered by other companies, for instance for geo-location or for payment.
You write and host the other Web services, for instance for accounting or troubleshooting.
You can easily process your Web service requests with a cluster of inexpensive machines
because they are just simple stateless requests, and you only need to publish your Web application
on the repository of your choice.
The model is fault tolerant and highly scalable.
It has however a couple of drawbacks:
- Security.
A PageBox host must trust the Repositories and the Repositories must trust the publishers.
- The end user doesn't know which PageBox to contact. You need a mechanism such as:
  - A modified DNS that computes the PageBox IP address, or
  - A cache that forwards the request to the right PageBox
It was beyond our capabilities and possibly beyond what we can do within an Open Source model.
Constellations of PageBoxes and repositories are cheaper than server farms when we have to process thousands of requests
per second, because we share resources and because of the fine granularity. If the traffic grows by five percent, we
just need to add one or two PageBoxes. It works like a cellular phone network: where there are more users, we install a few
more cells.
However constellations cannot be free.
- For security, we need trusted certificates, which implies a registration process.
The process is mature but people have to pay for that.
- PageBox hosting costs money. We want a model where PageBox hosts, Repository hosts and publishers can be
different organizations. The publishers get money from the customers of their Web applications. They are charged
by Repositories, which are themselves charged by PageBox hosts based on the usage.
However, we proved that:
- The distribution mechanism could work
- The troubleshooting was feasible. We could monitor the activity and properly interpret end-user calls.
Companies such as Akamai already use the sort of modified DNS that we describe above.
Internet
The Internet is like those heritage trains with as many people in the steam engine as in the wagons:
there are roughly 600 million users and 40 million sites, and only a few sites serve more than one
page per second. The Internet is optimized for that pattern. There are sophisticated mechanisms such
as RPSL allowing the Internet entities (called Autonomous Systems) to exchange fine-grained routing information.
As you can see with the traceroute command, your requests go across (roughly) the same number of routers
regardless of the site location.
Because Internet delay is spent mostly in routers and depends primarily on the number of routers,
the benefit of serving a request from the requestor's state or country rather than from a single worldwide location
is almost negligible; it becomes significant (~100 ms) only if the request is served from within the network of
the requestor's access point.
Therefore, today PageBoxes are most effective when installed on user LANs or Intranets, and
ASPs are good mainly for Repository hosting.
LAN PageBox
- A company can run a PageBox on its Intranet or LAN just as it runs a proxy server today.
The PageBox installs and updates professional Web Applications from different repositories.
The company has a better response time and needs a smaller link to its ISP.
- A software development company can publish its applications on a PageBox repository.
The company has fewer servers to operate and less traffic.
The response time and bandwidth improvement depends on how well the Web application can handle
user requests locally, using reference data, caching and a local database.
PageBox is also useful when combined with a Portal or Webtop:
Rather than calling a remote application, the Portal can call a local copy installed by PageBox.
Another idea is the PageBox control: a PageBox control is a graphical front end to a Web service.
Let's assume that a company A uses a Web service offered by company B.
Today when the company B changes its Web service definition, company A has to update its code.
Let's assume now that company A subscribes to the PageBox control repository.
When company B changes its Web service definition, it also publishes the updated PageBox control,
which is automatically installed on company A's Web site.
Cuckoo
The author had to work on a Content Management System.
Because the writers responsible for creating content knew only Word,
the author wrote a tool to convert Word to the format (mostly XML) expected by the CMS.
Though it required writers to use templates and styles, the tool turned out to be effective and flexible.
CMSs enforce rules that give consistency to the content even with a hundred or more authors,
but with their RDBMS, meta-tags and complex structure they can't suit all needs.
Cuckoo was invented to explore an alternative design
- Where you have more freedom in the design of the content
- But where you can apply the same layout and style to all pages,
which is badly needed even on the smallest sites
Ptah
Ptah has been developed because the author needed
a free, open-source HTML map generator: a simple tool that could be modified to generate any kind of map.
Ptah can be used in combination with Cuckoo. It is written in Java/Swing.
PageBox for PHP
PageBox for PHP was designed
- To react against the increasing complexity of Java PageBox
- For smaller shops and the cheapest ASPs
PageBox for PHP reused the Publish & Subscribe model of Java PageBox.
The PageBox for PHP repository has more functions and is server-side only.
The design is more modular. There is a clean distinction between:
- The distribution process
- The monitoring system responsible for retrying the deployment
- The installation process
- The support libraries, yet to be written, which could be used outside PageBoxes
PageBox for .NET
PageBox for .NET design was based on the design of PageBox for PHP.
Thanks to the .NET libraries, it was possible:
- To use XML for configuration
- To use SOAP Web services for deployment and retry
- To implement the monitoring system as a Windows (NT) service
PageBox for Java
PageBox for Java design reuses the PageBox for .NET design plus:
- A more scalable deployment model described in the Grid API V2.
You can find details about this model in the
Deployment with relays section.
- A delta deployment using the jardiff format described in the JNLP specification.
Only the difference between the current version and the installed version is sent to the target PageBox.
- An installation API allowing fully automated deployments and updates.
- A better security model.
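The idea behind delta deployment can be illustrated with a short Java sketch. It does not produce the jardiff format itself; it only shows, assuming each archive entry is identified by its path and a content digest, how the entries to send and the entries to delete could be computed. ArchiveDelta and DeltaPlanner are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Result of comparing two versions of an archive: entries to ship and entries to delete.
final class ArchiveDelta {
    final List<String> changedOrAdded = new ArrayList<>();
    final List<String> removed = new ArrayList<>();
}

final class DeltaPlanner {
    // installed and current map each entry path to a content digest (e.g. a SHA-256 hex string).
    static ArchiveDelta plan(Map<String, String> installed, Map<String, String> current) {
        ArchiveDelta delta = new ArchiveDelta();
        // TreeMap copies give a deterministic (sorted) order in the output lists.
        for (Map.Entry<String, String> e : new TreeMap<>(current).entrySet()) {
            String oldDigest = installed.get(e.getKey());
            if (oldDigest == null || !oldDigest.equals(e.getValue())) {
                delta.changedOrAdded.add(e.getKey()); // new entry or modified content
            }
        }
        for (String path : new TreeMap<>(installed).keySet()) {
            if (!current.containsKey(path)) {
                delta.removed.add(path); // present on the target PageBox, gone from the new version
            }
        }
        return delta;
    }
}
```

Only the entries in changedOrAdded would have to travel to the target PageBox, together with the list of entries to delete; unchanged entries are not sent at all.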
The reference version of PageBox for Java runs on the Java Web Services Developer Pack (WSDP) and on Tomcat/Axis.
PageBox for Java can be easily ported
to other Application servers that support:
- JavaServer Pages (JSP) specification 1.2 or 2.0
- Servlet specification 2.3 or 2.4
- JAXP
- JAX-RPC
- JSTL
- COS, the com.oreilly.servlet package written by Jason Hunter for the Web archive upload (MultipartRequest class)
Reservation
Reservation was developed as an example of an application distributed with PageBox.
It is:
- A real application
- With an XML configuration
- Making database requests to a local database
- Making Web service requests to other Reservation instances
When it is distributed with PageBox, a Reservation instance can retrieve the other Reservation instances
deployed from the same repository.
Reservation is an ASP.NET application written in C#.
Active Naming
The PageBox Active Naming is based on an idea implemented in WebOS.
For more information you can read the PhD dissertation of
Amin Vahdat.
The idea is to interpose programs behind the naming interface.
This approach has several benefits compared to the existing DNS:
- Support for load balancing, migration, replication, failover and caching
- Minimization of the latency and consumed wide area bandwidth
Active Naming is implemented in PageBox as a Web service that returns a list of candidates:
Candidate[] GetCandidate(String repository_URL,
String WebService_archive, String WebService_name);
The client provides:
- The URL of the repository that manages the Web service to call
- The name of the Web service archive - the archive used to deploy the Web service application
- The name of the Web service
The Web service returns an array of Candidate objects. The Candidate class provides
- The URL of a Web service instance
- An object that can help choose a Web service instance
Each Web service instance is registered to take care of a specific region (location-dependent routing) or of a given
range of requests (data-dependent routing) by setting such an object.
The client finds the best matches with its own location (location-dependent routing) or with the parameters of the
Web service call (data-dependent routing). Then it selects one of the Web service instances (for instance using round
robin), and when the service invocation fails it can retry with another instance.
The object that helps choose a Web service instance is whatever the Web service author defines.
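The client-side selection described above could look like the following Java sketch. Candidate mirrors the class described here (a URL plus an author-defined routing object); CandidateSelector, the Predicate used to match routing objects and the Function that performs the call are assumptions made for illustration. Matching candidates are used in round-robin order, and a failed invocation is retried on the next instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;

// Mirrors the Candidate described above: an instance URL plus an author-defined routing object.
final class Candidate {
    final String url;
    final Object routingHint;

    Candidate(String url, Object routingHint) {
        this.url = url;
        this.routingHint = routingHint;
    }
}

final class CandidateSelector {
    private final AtomicInteger next = new AtomicInteger();

    // Keeps the candidates whose routing object matches; falls back to all of them if none match.
    static List<Candidate> bestMatches(Candidate[] all, Predicate<Object> matches) {
        List<Candidate> result = new ArrayList<>();
        for (Candidate c : all) {
            if (matches.test(c.routingHint)) {
                result.add(c);
            }
        }
        if (result.isEmpty()) {
            for (Candidate c : all) {
                result.add(c);
            }
        }
        return result;
    }

    // Round robin over the best matches; on failure, retries with the following instance.
    <T> T invoke(Candidate[] all, Predicate<Object> matches, Function<String, T> call) {
        List<Candidate> matching = bestMatches(all, matches);
        if (matching.isEmpty()) {
            throw new IllegalStateException("no candidate");
        }
        int start = Math.floorMod(next.getAndIncrement(), matching.size());
        RuntimeException last = null;
        for (int i = 0; i < matching.size(); i++) {
            Candidate c = matching.get((start + i) % matching.size());
            try {
                return call.apply(c.url);
            } catch (RuntimeException e) {
                last = e; // this instance failed, try the next one
            }
        }
        throw last; // non-null here: every matching instance failed
    }
}
```

For location-dependent routing the predicate could compare a region stored in the routing object with the client's own region; for data-dependent routing it could test whether a request key falls into the range the instance registered.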
Because the ActiveNaming Web service returns an array, the client can keep it in a cache.
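A minimal sketch of such a cache in Java, assuming the three GetCandidate parameters identify an entry. CandidateLookup stands in for the ActiveNaming Web service call and, like NamingCache, is a hypothetical name. How long entries should live is not specified here, so the sketch only offers explicit invalidation, for instance after every cached instance has failed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the call to the ActiveNaming GetCandidate Web service.
interface CandidateLookup<V> {
    V[] getCandidate(String repositoryUrl, String webServiceArchive, String webServiceName);
}

final class NamingCache<V> {
    // Key: the three GetCandidate parameters; a List compares element by element.
    private final ConcurrentMap<List<String>, V[]> cache = new ConcurrentHashMap<>();
    private final CandidateLookup<V> lookup;

    NamingCache(CandidateLookup<V> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached array, calling the Web service only on the first request for a key.
    V[] candidates(String repositoryUrl, String archive, String name) {
        List<String> key = Arrays.asList(repositoryUrl, archive, name);
        return cache.computeIfAbsent(key, k -> lookup.getCandidate(repositoryUrl, archive, name));
    }

    // Drops an entry so that the next request asks the Web service again.
    void invalidate(String repositoryUrl, String archive, String name) {
        cache.remove(Arrays.asList(repositoryUrl, archive, name));
    }
}
```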
Grid API
The PageBox Grid API uses some ideas coming from Grid computing and especially from the Message Passing Interface (MPI).
For more information you can read our introduction to Grid computing here.
The PageBox Grid API doesn't aim to address high performance computing needs but to address common Web application
needs such as cooperative computing, cache synchronization and data replication.
The PageBox Grid API supports
- Classical Send/Receive
- Collective operations, Scatter and Gather.
It is possible to send an object to all other Grid partners or distribute an array of objects to other partners.
In the latter case the Grid API will send objects to as many partners as needed (when there are fewer objects to
process than partners) or distribute the objects evenly among the partners (when there are more objects to process
than partners).
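The distribution rule for Scatter can be expressed as a small Java function. This sketch only computes which objects go to which partner (contiguous chunks whose sizes differ by at most one); the splitting strategy is an assumption, not the Grid API's code, and the actual sending in memory, over UDP or over SMTP is left out. Scatter.split is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class Scatter {
    // Splits items into at most `partners` contiguous chunks whose sizes differ by at most one.
    static <T> List<List<T>> split(List<T> items, int partners) {
        if (partners <= 0) {
            throw new IllegalArgumentException("partners must be positive");
        }
        int used = Math.min(items.size(), partners); // fewer items than partners: one item each
        List<List<T>> chunks = new ArrayList<>();
        if (used == 0) {
            return chunks;
        }
        int base = items.size() / used;
        int extra = items.size() % used; // the first `extra` partners get one more item
        int from = 0;
        for (int k = 0; k < used; k++) {
            int size = base + (k < extra ? 1 : 0);
            chunks.add(new ArrayList<>(items.subList(from, from + size)));
            from += size;
        }
        return chunks;
    }
}
```

With seven objects and three partners the chunks have sizes 3, 2 and 2; with two objects and five partners only two partners receive an object.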
The PageBox Grid API supports three transport modes: in memory across threads, UDP and SMTP.
Collective operations allow optimizations such as Multicast in UDP mode or sending a mail to all partners in SMTP mode.
The PageBox Grid API is also optimized for use in multithreaded environments.
Coordinator
The Coordinator is a small and fast API allowing PageControls to talk
to each other. Polaris, which we present below, demonstrates the use of the Coordinator API.
Token API
The Token API was implemented first on PageBox for Java.
For Web applications distributed on a large number of machines, which deployment with relays makes possible, a token API is
more useful than a Grid API because:
- it minimizes the number of messages sent;
- it addresses elegantly data replication needs, which may be the most important.
The token implementation works as follows:
- The Repository sends a frame.
- The first PageBox extracts from the frame the messages for the Web applications this PageBox is hosting.
- The first PageBox adds to the frame the messages issued by the Web applications this PageBox is hosting.
- The first PageBox forwards the frame to the next PageBox.
- The next PageBoxes do the same.
- The last PageBox processes the frame like the other PageBoxes but forwards the frame to the Repository.
- The Repository does some housekeeping and sends the frame again.
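The circulation of the frame can be simulated in Java. This sketch is an assumption, not the PageBox for Java code: each PageBox hands matching messages to its hosted applications and removes its own messages when they come back around (by then every other PageBox has seen them), a common token-ring convention. The Repository's housekeeping is not detailed here, so the sketch only counts rounds. Message, PageBoxNode and TokenRepository are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// A message carried by the frame: issuing PageBox, target Web application, payload.
final class Message {
    final String origin;
    final String application;
    final String payload;

    Message(String origin, String application, String payload) {
        this.origin = origin;
        this.application = application;
        this.payload = payload;
    }
}

final class PageBoxNode {
    final String name;
    final Set<String> hostedApps;
    final List<Message> outbox = new ArrayList<>();   // issued by hosted apps, waiting for the frame
    final List<String> delivered = new ArrayList<>(); // payloads handed to hosted apps

    PageBoxNode(String name, Set<String> hostedApps) {
        this.name = name;
        this.hostedApps = hostedApps;
    }

    // Extracts the messages for hosted applications, then adds this PageBox's own messages.
    void onFrame(List<Message> frame) {
        Iterator<Message> it = frame.iterator();
        while (it.hasNext()) {
            Message m = it.next();
            if (m.origin.equals(name)) {
                it.remove(); // back at its origin: every other PageBox has now seen it
            } else if (hostedApps.contains(m.application)) {
                delivered.add(m.payload);
            }
        }
        frame.addAll(outbox);
        outbox.clear();
    }
}

final class TokenRepository {
    private final List<PageBoxNode> ring;
    private final List<Message> frame = new ArrayList<>();
    private int rounds;

    TokenRepository(List<PageBoxNode> ring) {
        this.ring = ring;
    }

    // One circulation: send the frame, let each PageBox process it in order,
    // and receive it back from the last PageBox.
    void circulate() {
        for (PageBoxNode box : ring) {
            box.onFrame(frame);
        }
        rounds++; // placeholder for the Repository's housekeeping
    }

    int pendingMessages() {
        return frame.size();
    }

    int rounds() {
        return rounds;
    }
}
```

A single frame carries every message, so each round costs one message per PageBox whatever the number of applications talking to each other.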
The implementation of the Active Naming of PageBox for
Java uses the Token API.
Current products
Java PageBox
The initial Java version. It has been tested with Tomcat 3.2 and Resin 2.
It supports Cocoon 1.8 and SOAP applications. You can find more information about it
here.
PageBox for PHP
PageBox for PHP has been tested with PHP 4.0.6 and PHP 4.1.0. It should work with PHP 4 and above.
PageBox for PHP 0.0.4 and above implement SOAP Web services.
You can find more information about it here.
PageBox for .NET
PageBox for .NET has been tested with .NET beta 2.
You can find more information about it here.
PageBox for .NET version 0.0.5 and above implement the ActiveNaming Web service and the Grid API.
PageBox for Java
The new Java version has been tested with Tomcat 4.1 and Tomcat 5 with Axis 1.1 and with JWSDP 1.2/1.3.
You can find more information about it here.
PageBox for Java 0.0.12 and above implement the Token API and an advanced version of the Active Naming.
This version implements an installation facility and is extensible: Web applications can implement their own
installation class and PageBox users can develop their own protocols or add extensions.
Cuckoo
Cuckoo is a Word plug-in that generates XML files and applies a stylesheet to allow reviewing the converted document.
Cuckoo allows applying the same stylesheet to all pages of a site, which means that
- All pages have the same layout
- When you change the layout, you only make the change at a single place
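Cuckoo itself runs inside Word, but the principle of one stylesheet for all pages can be shown with the standard JAXP transformation API in Java. The stylesheet is compiled once into a Templates object and reused for every page, so changing the layout means changing a single file. SiteStyler is a hypothetical name, and the page format used below is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SiteStyler {
    // Compiled once; every page of the site goes through the same stylesheet.
    private final Templates layout;

    SiteStyler(String xslt) throws TransformerException {
        layout = TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xslt)));
    }

    // Renders one page's XML content with the shared layout.
    String render(String pageXml) throws TransformerException {
        Transformer t = layout.newTransformer(); // Templates can be shared; a Transformer per use
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(pageXml)), new StreamResult(out));
        return out.toString();
    }
}
```

A Templates object is thread-safe and can be shared by all pages, whereas Transformer instances are not, which is why one is created per rendering.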
Cuckoo also allows merging the XML output of multiple Word documents on a single HTML page.
The current version is stable. We write most of our documentation with it, and it saves us time.
Cuckoo now supports non-Latin languages to the same extent as Word.
You can find more information about it here.
Reservation
Reservation is an example application for franchises and small businesses.
You can find information about the rationale here.
GoogleControl
GoogleControl is an example of PageBox control.
It invokes the Google API.
You can find information about the PageBox Control concept here.
Polaris
Polaris is a simple application developed to test and illustrate the ActiveNaming and Grid functions.
Polaris is made of two parts:
- A client part implemented as an HTTP control
- A server part that implements a Web Service and uses a database
There are two versions of Polaris:
- Polaris A accesses the database in read-only mode.
This version uses the ActiveNaming Web service to implement data-dependent routing
(the target Web service is chosen based on the request parameters) and location-dependent routing
(the target Web service is chosen based on the requestor's location).
Polaris A also implements load balancing and is fault tolerant.
- Polaris B writes to the database.
The server part of Polaris B uses the Grid API to replicate the database updates onto other server instances.
The control part of Polaris B caches the results and uses the Grid API to synchronize the cache with other
control instances. Polaris B is a superset of Polaris A.
Polaris is under development. For more information you can go to
http://pagebox.net/polaris/pol-index.html.
Pandora
Pandora is an example of distributed Web application deployed with PageBox for Java.
Pandora contains three types of Web applications:
- a distributed Web application to be deployed in many locations and able to keep orders until
the central Web application is available;
- a central Web application where orders are ultimately processed; this central application issues
payment and delivery requests to payment and delivery Web applications;
- a payment and delivery mockup. When used as a delivery application the mockup mimics an application that
handles the delivery of goods. When used as a payment application the mockup mimics credit card processing.
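Keeping orders until the central application is reachable is a store-and-forward pattern. The Java sketch below keeps orders in memory and forwards them in arrival order, stopping at the first refusal. It is an illustration under assumptions, since Pandora would need durable storage; StoreAndForward and CentralOrderService are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical client of the central application: true if the order was accepted.
interface CentralOrderService<T> {
    boolean submit(T order);
}

// Keeps orders locally and forwards them when the central application answers.
final class StoreAndForward<T> {
    private final Deque<T> pending = new ArrayDeque<>(); // in memory here; a real store would be durable
    private final CentralOrderService<T> central;

    StoreAndForward(CentralOrderService<T> central) {
        this.central = central;
    }

    // Orders are accepted locally even when the central application is unavailable.
    void take(T order) {
        pending.addLast(order);
    }

    // Forwards pending orders in arrival order; stops at the first refusal and keeps the rest.
    int flush() {
        int sent = 0;
        while (!pending.isEmpty() && central.submit(pending.peekFirst())) {
            pending.removeFirst();
            sent++;
        }
        return sent;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Stopping at the first refusal preserves the order of submission, which matters if the central application processes orders sequentially.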
Pandora implements an installation class that populates a database.
Pandora uses a PageBox API to access and update this database.
Pandora also implements a Trusted Web site support allowing
securely delegating the user authentication to other Web sites.
Epimetheus
Epimetheus is an example of Web application deployed with PageBox for Java.
Epimetheus maintains contact information.
Epimetheus illustrates the use of the PageBox API to
- access the Application servers resources;
- use extensions - in this case to access the serial and parallel ports of the host.
EuroLCC
EuroLCC is an example of Web application deployed with PageBox for Java.
EuroLCC lists low cost carriers and airport codes and allows finding routes served by these low cost carriers
between these airports.
EuroLCC illustrates the use of a generic installation class
that populates a relational database with any kind of data.
Prometheus
Prometheus is an example of distributed Web application deployed with PageBox for Java.
Prometheus implements
- a simple chat application to illustrate the use of the Token API;
- a redirection facility to redirect page requests to the most suitable Prometheus instance using
the Active Naming.
Roadmap
Generally speaking,
- We develop products to facilitate the deployment of applications on the Web
- We document them
- We try to innovate, but if something already exists we use it
- We believe in Internet and Open Source but we are vendor agnostic
August 2002 plan
At the time we wrote:
- We will continue to develop and maintain PageBox for PHP, PageBox for .NET, Reservation and Cuckoo
- The development of Java PageBox is stopped.
We will keep its documentation and sources online for your convenience
but we will write a new version called PageBox for Java, based, like PageBox for .NET, on PageBox for PHP.
- PageBox for Java should be interoperable with PageBox for .NET
- We will develop new examples of PageBox applications
- We created a page about software design and algorithms.
We plan to release a couple of implementations suitable for Web applications
- We will continue to develop the ActiveNaming and the Grid API concepts.
The ActiveNaming and Grid API allow creating truly scalable and fault-tolerant applications because:
- The client can always choose among many servers
- The data can always be replicated
With the Grid API V2 we plan to use the Grid API for deployment.
With Grid V2 some PageBoxes would act as relays for a faster and more scalable deployment.
Another idea is to use the Grid API to replicate repositories that could be deployed like regular PageBox applications
from meta-repositories.
- With the Installation API:
- We will send only the delta between the deployed version and the previous one
using a combination of VCDIFF (RFC 3284) and JARDiff (JNLP).
- The installed Web application will be able to get the URL of the installing (controlling) PageBox
- The controlling PageBox will call a class of the Web Application to perform the post-installation tasks such as
populating a database
- The PageBox administrator will set the installation directory and the rights that they want to grant to a repository
or a publisher
To summarize, the installation process will be safer and more automated.
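The post-installation step could rely on a class named by the Web application and loaded by the controlling PageBox through reflection. The Java sketch below is hypothetical: the PostInstaller interface, the afterInstall method and its parameters (installation directory and URL of the controlling PageBox) are illustrative assumptions, not the actual PageBox installation API.

```java
// Hypothetical contract a Web application could implement for post-installation tasks.
interface PostInstaller {
    void afterInstall(String installDirectory, String controllingPageBoxUrl) throws Exception;
}

final class InstallationRunner {
    // Loads the application's installation class by name and runs it once the files are in place.
    static void runPostInstall(String className, String installDirectory, String pageBoxUrl)
            throws Exception {
        Class<?> cls = Class.forName(className);
        PostInstaller installer = (PostInstaller) cls.getDeclaredConstructor().newInstance();
        installer.afterInstall(installDirectory, pageBoxUrl);
    }
}

// Example installation class; a real one might populate a database with reference data.
final class SampleInstaller implements PostInstaller {
    static String lastUrl;

    @Override
    public void afterInstall(String installDirectory, String controllingPageBoxUrl) {
        lastUrl = controllingPageBoxUrl; // the installed application learns its controlling PageBox
    }
}
```

Naming the class in the archive, rather than hard-coding tasks in PageBox, is what lets each Web application decide what "post-installation" means for it.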
What has been done
- We focused on PageBox for Java. We implemented this version with the installation API, a distribution with
relays based on the Grid API v2 and a delta deployment using JARDiff. We also made this version extensible.
As planned we created new examples, Pandora, Epimetheus, EuroLCC and Prometheus.
- We developed for this version a Token API and a better Active Naming.
- We wrote new documents: a graph introduction, a presentation of Kalman filters, a presentation of the patent system,
a presentation of the air transport industry, our view of Society and Computing and an introduction to human networks.
Next steps
In the coming months we will focus on making the PageBox version more robust (beta and then release level).
We also plan to include control and maybe Cocoon support, as explored in the .NET and Java PageBox versions.
If you want to help us or if you have comments you can contact us at
support@pagebox.net.
Contact: support@pagebox.net ©2001-2004 Alexis Grandemange.