PageBox: application deployment with publish and subscribe on J2EE, JES2, PHP and ASP.NET

Rationale

 

Foreword

The PageBox project aims to provide components for distributing Presentations on the Internet and on intranets. This document presents definitions, an analysis, the target and status of the project, its goals and its technology.

Definitions

  1. Presentation

    We define a presentation here as a set of components able to generate and format what is displayed on a user equipment.
    A presentation calls content producers to get the content that it processes.

    A presentation as described here handles only HTTP/HTTPS though an intermediate gateway can convert from HTTP to another protocol such as WAP. A presentation can generate a flow in miscellaneous formats such as XML, HTML, XHTML, WML, PDF, SVG and SWF.

    A presentation calls a content producer using a network protocol such as XML over HTTP, SOAP or IIOP. It can have a function of content adaptation.

  2. PageBox

    PageBox is a means of hot-deploying and updating presentations in Application Servers.
    A PageBoxed Application Server (PAS) behaves like a browser: it downloads a presentation from a repository just as a browser downloads an applet and, like a browser, it runs the presentation in a sandbox with rights based on the presentation signature.

    Unlike a browser, a PAS downloads or updates a presentation only when commanded through an HTTP request.
    PageBox has been designed for Internet deployment and for Application Service Providers (ASPs).

  3. Repository

    A PageBox can subscribe to one or many repositories.
    When a new archive is published on a repository, it is automatically deployed on all PageBoxes that have subscribed to the repository.
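    The publish and subscribe relationship described above can be sketched as a small in-memory model. This is only an illustration, not the PageBox implementation: the class names (RepositorySketch, Repository, PageBox) and the archive names are hypothetical, and the real components talk over HTTP. The sketch does not decide what happens to archives published before a PageBox subscribes, because the text does not say.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of repositories and subscribing PageBoxes. */
class RepositorySketch {

    /** A PageBox, reduced to the list of archives deployed on it. */
    static class PageBox {
        final String name;
        final List<String> deployed = new ArrayList<>();

        PageBox(String name) {
            this.name = name;
        }

        /** Called by a repository when an archive is published. */
        void deploy(String archive) {
            if (!deployed.contains(archive)) {
                deployed.add(archive);
            }
        }
    }

    /** A repository that pushes every newly published archive to its subscribers. */
    static class Repository {
        private final List<PageBox> subscribers = new ArrayList<>();

        void subscribe(PageBox box) {
            if (!subscribers.contains(box)) {
                subscribers.add(box);
            }
        }

        void publish(String archive) {
            for (PageBox box : subscribers) {
                box.deploy(archive);
            }
        }
    }

    public static void main(String[] args) {
        Repository shopRepository = new Repository();
        Repository gameRepository = new Repository();
        PageBox paris = new PageBox("paris");
        PageBox tokyo = new PageBox("tokyo");

        // A PageBox can subscribe to one or many repositories.
        shopRepository.subscribe(paris);
        shopRepository.subscribe(tokyo);
        gameRepository.subscribe(paris);

        shopRepository.publish("shop.war");
        gameRepository.publish("game.war");

        System.out.println(paris.name + " hosts " + paris.deployed);
        System.out.println(tokyo.name + " hosts " + tokyo.deployed);
    }
}
```

    The provider only calls publish; deployment on every subscriber is the repository's job.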

  4. Constellation

    A constellation is a set of PageBoxes and repositories with a trust relationship.
    Whereas PageBoxes and repositories have a physical existence (they are hosted somewhere and run PageBox code) a constellation is an organizational entity. A constellation can be characterized by:

    • A list of repositories
    • A common security mechanism allowing subscription and publication checking

  5. Mapper

    The mapper is a mechanism embedded in PageBox that modifies links in specified static pages (HTML, XHTML), named routing pages.
    A user enters a presentation by specifying a well-known URL, handled by one PageBox or by a small cluster of PageBoxes.

    When the user clicks on a routing page, links can be modified to:

    • Select a PageBox closer to the user
    • Load-balance requests between PageBoxes
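    The link modification done on routing pages can be illustrated with a short sketch. It is not the Mapper code: the class name, the regular expression and the host names are assumptions made for the example, and it only handles absolute http/https links written with double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of rewriting the links of a routing page. */
class RoutingPageSketch {

    // Matches href="http://host or href="https://host and captures prefix and host.
    private static final Pattern LINK = Pattern.compile("(href=\"https?://)([^/\"]+)");

    /** Replaces wellKnownHost by targetHost in every absolute link; other links are kept. */
    static String rewrite(String html, String wellKnownHost, String targetHost) {
        Matcher m = LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String host = m.group(2).equals(wellKnownHost) ? targetHost : m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + host));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<a href=\"http://www.example.com/shop/index.jsp\">Shop</a>";
        // The target host would be chosen per request (closest or least loaded PageBox).
        System.out.println(rewrite(page, "www.example.com", "fr.pagebox.example.net"));
    }
}
```

    Choosing a different target host for each request gives load balancing; choosing it from the requestor's location gives proximity routing.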

Analysis

  1. Performance

    Today to offer a Graphical User Interface a company must either:

    1. Write a Web application or
    2. Write a graphical front end

  2. Web application

    Advantages:

    • A Web application is easier to write and to maintain than a graphical front end. It also requires less skill.
    • A Web application is a central application, so it is easy to deploy and update.

    Drawbacks:

    • Because a Web application is a central application, all its parts (presentation, business logic, data caching and data access) run on a small set of servers. Resources (memory, CPU and disks) cost more on large servers than on small computers.
    • Browsers display Web application pages, which are downloaded as HTML or XML over HTTP. The main drawback is that the presentation is downloaded along with the data. As a consequence, Web applications require more bandwidth than applications invoked by graphical front ends.

    Web applications are successful and serve the end-consumer market well, where availability and response time requirements are lower. The end consumer neither pays nor is paid to use the application, but we think the major point is that she or he is an occasional user. Compared to a professional user, she or he remains a beginner and is therefore slower.

  3. Graphical front end

    Advantages:

    • From a communication point of view, a graphical front end is the client part of a client/server application. It can use client/server protocols such as EJB over RMI/IIOP, which carry only data and require less bandwidth than Web applications
    • It runs the presentation on the client and requires fewer resources on the server, where they are expensive

    Drawbacks:

    • A graphical front end is harder to develop and to maintain. It is more demanding in project management and developer skills. This complexity is hard to reconcile with time-to-market constraints and frequent changes
    • A graphical front end is hard and expensive to deploy and update on a large number of devices

  4. A third way: PageBox

    The company still writes a Web Application, but the Web Application is deployed automatically on a large number of inexpensive PageBox hosts. When a user makes a request, it is routed to a PageBox on her or his site or close to it. As a consequence:

    • The graphical application is easy to write and update as it is a regular Web Application
    • Deployment is the responsibility of the infrastructure. From the company's point of view it is no more than a publication on a repository.
    • As the PageBox is close to the user, the response time is better. We believe we can achieve a consistent sub-second response time.
    • PageBoxes run on inexpensive platforms
    • The bandwidth requirement is the same as for a graphical front end

  5. Existing infrastructures

    Such infrastructures exist today for static, non-customizable content.

    Commercial infrastructures

    A good example is Akamai. We recommend reading their white papers on the issue.
    Another good example is Inktomi. They have an excellent Flash presentation of the subject and also white papers.

    Components

    Examples of components are proxies, also known as Web caches, such as the Open Source Squid.
    Here a set of users configure their browsers to use a proxy. They share the proxy, which means that if user A has downloaded a page, user B who asks for the same page later is served by the proxy and no longer by the HTTP server.

    Proxies can run in routers. They can cooperate with other proxies and be used to build caching infrastructures.
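    The sharing effect described above (user B is served by the proxy because user A already fetched the page) can be shown with a toy cache. This is a deliberately simplified sketch with hypothetical names; a real Web cache such as Squid also handles expiry, cacheability rules and cooperation with other proxies, none of which is modeled here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of a proxy shared by several users. */
class SharedProxySketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> origin; // stands in for the HTTP server
    private int originRequests = 0;

    SharedProxySketch(Function<String, String> origin) {
        this.origin = origin;
    }

    /** Serves the page from the cache when possible, otherwise asks the origin once. */
    String get(String url) {
        String page = cache.get(url);
        if (page == null) {
            originRequests++;
            page = origin.apply(url);
            cache.put(url, page);
        }
        return page;
    }

    /** Number of requests that actually reached the origin server. */
    int originRequests() {
        return originRequests;
    }

    public static void main(String[] args) {
        SharedProxySketch proxy = new SharedProxySketch(url -> "content of " + url);
        proxy.get("http://www.example.com/index.html"); // user A: fetched from the origin
        proxy.get("http://www.example.com/index.html"); // user B: served by the proxy
        System.out.println("Requests that reached the origin: " + proxy.originRequests());
    }
}
```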

    The analysis of the current offering suggests we must consider three aspects:
    • Providing a component set allowing the distribution of presentations
    • Implementing infrastructures, which we call constellations, for presentation deployment
    • Interfacing with existing solutions for static content. A solution optimized for static content will always handle static content better than the general-purpose PageBox.

    Internet

    When we started working on the PageBox concept, we were focused on this technical/performance problem.

    Now we think that there is another aspect to consider. The Internet used to be a network, a pipe that linked clients and servers hosted on its border. Today numerous ASPs offer Web hosting. An advantage of Web hosting is that ASPs have bigger links than most companies. In some cases they are also ISPs. We can see them as a part of the infrastructure and note that the Internet is becoming a value-added network.

    Therefore we see PageBox as a technology enabling the Internet to host presentations. Let's see why it is possible and why it is good.

  6. Availability of mature PKIs

    We have the following security needs:

    1. The repositories must be able to check the identity of the Presentation providers and of the subscribing PageBoxes.
    2. The PageBoxes must be able to run the Presentations in a sandbox and to check the identity of the repositories.
    3. The Presentations must be able to check the identity of the data providers and - as usual - of the users.
    4. The data providers must be able to check the identity of the Presentations.

    We can address these needs with data providers, Presentation providers, PageBox hosts and repositories managed by independent organizations, thanks to Public/Private key infrastructure and independent Certificate Authorities such as Verisign.

  7. Move to standard protocols

    XML over HTTP, XML over SMTP, SOAP and IIOP have free interoperable implementations. It should also be the case of the coming XML Protocol (XMLP).
    The Data Providers can publish their interfaces using DTDs, W3C schemas and IDL files. Then their applications become Web Services.

    XML over HTTP (synchronous) combined with XML over SMTP (asynchronous) is the most promising solution:

    • HTTP and SMTP are firewall proof
    • HTTP over SSL is free, mature and available on all environments
    • SMTP is ubiquitous
    • Non-repudiation can be easily implemented using a digital signature encrypted using the free JSSE
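    As an illustration of the non-repudiation point, the sketch below signs an XML message and verifies the signature. It is an assumption-laden example, not PageBox code: it uses the standard java.security API (Signature, KeyPairGenerator) rather than JSSE itself, the class and method names are hypothetical, and the key pair is generated on the fly where a real deployment would use a key certified by a Certificate Authority.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical sketch: signing an XML message so the sender cannot deny having sent it. */
class SignedMessageSketch {

    /** Generates a throwaway RSA key pair for the example. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signs the message with the sender's private key. */
    static byte[] sign(String xml, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(xml.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks the signature with the sender's public key. */
    static boolean verify(String xml, byte[] signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(xml.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair sender = newKeyPair();
        String order = "<order id=\"1\"><item>book</item></order>";
        byte[] signature = sign(order, sender.getPrivate());
        System.out.println("Original accepted: " + verify(order, signature, sender.getPublic()));
        System.out.println("Altered accepted: " + verify(order + " ", signature, sender.getPublic()));
    }
}
```

    The receiver keeps the message together with its signature; since only the sender holds the private key, a valid signature is evidence of origin.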

    RosettaNet Implementation Framework (RNIF) and ebXML are good examples of protocols using XML over HTTP or SMTP. Messages have this format:

    [Figure: XML protocol]

    SOAP and XMLP provide standard APIs to generate and parse such messages.

    Web service descriptions can be stored in and retrieved from a Universal Description, Discovery and Integration (UDDI) repository. Note that queries to UDDI themselves use SOAP. UDDI is important for presentation providers. It allows:

    • Listing possible data providers
    • Finding how to invoke chosen providers

    UDDI is designed to allow different technologies to work together. It supports describing any socket, IIOP or XML over HTTP binding. However, today only the SOAP and HTTP bindings are standardized, with the Web Services Description Language (WSDL). You can download its specification from the IBM or Microsoft sites.

  8. Data source/data presentation link

    Today the company that owns the data also provides the presentation that gives access to it.

    We think that it is bad because:

    • It hinders competition. Customers have to use the presentation of the data provider. The other drawbacks we enumerate below are usual drawbacks of markets without competition.
    • Data providers become two-headed companies instead of focusing on their core business.
    • There is no incentive to improve processes. The data provider gets money only when a user decides to pay for the shopping cart, whereas every customer action, such as an availability check, has a processing cost, often higher than that of the final purchase.
    • The end consumers don't get a fair share of the process enhancement.

    Big data providers

    Producing large amounts of data requires time and money. Big data providers have had computers for a long time and often still produce most of their data on mainframes. So their presentations are already PageBox-like presentations:

    • They call other systems to get or update data.
    • Therefore they manage only few data on their own.

    Big data providers would spare money by focusing on the delivery of gateways handling industry-standardized XML messages.

    Small data providers

    The case of small data providers is different: their application server relies on databases and only marginally calls other systems.

    They have two problems:

    1. To match the competition, they have to invest more and more in Presentation instead of focusing on their core business
    2. As they have few data to offer, they cannot compete with big data providers.

    Presentation providers can merge data of many small providers to build a comprehensive offer and address the second issue.
    Small data providers can use their existing application server to handle XML data and address the first issue.

    Presentation providers

    What we said about data providers is not related to PageBoxes: data providers exist and are already moving to XML to address B2B needs.

    On the other hand, Presentation providers are new:

    • A Presentation provider is primarily a software company as Presentation administration is handled by Constellations.
    • It can specialize in a type of presentation, for instance mobile phones and PDAs, or Flash and SVG
    • It accesses data providers through their access points using their published messages.

  9. This analysis considers long-term perspectives. We think that we needed to do so in order to define the right target and clarify PageBox goals. We hope that it helps to understand the short-term product.

Target and status

  1. Roles

    Data providers

    Data providers provide access to their data in a standard protocol.

    Presentation providers

    Presentation providers write presentations and publish them to Presentation repositories.
    A presentation can call one or many Data providers. It can access its own data but it should be mostly read-only.

    A presentation is deployed on a large number of PageBoxes, so a Presentation provider cannot assume a given user will always be served by the same PageBox, except for the duration of an Application Server session.

    Users

    Users access presentations using browsers.

    Presentation repositories

    Presentation repositories accept subscription requests from Presentation hosts and publication requests from Presentation providers. When a Presentation repository receives a Publication request, it notifies subscribing PageBoxes that download the presentation.

    Presentation hosts

    Presentation hosts host Presentations in PageBoxed application servers.

  2. Infrastructure

    Data Provider directories

    Data Provider directories use UDDI protocol and allow Presentation Providers to discover information about Web services:

    • Data Provider access point
    • Data Provider services (messages, functions)

    UDDI has been designed to address Business to Business (B2B) needs. As a Presentation Provider is a regular trading partner, UDDI satisfies PageBox requirements.

    Presentation hosts

    We developed a PageBox named JSPservlet under GPL 2. Its source is on the SourceForge site in the pagebox repository.

    We distinguish two sorts of Presentation hosts:

    1. "Turn key" hosts

      These hosts are Network Appliances and routers.
      We consider that our JSPservlet for Application Servers addresses the Network Appliance environment:

      • Linux or similar Operating System
      • Support for free (Tomcat) or inexpensive (Resin) Application Servers
      • Enough resources (256 MB or more)

      As resources are more expensive on routers, we developed a JSPservlet for embedded servers to reduce the footprint and then a diskless version for devices without disks, which still runs on embedded servers.

      JSPservlet for embedded servers runs on Sun Java Embedded Server and can be ported with minimal effort to other embedded servers.

      These embedded servers conform to the Open Services Gateway Initiative (OSGi) specification. You can download the specification from the OSGi site. An interesting aspect is that it primarily targets home gateways, which can be of interest to specialized Presentation providers.

      For instance, a game publisher could use PageBoxes to host Web archive based games. In that case, the consumer subscribes to the publisher's repository and her or his home PageBox downloads games from the repository.

      An interesting aspect of our implementation is that it supports standard Web Applications including JSP 1.1 tag libraries and JSP beans. The only thing we failed to hide is that JSPservlet for embedded servers inherits from JES 2 the support of Servlet specification 2.1, not 2.2 as on Application Servers.

    2. Application Server hosts

      Application Server hosts are hosts that already operate Application Servers.
      We distinguish three sorts of Application Server hosts:

      • ASPs

        We see them as our main short-term target as they make the creation of constellations relatively inexpensive.

        A company could be charged something like $650-1500 a month for a world-wide constellation of fifty PageBoxes, roughly 1/3-1/2 of the hosting cost of a dedicated server. At this cost, it would get a highly redundant solution and provide a better response time.

        ASPs that support Java use JServ (an Apache module), Tomcat and Resin. That is why we developed JSPservlet on Tomcat and Resin. We are considering a JServ version, though it will have the same drawback as the embedded server version: support of an old servlet specification.

      • Organizations such as universities

        We believe these organizations mostly run the same Tomcat and Resin as ASPs and therefore JSPservlet for Application Servers should meet their needs. We think it could be interesting to create cross-university constellations to reduce network bills and improve response time.

      • Companies

        Companies use Tomcat and Resin but also commercial application servers. As we explain below, JSPservlet is designed to be highly portable. If you run into problems porting JSPservlet to your Application Server, please contact us. On the other hand, if you have made the port, let us know and if possible send us the code.

        Companies can use PageBoxes in four ways:

        1. Using a public constellation.
        2. Creating a private Internet constellation.
        3. Creating a private network constellation.
        4. Using standalone PageBoxes. Consider a simple B2B issue: Company A has developed a Web Application that it sells to Company B. Company A keeps the data access part on its side and deploys the presentation part on a PageBox hosted at Company B. The return on investment can come quickly if the companies would otherwise need to upgrade their link.
          This is enough if Company B is a single-site small or medium company.
          If Company B is a corporation with 2000 offices around the world, then it is in its interest to deploy a private constellation and to allow Company A to publish its presentation on Company B's repository.

    Today JSPservlet is a regular Web application. Though a closer integration could have some value, we found that this approach has significant advantages:

    • The Application Server provides support for servlets and JSPs.
    • It is easy to port JSPservlet to another application server.
    • It is easy to install JSPservlet on a machine where an application server is installed.

    Users don't need to know the URL of the closest PageBox. Thanks to the Mapper component, they can enter the well-known URL of their application hosted in a PageBox. Then they are automatically routed to one of the closest PageBoxes. The mechanism reliably identifies:

    1. The country
    2. Whether there are PageBoxes on the same domain as the requestor.
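    A routing decision based on these two pieces of information could look like the sketch below. The preference order (same domain first, then same country, then the well-known host) is our assumption for the example, as are the class, field and host names; how the country and domain of the requestor are obtained is left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of choosing a PageBox from the requestor's domain and country. */
class NearestPageBoxSketch {

    /** A PageBox host described by its URL, its DNS domain and its country code. */
    static class Host {
        final String url;
        final String domain;
        final String country;

        Host(String url, String domain, String country) {
            this.url = url;
            this.domain = domain;
            this.country = country;
        }
    }

    /** Prefers a PageBox on the requestor's domain, then one in its country, else the fallback. */
    static Host select(List<Host> hosts, String requestorDomain, String requestorCountry, Host fallback) {
        for (Host host : hosts) {
            if (host.domain.equalsIgnoreCase(requestorDomain)) {
                return host;
            }
        }
        for (Host host : hosts) {
            if (host.country.equalsIgnoreCase(requestorCountry)) {
                return host;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Host> constellation = Arrays.asList(
                new Host("http://pb.univ.example.fr", "univ.example.fr", "FR"),
                new Host("http://pb.example.jp", "example.jp", "JP"));
        Host wellKnown = new Host("http://www.example.com", "example.com", "US");

        Host chosen = select(constellation, "isp.example.fr", "FR", wellKnown);
        System.out.println("Routed to " + chosen.url);
    }
}
```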

    Mapper can be customized to fit a specific need more closely.
    We plan to complement Mapper by integrating it with a Web cache, Squid.

    Repositories

    We implemented a comprehensive repository tool, PublisherServer. It provides:

    • Handling of subscription requests in HTTP
    • Handling of publication requests in HTTP, sent by a Java application, PublisherClient
    • A set of administrative pages to check the state of the subscriptions and publications and to display repository archives
    • Fault-tolerant mechanisms to retry publication, unpublication and unsubscription when a PageBox is down
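    The retry behaviour can be pictured as a queue of notifications that could not be delivered. The sketch below is a guess at the general idea, not a description of PublisherServer internals: the class name, the string form of a notification and the delivery callback are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical sketch of retrying notifications that a PageBox failed to accept. */
class RetryQueueSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Predicate<String> deliver; // returns true when the PageBox accepted the notification

    RetryQueueSketch(Predicate<String> deliver) {
        this.deliver = deliver;
    }

    /** Tries to deliver at once; keeps the notification for a later retry on failure. */
    void submit(String notification) {
        if (!deliver.test(notification)) {
            pending.add(notification);
        }
    }

    /** Retries each pending notification once and returns how many are still pending. */
    int retryPending() {
        int count = pending.size();
        for (int i = 0; i < count; i++) {
            String notification = pending.poll();
            if (!deliver.test(notification)) {
                pending.add(notification);
            }
        }
        return pending.size();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        boolean[] pageBoxUp = {false};
        RetryQueueSketch queue = new RetryQueueSketch(n -> pageBoxUp[0]);

        queue.submit("publish shop.war"); // the PageBox is down: kept for later
        System.out.println("Pending while down: " + queue.pendingCount());

        pageBoxUp[0] = true; // the PageBox comes back
        System.out.println("Pending after retry: " + queue.retryPending());
    }
}
```

    In practice retryPending would be called periodically, for instance from a timer.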

    PublisherServer runs in an Application Server.
    We plan to implement access control functions. We favor simplicity of use, so we probably will use a password mechanism.

    Constellations

    We plan to implement a test constellation. Any help is welcome but we would especially appreciate free hosting and even more free hosting in Europe, Asia and Pacific.

    Security

    From the beginning we considered security as one of the most important issues PageBox had to address. We have implemented sandboxes and SSL support. We still have some features to implement, such as secured publishing, but they carry no technical risk.

    The most important point is that our implementation should be able to check archive credentials with a Certificate Authority. More specifically, PageBox should be configured to query an LDAP CA to check that:

    • The archive certificates were signed by a trusted CA
    • The archive certificates were not revoked
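    Before any CA can be queried, the archive itself has to carry verifiable signatures. The sketch below covers only that first step, with the standard java.util.jar API: it reports whether every entry of an archive is signed. It is an illustration under assumptions, not the JSPservlet code; the class and method names are hypothetical. Checking that the signer certificates chain to a trusted CA and were not revoked is not shown; in Java that would typically go through java.security.cert.CertPathValidator.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.CodeSigner;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical sketch: does every entry of an archive carry a valid signature? */
class ArchiveSignatureSketch {

    /**
     * Returns true if the archive has at least one content entry and all of them are signed.
     * A signed entry whose content was tampered with makes the read throw SecurityException.
     */
    static boolean isFullySigned(File archive) {
        try (JarFile jar = new JarFile(archive, true)) { // true: verify signatures while reading
            byte[] buffer = new byte[8192];
            boolean sawEntry = false;
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || entry.getName().startsWith("META-INF/")) {
                    continue; // directories and signature files are not themselves signed
                }
                // Signers are only available once the entry has been read completely.
                try (InputStream in = jar.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // just consume the stream
                    }
                }
                CodeSigner[] signers = entry.getCodeSigners();
                if (signers == null || signers.length == 0) {
                    return false;
                }
                sawEntry = true;
            }
            return sawEntry;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(args[0] + " fully signed: " + isFullySigned(new File(args[0])));
        } else {
            System.out.println("usage: java ArchiveSignatureSketch <archive>");
        }
    }
}
```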

    This implies a cost that is probably negligible for commercial use. For test purposes on the coming Ursa Minor constellation, we plan to support test certificates issued by Verisign and Thawte.

    It would be useful to have our own Certificate Authority for a public, cheap or free constellation. There are some issues. One of them is that a certificate is only meaningful if the CA can check the identity of the requestor. A mail address is not sufficient for that; a credit card number is. We would be happy if someone could help us to provide a free or cheap CA.

Goals

  1. Standards conformant
  2. Open Source reference implementation

    This reference implementation must also build on Open Source products such as Tomcat or Cocoon and avoid competing with other Open Source products.

  3. Secure
  4. Reliable
  5. Cheap

    We try to reduce cost in three areas:

    1. Setup cost. We already developed helper servlets.
    2. Security management. This part is not completed but our goal is to provide an automated security management.
    3. Troubleshooting. We already provide log display through HTTP.

  6. Thin and fast

    So far we have been successful in that area. PageBox code is small and has no significant overhead even when it sandboxes Presentations.

Technology

    In the current implementation, JSPservlet, we use the Java language and J2EE technology. We are very satisfied with them for the following reasons:
    • Functionality. We especially used:
      • Class loaders
      • Java 2 security
    • Productivity
    • Portability
    • Availability of robust and scalable application servers in Open Source

    Our conviction is that PageBox has to be written in some form of VM-run or managed code.
    The only appropriate environment besides Java is .NET, and we are seriously considering this option to support ASP+/VB/C# development. We also want to support PHP development.

    We have checked that JSPservlet supports:

    • Java servlets and JavaServer Pages, taglibs, beans and classes
    • XML, XSL, XSP with Cocoon, Xerces and Xalan libraries

    It should also support the scripting languages handled by the Bean Scripting Framework (BSF): Netscape Rhino (JavaScript), VBScript, Perl, Tcl, Python, NetRexx and Rexx.

Contact:support@pagebox.net

©2001-2004 Alexis Grandemange  