Archives mensuelles : juin 2005

Prospective sur la demande sociale de recherche

Demain, notre société devra faire face à des enjeux pour lesquels un effort de recherche scientifique et technologique sera nécessaire. Quels seront ces enjeux ? Quel sera l’effort de recherche nécessaire ? C’est à ces questions qu’essaie de répondre le programme « Agora 2020 » du centre de prospective et de veille scientifique du ministère de l’équipement.

Innovation industrielle, et Internet dans tout ça ?

Le rapport de Jean-Louis Beffa à Jacques Chirac a donné lieu à la création d’une agence pour l’innovation industrielle doté d’un joli budget. Ce rapport a été discuté dans la blogosphère.

Certains ont été notamment surpris de l’absence d’un axe prioritaire « Technologies de l’Information et de la Communication » dans ce rapport et ont souligné combien d’autres pays avaient au contraire misé sur l’innovation dans les TIC, les STIC (Sciences et …), les NTIC (Nouvelles… ce qui fait déjà un peu ancien).

Histoire d’apporter ma pierre à l’édifice de la critique (constructive), voici deux documents qui soulignent l’importance prioritaire que les TIC devraient avoir dans une politique d’innovation industrielle en France.

Le premier est un rapport d’étude du conseil stratégique des technologies de l’information auprès du premier ministre, portant sur les politiques de R&D sur les STIC dans les grands pays industriels. Il montre que l’Europe est largement en retard par rapport au Japon et aux USA en matière de R&D sur les NTIC.

Le deuxième document est le bulletin de juin 2005 du centre d’analyse statistique du Canada, portant sur l’innovation. Il indique :

Les résultats de l’Enquête sur l’innovation de 2003, qui portait sur l’innovation dans certaines industries de services, montrent que les établissements des industries de services des TIC sont les plus susceptibles d’être innovateurs. Au Canada, les trois industries où les taux d’innovation étaient les plus élevés appartenaient toutes aux TIC.

En l’occurence, il s’agit des éditeurs logiciels, des opérateurs satellite ou Internet et, dans une moindre mesure, des SSII et du conseil, des bureaux d’études, sociétés d’ingénierie ou de R&D et, enfin, des grossistes-distributeurs high-tech. Il me semble donc que l’agence française pour l’innovation industrielle néglige l’innovation dans l’industrie des services en ne prévoyant aucune priorité politique pour les TIC alors que, dans des pays tels que le Canada, les TIC sont perçues comme un secteur prioritaire d’innovation. On dit que les grands capitaines d’industrie, tels que M. Beffa, ne voient parfois dans l’informatique qu’un « mal nécessaire » (à la bonne gestion, notamment financière, des industries « lourdes »). Ceci explique-t-il cela ?

PS : Au passage, dans le document canadien, vous noterez que

les entreprises qui sont situées à proximité d’entreprises rivales ou d’universités ne sont pas plus
innovatrices que les autres de la même industrie, sauf quand la distance est extrêmement courte.

Il est précisé, un peu plus loin :

La proximité avec des entreprises rivales ou des universités semble favoriser l’innovation uniquement lorsque les distances sont très courtes (quelques centaines de mètres). Et même dans ces cas, la proximité n’a des répercussions que sur certains types d’innovations. La proximité étroite avec des entreprises rivales semble favoriser l’imitation plutôt que les innovations originales, tandis que la proximité étroite avec des universités semble favoriser les innovations originales plutôt que les imitations.

Alors, que penser de cet autre volet des politiques françaises de soutien à la R&D, qui passe par le développement de « pôles de compétitivité » censés rapprocher physiquement entreprises rivales et universités ? Le fond a certainement du bon. Mais a-t-on pensé à prescrire une distance limite au-delà desquels le pôle n’a plus de sens ni d’intérêt ?

Les Fondations en France

En France, les Fondations sont des structures qui allouent des moyens financiers à des « causes » diverses. Les Fondations sont des organisations sans but lucratif qui sont sensées être des championnes de « l’utilité publique », de l’ « intérêt général ». Voici quelques pointeurs qui vous permettront de mieux les connaître :

Le bidouilleur mobile a toujours tor

Actuellement, je suis pas mal en déplacement. Aujourd’hui, j’arrive dans une salle informatique en libre-service d’une entreprise et j’essaie de me connecter au Net avec mon portable.

Première chose à faire, trouver de l’électricité car ma batterie est à plat. Pas de problème, je débranche un écran et lui pique sa prise. Je le remettrai en partant comme j’aurais aimé le trouver en arrivant, bien sûr. :)

Ensuite, il me faut du réseau. OK, je vérifie que les postes sont configurés en DHCP et je pique une prise RJ45 au poste devant lequel je suis. Ca marche, mon portable est accepté par le réseau local. L’administrateur réseau m’avait bien sûr donné son autorisation pour cette opération…

Maintenant, passons aux choses sérieuses : accéder au Net. Visiblement, les postes de cette salle ont des Internet Explorer qui sont configurés sans proxy. Effectivement, depuis mon portable, en désactivant le support proxy de mon Firefox, j’accède au Web sans encombre.

Oui, mais comment relever ma boîte aux lettres en POP ? Il y a un firewall qui montre les dents entre le Net et moi. Thunderbird m’insulte en me disant qu’il s’est fait jeter et que le cerbère de service refuse de le laisser sortir du réseau.

Heureusement, en bon bidouilleur mobile, j’ai mon routeur à l’oignon préféré, à savoir TOR. Je lance donc Tor sur mon portable. Celui-ci arrive à se faufiler par-dessus le firewall et m’ouvre un accès au monde extérieur. J’indique à Thunderbird d’utiliser Tor en tant que proxy SOCKS. Pas de problème, me voici sur le Net en POP !

Idem si je veux accéder à la TV ou la radio en ligne avec Winamp, je configure Winamp pour attaquer un proxy HTTP que j’installe sur mon portable (Privoxy). Et ce proxy relaie mes paquets de communication vers Tor en tant que proxy SOCKS.

Plus fort encore, votre firewall bloque les accès Web ? mais vous avez un proxy SOCKS d’entreprise qui peut vous ouvrir votre accès au Net ? Vous pouvez alors « socksifier tor » grâce à un petit utilitaire du style FreeCap et faire ainsi de belles acrobaties de bidouilleur mobile. Par contre, vous veillerez encore une fois, auparavant, à demander à votre administrateur réseau l’autorisation de faire de telles acrobaties. Il se pourrait qu’elles fassent hérisser les cheveux sur la tête d’un responsable local de la sécurité informatique trop zélé. Attention à ne pas vous mettre en TORT !

Bref, Tor, voila un joli joujou pour vous assurer une meilleure connexion au Net même dans des réseaux trop contraints. Bravo à l’EFF.

Daisy vs. Plone, feature fighting

A Gouri-friend of mine recently pointed me to Daisy, a « CMS wiki/structured/XML/faceted » stuff he said. I answered him it may be a nice product but not enough attractive for me at the moment to spend much time analyzing it. Nevertheless, as Gouri asked, let’s browse Daisy’s features and try to compare them with Plone equivalents (given that I never tried Daisy).

The Daisy project encompasses two major parts: a featureful document repository

Plone is based on an object-oriented repository (Zope’s ZODB) rather than a document oriented repository.

and a web-based, wiki-like frontend.

Plone has its own web-based fronted. Wiki features are provided with an additional product (Zwiki).

If you have different frontend needs than those covered by the standard Daisy frontend, you can still benefit hugely from building upon its repository part.

Plone’s frontend is easily customizable either with your own CSS, with inherting from existing ZPT skins or with a WYSIWYG skin module such as CPSSkin.

Daisy is a Java-based application

Plone is Python-based.

, and is based on the work of many valuable open source packages, without which Daisy would not have been possible. All third-party libraries or products we redistribute are unmodified (unforked) copies.

Same for Plone. Daisy seems to be based on Cocoon. Plone is based on Zope.

Some of the main features of the document repository are:
* Storage and retrieval of documents.

Documents are one of the numerous object classes available in Plone. The basic object in Plone is… an object that is not fully extensible by itself unless it was designed to be so. Plone content types are more user-oriented than generic documents (they implement specialized behaviours such as security rules, workflows, displays, …). They will be made very extensible when the next versions of the « Archetypes » underlying layer is released (they include through-the-web schema management feature that allow web users to extend what any existing content type is).

* Documents can consists of multiple content parts and fields, document types define what parts and fields a document should have.

Plone’s perspective is different because of its object orientation. Another Zope product called Silva is more similar to Daisy’s document orientation.

Fields can be of different data types (string, date, decimal, boolean, …) and can have a list of values to choose from.

Same for Archetypes based content types in Plone.

Parts can contain arbitrary binary data, but the document type can limit the allowed mime types. So a document (or more correctly a part of a document) could contain XML, an image, a PDF document, … Part upload and download is handled in a streaming manner, so the size of parts is only limitted by the available space on your filesystem (and for uploading, a configurable upload limit).

I imagine that Daisy allows the upload and download of documents having any structure, with no constraint. In Plone, you are constrained by the object model of your content types. As said above this model can be extended at run time (schema management) but at the moment, the usual way to do is to define your model at design time and then comply with it at run time. At run time (even without schema management), you can still add custom metadata or upload additional attached files if your content type supports attached files.

* Versioning of the content parts and fields. Each version can have a state of ‘published’ or ‘draft’. The most recent version which has the state published is the ‘live’ version, ie the version that is displayed by default (depends on the behaviour of the frontend application of course).

The default behaviour of Plone does not include real versioning but document workflows. It means that a given content can be in state ‘draft’ or ‘published’ and go from one state to another according to a pre-defined workflow (with security conditions, event triggering and so). But a given object has only one version by default.
But there are additional Plone product that make Plone support versioning. These products are to be merged into Plone future distribution because versioning has been a long awaited feature. Note that, at the moment, you can have several versions of a document to support multi-language sites (one version per language).

* Documents can be marked as ‘retired’, which makes them appear as deleted, they won’t show up unless explicitely requested. Documents can also be deleted permanently.

Plone’s workflow mechanism is much more advanced. A default workflow includes a similar retired state. But the admin can define new workflows and modify the default one, always referring to the user role. Plone’s security model is quite advanced and is the underlying layer of every Plone functionality.

* The repository doesn’t care much what kind of data is stored in its parts, but if it is « HTML-as-well-formed-XML », some additional features are provided:
o link-extraction is performed, which allows to search for referers of a document.
o a summary (first 300 characters) is extracted to display in search results
o (these features could potentially be supported for other formats also)

There is no such thing in Plone. Maybe in Silva ? Plone’s reference engine allows you to define associations between objects. These associations are indexed by Plone’s search engine (« catalog ») and can be searched.

* all documents are stored in one « big bag », there are no directories.

Physically, the ZODB repository can have many forms (RDBMS, …). The default ZODB repository is a single flat file that can get quite big : Data.fs

Each document is identified by a unique ID (an ever-increasing sequence number starting at 1), and has a name (which does not need to be unique).

Each object has an ID but it is not globally unique at the moment. It is unfortunately stored in a hierarchical structure (Zope’s tree). Some Zope/Plone developpers wished « Placeless content » to be implemented. But Daisy must still be superior to Plone in that field.

Hierarchical structure is provided by the frontend by the possibility to create hierarchical navigation trees.

Zope’s tree is the most important structure for objects in a Plone site. It is too much important. You can still create navigation trees with shortcuts. But in fact, the usual solution in order to have maximum flexibility in navigation trees is to use the « Topic » content type. Topics are folder-like object that contain a dynamic list of links to objects matching the Topic’s pre-defined query. Topic are like persistent searches displayed as folders. As a an example a Topic may display the list of all the « Photo »-typed objects that are in « draft » state in a specific part (tree branch) of the site, etc.

* Documents can be combined in so-called « collections ». Collections are sets of the documents. One document can belong to multiple collections, in other words, collections can overlap.

Topics too ? I regret that Plone does easily not offer a default way to display a whole set of objects in just one page. As an example, I would have enjoyed to display a « book » of all the contents in my Plone site as if it were just one single object (so that I can print it…) But there are some Plone additional products (extensions) that support similar functionalities. I often use « Content Panels » to build a page by defining its global layout (columns and lines) and by filling it with « views » from exisiting Plone objects (especially Topics). Content Panels mixed with Topics allow a high flexibility in your site. But this flexibility has some limits too.

* possibility to take exclusive locks on documents for a limitted or unlimitted time. Checking for concurrent modifications (optimistic locking) happens automatically.

See versioning above.

* documents are automatically full-text indexed (Jakarta Lucene based). Currently supports plain text, XML, PDF (through PDFBox), MS-Word, Excel and Powerpoint (through Jakarta POI), and OpenOffice Writer.

Same for Plone except that Plone’s search engine is not Lucene and I don’t know if Plone can read OpenOffice Writer documents. Note that you will require additional modules depending on your platform in order to read Microsoft files.

* repository data is stored in a relation database. Our main development happens on MySQL/InnoDB, but the provisions are there to add support for new databases, for example PostgreSQL support is now included.

Everything is in the ZODB. By default stored as a single file. But can also be stored in a relational database (but this is usually useless). You can also transparently mix several repositories in a same Plone instance. Furthermore, instead of having Plone directly writing in the ZODB’s file, you can configure Plone so that it goes through a ZEO client-server setup so that several Plone instances can share a common database (load balancing). Even better, there is a commercial product, ZRS, that allows you to transparently replicate ZODBs so that several Plone instances setup with ZEO can use several redundant ZODBs (no single point of failure).

The part content is stored in normal files on the file system (to offload the database). The usage of these familiar, open technologies, combined with the fact that the daisywiki frontend stores plain HTML, makes that your valuable content is easily accessible with minimal « vendor » lock-in.

Everything’s in the ZODB. This can be seen as a lock-in. But it is not really because 1/ the product is open source and you can script a full export with Python with minimal effort, 2/ there are default WebDAV + FTP services that can be combined with Plone’s Marshall extension (soon to be included in Plone’s default distribution) that allows you to output your content from your Plone site. Even better, you can also upload your structured semantic content with Marshall plus additional hacks as I mentioned somewhere else.

* a high-level, sql-like query language provides flexible querying without knowing the details of the underlying SQL database schema. The query language also allows to combine full-text (Lucene) and metadata (SQL) searches. Search results are filtered to only contain documents the user is allowed to access (see also access control). The content of parts (if HTML-as-well-formed-XML) can also be selected as part of a query, which is useful to retrieve eg the content of an « abstract » part of a set of documents.

No such thing in Plone as far as I know. You may have to Pythonize my friend… Except that Plone’s tree gives an URL to every object so that you can access any part of the site. But not with a granularity similar to Daisy’s supposed one. See silva for more document-orientation.

* Accesscontrol: instead of attaching an ACL to each individual document, there is a global ACL which allows to specify the access rules for sets of documents by selecting those documents based on expressions. This allows for example to define access control rules for all documents of a certain type, or for all documents in a certain collection.

Access control is based on Plone’s tree, with inheritance (similar to Windows security model in some way). I suppose Plone’s access control is more sophisticated and maintainable than Daisy’s one but it should require more investigation to explain why.

* The full functionality of the repository is available via an HTTP+XML protocol, thus providing language and platform independent access. The documentation of the HTTP interface includes examples on how the repository can be updated using command-line tools like wget and curl.

Unfortunately, Plone is not ReST enough at the moment. But there is some hope the situation will change with Zope 3 (Zope’s next major release that is coming soon). Note that Zope (so Plone) supports HTTP+XML/RPC as a generic web service protocol. But this is nothing near real ReSTful web services…

* A high-level, easy to use Java API, available both as an « in-JVM » implementation for embedded scenarios or services running in the daisy server VM, as well as an implementation that communicates transparently using the HTTP+XML protocol.

Say Python and XML/RPC here.

* For various repository events, such as document creation and update, events are broadcasted via JMS (currently we include OpenJMS). The content of the events are XML messages. Internally, this is used for updating the full-text index, notification-mail sending and clearing of remote caches. Logging all JMS events gives a full audit log of all updates that happened to the repository.

No such mechanism as far as I know. But Plone of course offers fully detailed audit logs of any of its events.

* Repository extensions can provide additional services, included are:
o a notification email sender (which also includes the management of the subscriptions), allowing subscribing to individual documents, collections of documents or all documents.

No such generic feature by default in Plone. You can add scripts to send notification in any workflow transition. But you need to write one or two lines of Python. And the management of subscriptions is not implemented by default. But folder-like object support RSS syndication so that you can agregate Plone’s new objects in your favorite news aggregator;

o a navigation tree management component and a publisher component, which plays hand-in-hand with our frontend (see further on)

I’ll see further on… :)

* A JMX console allows some monitoring and maintenance operations, such as optimization or rebuilding of the fulltext index, monitoring memory usage, document cache size, or database connection pool status.

You have several places to look at for this monitoring within Zope/Plone (no centralized monitoring). An additional Plone product helps in centralizing maintenance operations. Still some ground for progress here.

The « Daisywiki » frontend
The frontend is called the « Daisywiki » because, just like wikis, it provides a mixed browsing/editing environment with a low entry barrier. However, it also differs hugely from the original wikis, in that it uses wysiwyg editing, has a powerful navigation component, and inherits all the features of the underlying daisy repository such as different document types and powerful querying.

Well, then we can just say the same for Plone and rename its skins the Plonewiki frontend… Supports Wysiwyg editing too, with customizable navigation tree, etc.

* wysiwyg HTML editing
o supports recent Internet Explorer and Mozilla/Firefox (gecko) browsers, with fallback to a textarea on other browsers. The editor is customized version of HTMLArea (through plugins, not a fork).

Same for Plone (except it is not an extension of HTMLArea but of a similar product).

o We don’t allow for arbitrary HTML, but limit it to a small, structural subset of HTML, so that it’s future-safe, output medium independent, secure and easily transformable. It is possible to have special paragraph types such as ‘note’ or ‘warning’. The stored HTML is always well-formed XML, and nicely layed-out. Thanks to a powerful (server-side) cleanup engine, the stored HTML is exactly the same whether edited with IE or Mozilla, allowing to do source-based diffs.

No such validity control within Plone. In fact, the structure of a Plone document is always valid because it is managed by Plone according to a specific object model. But a given object may contain an HTML part (a document’s body as an example) that may not be valid. If your documents are to have a recurrent inner structure, then you are invited to make this structure an extension of an object class so that is no more handled as a document structure. See what I mean ?

o insertion of images by browsing the repository or upload of new images (images are also stored as documents in the repository, so can also be versioned, have metadata, access control, etc)

Same with Plone except for versioning. Note that Plone’s Photo content type support automatic server-side redimensioning of images.

o easy insertion document links by searching for a document

Sometimes yes, sometimes no. It depends on the type of link you are creating.

o a heartbeat keeps the session alive while editing

I don’t know how it works here.

o an exlusive lock is automatically taken on the document, with an expire time of 15 minutes, and the lock is automatically refreshed by the heartbeat

I never tried the Plone extension for versioning so I can’t say. I know that you can use the WebDAV interface to edit a Plone object with your favorite text processing package if you want. And I suppose this interface properly manages this kind of issues. But I never tried.

o editing screens are built dynamically for the document type of the document being edited.

Of course.

* Version overview page, from which the state of versions can be changed (between published and draft), and diffs can be requested. * Nice version diffs, including highlighting of actual changes in changed lines (ignoring re-wrapping).

You can easily move any object in its associated workflow (from one state to another, through transitions). But no versioning. Note that you can use Plone’s wiki extension and this extension supports supports diffs and some versioning features. But this is not available for any Plone content type.

* Support for includes, i.e. the inclusion of one document in the other (includes are handled recursively).

No.

* Support for embedding queries in pages.

You can use Topics (persistent queries). You can embed them in Content Panels.

* A hierarchical navigation tree manager. As many navigation trees as you want can be created.

One and only one navigation tree by default. But Topics can be nested. So you can have one main navigation tree plus one or more alternatives with Topics (but these alternatives are limited for some reasons.).

Navigation trees are defined as XML and stored in the repository as documents, thus access control (for authoring them, read access is public), versioning etc applies. One navigation tree can import another one. The nodes in the navigation tree can be listed explicitely, but also dynamically inserted using queries. When a navigation tree is generated, the nodes are filtered according to the access control rules for the requesting user. Navigation trees can be requested in « full » or « contextualized », this last one meaning that only the nodes going to a certain document are expanded. The navigtion tree manager produces XML, the visual rendering is up to XSL stylesheets.

This is nice. Plone can not do that easily. But what Plone can do is still done with respect to its security model and access control, of course.

* A navigation tree editor widget allows easy editing of the navigation trees without knowledge of XML. The navigation tree editor works entirely client-side (Mozilla/Firefox and Internet Explorer), without annoying server-side roundtrips to move nodes around, and full undo support.

Yummy.

* Powerful document-publishing engine, supporting:
o processing of includes (works recursive, with detection of recursive includes)
o processing of embedded queries
o document type specific styling (XSLT-based), also works nicely combined with includes, i.e. each included document will be styled with its own stylesheet depending on its document type.

OK

* PDF publishing (using Apache FOP), with all the same features as the HTML publishing, thus also document type specific styling.

Plone document-like content type offer PDF views too.

* search pages:
o fulltext search
o searching using Daisy’s query language
o display of referers (« incoming links »)

Fulltext search is available. No query language for the user. Display of refers is only available for content type that are either wiki pages or have been given the ability to include references from other objects.

* Multiple-site support, allows to have multiple perspectives on top of the same daisy repository. Each site can have a different navigation tree, and is associated with a default collection. Newly created documents are automatically added to this default collection, and searches are limited to this default collection (unless requested otherwise).

It might be possible with Plone but I am not sure when this would be useful.

* XSLT-based skinning, with resuable ‘common’ stylesheets (in most cases you’ll only need to adjust one ‘layout’ xslt, unless you want to customise heavily). Skins are configurable on a per-site basis.

Plone’s skins are using the Zope Page Templates technology. This is a very nice and simple HTML templating technology. Plone’s skins make an extensive use of CSS and in fact most of the layout and look-and-feel of a site is now in CSS objects. These skins are managed as objects, with inheritance, overriding of skins and other sophisticated mechanism to configure them.

* User self-registration (with the possibility to configure which roles are assigned to users after self-registration) and password reminder.

Same is available from Plone.

* Comments can be added to documents.

Available too.

* Internationalization: the whole front-end is localizable through resource bundles.

Idem.

* Management pages for managing:
o the repository schema (the document types)
o the users
o the collections
o access control

Idem.

* The frontend currently doesn’t perform any caching, all pages are published dynamically, since this also depends on the access rights of the current user. For publishing of high-trafic, public (ie all public access as the same user), read-only sites, it is probably best to develop a custom publishing application.

Zope includes caching mechanisms that take care of access rights. For very high-trafic public sites, a Squid frontend is usually recommended.

* Built on top of Apache Cocoon (an XML-oriented web publishing and application framework), using Cocoon Forms, Apples (for stateful flow scenarios), and the repository client API.

By default, Zope uses its own embedded web server. But the usual setup for production-grade sites is to put an Apache reverse-proxy in front of it.

My conclusion : Daisy looks like a nice product when you have a very document-oriented project, with complex documents with structures varying much from documents to documents ; its equivalent in Zope’s world would be Silva. But Plone is much more appropriate for everyday CMS sites. Its object-orientation offers both a great flexibility for the developer and more ease of use for Joe-six-pack webmaster. Plone still lacks some important technical features for its future, namely ReSTful web service interfaces, plus placeless content paradigm. Versioning is expected soon.

This article was written in just one raw, late at night and with no re-reading reviewed once thanks to Gouri. It may be wrong or badly lacking information on some points. So your comments are much welcome !