Author archives: Jean Millerat

Lunch invitation

Dear reader, how about having lunch together one of these days? This is a standing invitation, valid on (more or less) any day of the week, for lunch in certain parts of Paris or of the Hauts-de-Seine (92), Yvelines (78) or Essonne (91) départements, as long as it is within reach of my workplace (to be arranged by e-mail). Just leave me a message as a comment here or by e-mail at sig at sig dot levillage dot org. I’d be really glad to see you. See you soon!

Oh, my job!

That’s it, I’ve changed jobs. Farewell to the industrial multinational where I managed the intranet team. Hello to the high-tech multinational where I manage a research team working on « knowledge technologies ». In between, a well-deserved holiday let me rest and work on a subject close to my heart: public-interest Internet innovations. But that’s another story, which I’ll tell you about soon.

I’d really like to tell you a few things about my new job. But I must be careful not to say too much, as discretion and confidentiality require… just like before, but even more deeply rooted in my new employer’s culture. Anyway, I’m at a « tech company » and still discovering this new environment. I’m still going « oh! » and « ah! » in surprise every day. Apart from a few rare bad surprises (a corporate firewall a bit too strict for my taste, no SOCKS proxy, endless procedures), I’m mostly in a state of daily amazement, if only when I discover my new work tools. Judge for yourself.

First, here I’m in a lab, doing research: ah! what a pleasure! we’re (finally) going to be able to have (more) fun! Then, on my very first day, I create my blog on the intranet. Oh! Any employee can create a blog on the intranet (with Livelink)! And wikis galore!? Where on earth have I landed?! Hey, my office neighbours are the local IT team. Oh-ah! I can subscribe to this team’s news via their RSS feed! There I learn that quite a lot of employees have taken up instant messaging… on an internal Jabber server! And my colleagues in my team use Firefox and Thunderbird (some of them on Linux!)! Ah! Oh! There’s an NNTP server in the company with internal newsgroups! Oh! My boss’s boss’s boss talks about podcasting and blogs in his latest speech to an audience of financial analysts! Where on earth have I landed? What? The local IT team comes over to sing the praises of Python to my team (between two rounds of M&Ms)? Pinch me! The intranet search engine bombards me with content when I search for « P2P », « podcasting », « social software » and other « semantic web »… 16 matches for « blogosphere », not bad! OK, let’s pull ourselves together… Hmm… But… But… is that a corporate SourceForge I see installed there? With over a hundred (fairly active) projects in it! And active mailing lists, archived on the corporate mailing-list server… And the CEO announces the launch of a new intranet think tank that lets any employee submit innovation proposals and then track them via the intranet… Whoa… Well, two days after my arrival, my e-provisioning is almost complete: I’m already in the group messaging system, but also in the group LDAP directory and in the local replica. Wow. The contrast between « before » and « after » is awfully nice!
I’m going to have another one of those hot chocolates that this kind coffee machine dispenses as much as we like.

As for the surroundings, the change is not bad either. Out of the window, I no longer see the facade of the skyscraper opposite (La Défense…) but fields and woods. To get here, I no longer endure two and a half hours of RER + bus every day, but one hour by car (while the other suburbanites are on holiday, I’m 30 minutes from home). The journey has one drawback: local farmers are currently spreading slurry, so hello smells. But, well, it’s picturesque. And once the fields are behind me, I find the scent of the forest again in the nature park that I cross from one end to the other. Every day I drive along the ramparts of an 11th-century castle, go through villages and hamlets, park… and badge in.

As for the atmosphere, it looks really friendly here. I quickly dropped the head-office suit and tie. Apparently, in the labs (but also when you’re CEO or VP), the uniform is rather a T-shirt or polo shirt, with a jacket added when you want to look classier. But where am I going to put my tie pin and cufflinks? (Just kidding: in fact, I have neither tie pin nor cufflinks.)

I’ll stop here for today and go on savouring my modest daily amazement.

Skills assessment and career plans for my research team

[This is the summary of one of my professional achievements. I use it to promote myself in the hope of attracting future partners. More information on this in the account of my career path.]

In 2005, I join a French team of 6 researchers in an American multinational. Highly specialized and isolated far from headquarters, my team regrets the lack of internal mobility prospects. I am tasked with examining each member’s career development plans with them. I propose and implement a method for individual skills assessment and career planning. Over more than a year, I coach each of them in building a summary of their professional skills and a personality profile, as well as detailed specifications for their next position. I obtain feedback from French and foreign managers on each team member’s career plan. At the same time, I volunteer to mentor a scholarship student who, with my help, passes the entrance exams to the Arts et Métiers engineering school. One year after I leave, however, all of Motorola France’s research staff are laid off. Several of them thank me, because they were not left helpless in facing this situation.

Publications, patents and innovations as a researcher at Motorola Labs

In 2005, I join Motorola’s applied research laboratories. I take over the French team in charge of automated reasoning and machine learning systems for personalizing mobile content and applications. In two years, I co-author 1 technology book co-funded by the European Union, 2 patents and 3 academic publications. As Motorola’s representative in the Cap Digital competitiveness cluster, I meet the heads of several innovative Parisian start-ups and, based on these potential partnerships, I propose 6 innovation projects to my management. I propose about ten innovation projects to our internal incubator, « Early Stage Accelerator », and I get the green light and a coach to start incubating 3 of them, in the fields of non-intrusive personal advertising, interactive TV program guides and personalized content publishing for phones. Unfortunately, following poor phone sales in India and China, Motorola restructures and shortly afterwards closes all its research centres in Europe.

Comparator

Comparator is a small Plone product I recently hacked together for fun. It’s called Comparator until it gets a nicer name, if ever. I distribute it here under the GNU General Public License. It lets users select any existing content type (object class) and compute a personalized comparison of the instances of this class. For example, if you choose to compare « News Items », you then select the news item properties you want to base your comparison on (title, creation date, description, …). You give marks to any value of these properties (a somewhat tedious process at the moment, but there is much room for improvement there). Comparator then lets you give relative weights to these properties, so that the marks are combined and the compared instances are ranked globally.

It’s a kind of building block for a comparison framework, for building Plone applications that compare stuff (any kind of stuff that exists within your portal, including semantically aggregated stuff). Let’s say that your Plone portal is full of descriptions of beers (with many details about all kinds of beers). Then adding a comparator to your portal will let your users give weights to every beer property and rank all the beers according to their personal tastes.
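The ranking step described above boils down to a weighted sum: each property value gets a mark, each property gets a weight, and instances are sorted by their weighted average. Here is a minimal Python sketch of that idea, independent of Plone and Archetypes. The beers, properties, marks and weights are invented for illustration, and `rank_instances` is a hypothetical helper, not Comparator’s actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: each instance maps property names to values.
Instance = Dict[str, str]

def rank_instances(
    instances: Dict[str, Instance],
    marks: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank instances by the weighted average of the marks of their property values.

    marks[prop][value] is the mark the user gave to that value of that property;
    unmarked values count as 0. weights[prop] is the property's relative weight.
    """
    total_weight = sum(weights.values()) or 1.0
    scores = []
    for name, props in instances.items():
        score = sum(
            weight * marks.get(prop, {}).get(props.get(prop), 0.0)
            for prop, weight in weights.items()
        )
        scores.append((name, score / total_weight))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


beers = {
    "Beer A": {"colour": "blonde", "strength": "light"},
    "Beer B": {"colour": "amber", "strength": "strong"},
    "Beer C": {"colour": "dark", "strength": "strong"},
}
marks = {
    "colour": {"blonde": 2, "amber": 4, "dark": 3},
    "strength": {"light": 1, "strong": 5},
}
weights = {"colour": 1.0, "strength": 3.0}  # this user cares mostly about strength

for name, score in rank_instances(beers, marks, weights):
    print(f"{name}: {score:.2f}")
# Beer B: 4.75
# Beer C: 4.50
# Beer A: 1.25
```

Normalizing by the total weight keeps scores on the same scale as the marks, so users can reason about the result regardless of how they set the weights.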

Comparator is based on Archetypes and was built from a UML diagram with ArchGenXML. Comparator fits well into my vision of semantic aggregation. I hope you can see how. Comments welcome!

Foresight on society’s demand for research

Tomorrow, our society will face challenges that will require a scientific and technological research effort. What will these challenges be? What research effort will be needed? These are the questions that the « Agora 2020 » programme of the Ministry of Infrastructure’s centre for foresight and scientific watch (centre de prospective et de veille scientifique) tries to answer.

Industrial innovation: what about the Internet?

Jean-Louis Beffa’s report to Jacques Chirac led to the creation of an Agency for Industrial Innovation with a tidy budget. The report has been discussed in the blogosphere.

Some were notably surprised by the absence of an « Information and Communication Technologies » priority in this report, and pointed out how many other countries had, on the contrary, bet on innovation in ICT, in STIC (Sciences and …), or in NTIC (New… which already sounds a bit dated).

To contribute my bit to the (constructive) criticism, here are two documents that underline the top priority ICT should have in an industrial innovation policy in France.

The first is a study report by the Strategic Council for Information Technologies, attached to the Prime Minister, on STIC R&D policies in the major industrial countries. It shows that Europe lags far behind Japan and the USA in NTIC R&D.

The second document is the June 2005 innovation bulletin of Canada’s statistical analysis centre. It states:

The results of the 2003 Survey of Innovation, which covered innovation in selected service industries, show that establishments in ICT service industries are the most likely to be innovators. In Canada, the three industries with the highest innovation rates were all ICT industries.

Specifically, these are software publishers, satellite and Internet operators and, to a lesser extent, IT services and consulting firms, design offices, engineering and R&D companies and, finally, high-tech wholesale distributors. It therefore seems to me that the French Agency for Industrial Innovation neglects innovation in the service industries by setting no policy priority for ICT, whereas in countries such as Canada, ICT is seen as a priority sector for innovation. It is said that great captains of industry such as Mr Beffa sometimes see computing as nothing more than a « necessary evil » (for the proper management, especially financial, of « heavy » industries). Could that explain this?

PS: Incidentally, in the Canadian document, you will note that

firms located close to rival firms or universities are no more innovative than others in the same industry, except when the distance is extremely short.

A little further on, it specifies:

Proximity to rival firms or universities appears to foster innovation only when distances are very short (a few hundred metres). And even then, proximity affects only certain types of innovation. Close proximity to rival firms seems to foster imitation rather than original innovation, whereas close proximity to universities seems to foster original innovation rather than imitation.

So what should we make of that other strand of French R&D support policy, the development of « competitiveness clusters » (pôles de compétitivité) meant to bring rival companies and universities physically closer together? The idea is certainly sound at heart. But has anyone thought of setting a maximum distance beyond which a cluster no longer makes sense or serves any purpose?

Foundations in France

In France, foundations are bodies that allocate funding to various « causes ». Foundations are non-profit organizations that are supposed to champion « public utility » and the « general interest ». Here are a few pointers to help you get to know them better:

Panorama 2004 des Fondations d’Entreprise (mentioned on the mailing list of the Centre français des fondations and also on the blog Entreprise Citoyenne), a study report by Ernst&Young (which offers specific services to non-profit organizations)
the contact details and a short description of the 466 (to date) foundations placed under the aegis of the Fondation de France
public-interest research foundations, a new legal status meant to encourage the creation of foundations dedicated to scientific research, « American-style »?

The mobile tinkerer is always in the wrong (or in Tor)

I’m travelling quite a lot at the moment. Today, I arrive in a company’s self-service computer room and try to connect to the Net with my laptop.

First thing to do: find electricity, because my battery is flat. No problem, I unplug a monitor and steal its socket. I’ll put it back when I leave, just as I’d have liked to find it when I arrived, of course. :)

Next, I need network access. OK, I check that the workstations are configured for DHCP and borrow the RJ45 socket of the workstation in front of me. It works: my laptop is accepted by the local network. The network administrator had of course given his permission for this…

Now for the serious stuff: getting onto the Net. Apparently, the Internet Explorer browsers on this room’s workstations are configured with no proxy. Indeed, from my laptop, after disabling proxy support in Firefox, I can reach the Web without any trouble.

Yes, but how do I collect my mail via POP? A firewall is baring its teeth between the Net and me. Thunderbird swears at me, saying it got thrown out and that the watchdog on duty refuses to let it out of the network.

Fortunately, as a good mobile tinkerer, I have my favourite onion router with me: Tor. So I launch Tor on my laptop. It manages to sneak over the firewall and opens a path to the outside world. I tell Thunderbird to use Tor as a SOCKS proxy. No problem, here I am on the Net over POP!
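Under the hood, telling Thunderbird to use Tor as a SOCKS proxy means that, before any POP traffic flows, it performs a SOCKS5 handshake (RFC 1928) with the local Tor client, asking it to open a connection to the mail server. The sketch below only builds the handshake bytes, so it runs without a network. It assumes the « no authentication » method and a connection addressed by host name; the mail server name is a placeholder. Tor listens for SOCKS connections on port 9050 by default, but check your own configuration.

```python
import struct

def socks5_greeting() -> bytes:
    # Version 5, one authentication method offered: 0x00 = "no authentication".
    return bytes([0x05, 0x01, 0x00])

def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that addresses the target by domain name."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("host name must be 1-255 bytes long")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then the name length.
    header = bytes([0x05, 0x01, 0x00, 0x03, len(name)])
    # The destination port comes last, as 2 bytes in network (big-endian) order.
    return header + name + struct.pack("!H", port)

# Placeholder POP3 server; 110 is the standard POP3 port.
request = socks5_connect_request("pop.example.org", 110)
print(socks5_greeting().hex(), request.hex())
```

In real use, a client would send these bytes over a TCP socket to 127.0.0.1:9050, read the proxy’s replies after each message, and then speak POP through that same socket as if it were connected directly.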

Same thing if I want to access online TV or radio with Winamp: I configure Winamp to use an HTTP proxy that I install on my laptop (Privoxy). That proxy relays my traffic to Tor, used as a SOCKS proxy.

Even better: does your firewall block Web access, but you have a corporate SOCKS proxy that can open up access to the Net? Then you can « socksify Tor » with a small utility such as FreeCap and perform some fine mobile-tinkerer acrobatics. Once again, however, be sure to ask your network administrator for permission before attempting such acrobatics. They might make an overzealous local IT security officer’s hair stand on end. Careful not to put yourself in the wrong (« en TORT », as we say in French)!

In short, Tor is a nice toy for getting a better connection to the Net even on overly restricted networks. Well done, EFF.

Daisy vs. Plone, feature fighting

My friend Gouri recently pointed me to Daisy, a « CMS wiki/structured/XML/faceted » thing, as he put it. I answered that it may be a nice product but is not attractive enough for me at the moment to spend much time analyzing it. Nevertheless, as Gouri asked, let’s browse Daisy’s features and try to compare them with their Plone equivalents (bearing in mind that I have never tried Daisy).

The Daisy project encompasses two major parts: a featureful document repository

Plone is based on an object-oriented repository (Zope’s ZODB) rather than a document-oriented repository.

and a web-based, wiki-like frontend.

Plone has its own web-based frontend. Wiki features are provided by an additional product (Zwiki).

If you have different frontend needs than those covered by the standard Daisy frontend, you can still benefit hugely from building upon its repository part.

Plone’s frontend is easily customizable, either with your own CSS, by inheriting from existing ZPT skins, or with a WYSIWYG skin module such as CPSSkin.

Daisy is a Java-based application

Plone is Python-based.

, and is based on the work of many valuable open source packages, without which Daisy would not have been possible. All third-party libraries or products we redistribute are unmodified (unforked) copies.

Same for Plone. Daisy seems to be based on Cocoon. Plone is based on Zope.

Some of the main features of the document repository are:
* Storage and retrieval of documents.

Documents are one of the numerous object classes available in Plone. The basic object in Plone is… an object, which is not fully extensible by itself unless it was designed to be. Plone content types are more user-oriented than generic documents (they implement specialized behaviours such as security rules, workflows, displays, …). They will become very extensible when the next versions of the underlying « Archetypes » layer are released (these include a through-the-web schema management feature that lets web users extend any existing content type).

* Documents can consist of multiple content parts and fields; document types define what parts and fields a document should have.

Plone’s perspective is different because of its object orientation. Another Zope product called Silva is more similar to Daisy’s document orientation.

Fields can be of different data types (string, date, decimal, boolean, …) and can have a list of values to choose from.

Same for Archetypes based content types in Plone.

Parts can contain arbitrary binary data, but the document type can limit the allowed mime types. So a document (or more correctly a part of a document) could contain XML, an image, a PDF document, … Part upload and download is handled in a streaming manner, so the size of parts is only limited by the available space on your filesystem (and for uploading, a configurable upload limit).

I imagine that Daisy allows the upload and download of documents having any structure, with no constraint. In Plone, you are constrained by the object model of your content types. As said above, this model can be extended at run time (schema management), but at the moment the usual way to do it is to define your model at design time and then comply with it at run time. At run time (even without schema management), you can still add custom metadata or upload additional attached files if your content type supports attached files.

* Versioning of the content parts and fields. Each version can have a state of ‘published’ or ‘draft’. The most recent version which has the state published is the ‘live’ version, ie the version that is displayed by default (depends on the behaviour of the frontend application of course).

The default behaviour of Plone does not include real versioning but document workflows. This means that a given content item can be in the state ‘draft’ or ‘published’ and move from one state to another according to a predefined workflow (with security conditions, event triggering and so on). But a given object has only one version by default.
There are, however, additional Plone products that add versioning support. These products are to be merged into future Plone distributions because versioning has been a long-awaited feature. Note that, at the moment, you can have several versions of a document to support multilingual sites (one version per language).
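To make the contrast concrete, here is a minimal sketch of the versioning rule quoted above from Daisy’s feature list: every save appends a new version, each version is either ‘published’ or ‘draft’, and the live version is the most recent published one. The class and method names are invented for illustration; this is neither Daisy’s Java API nor a Plone product.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Version:
    number: int
    content: str
    state: str = "draft"  # either "draft" or "published"

@dataclass
class VersionedDocument:
    versions: List[Version] = field(default_factory=list)

    def save(self, content: str, state: str = "draft") -> Version:
        # Every save creates a new version; older versions are kept.
        version = Version(len(self.versions) + 1, content, state)
        self.versions.append(version)
        return version

    def live_version(self) -> Optional[Version]:
        # The live version is the most recent version in the "published" state.
        for version in reversed(self.versions):
            if version.state == "published":
                return version
        return None

doc = VersionedDocument()
doc.save("first text", state="published")
doc.save("work in progress")  # a newer draft does not replace the live version
print(doc.live_version().number)  # 1
```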

* Documents can be marked as ‘retired’, which makes them appear as deleted; they won’t show up unless explicitly requested. Documents can also be deleted permanently.

Plone’s workflow mechanism is much more advanced. A default workflow includes a similar retired state. But the admin can define new workflows and modify the default one, always with respect to user roles. Plone’s security model is quite advanced and is the underlying layer of every Plone functionality.
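A workflow in the sense used here is essentially a state machine whose transitions are guarded by roles. The sketch below illustrates that idea with a retired state similar to the one discussed above. The states, transitions and role names are assumptions chosen for the example; this is not the API of Plone’s workflow tool.

```python
from typing import Dict, Set, Tuple

# Hypothetical workflow: (state, transition) -> (new state, roles allowed to trigger it).
WORKFLOW: Dict[Tuple[str, str], Tuple[str, Set[str]]] = {
    ("draft", "submit"): ("pending", {"Owner", "Manager"}),
    ("pending", "publish"): ("published", {"Reviewer", "Manager"}),
    ("pending", "reject"): ("draft", {"Reviewer", "Manager"}),
    ("published", "retire"): ("retired", {"Manager"}),
    ("retired", "restore"): ("draft", {"Manager"}),
}

def do_transition(state: str, transition: str, user_roles: Set[str]) -> str:
    """Return the new state, or raise if the transition is unknown or forbidden."""
    try:
        new_state, allowed_roles = WORKFLOW[(state, transition)]
    except KeyError:
        raise ValueError(f"no transition {transition!r} from state {state!r}") from None
    if not allowed_roles & user_roles:
        raise PermissionError(f"roles {sorted(user_roles)} cannot {transition!r}")
    return new_state

state = do_transition("draft", "submit", {"Owner"})
state = do_transition(state, "publish", {"Reviewer"})
print(state)  # published
```

Keeping the workflow as data rather than code is what lets an administrator add states or change guards without touching the content types themselves.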

* The repository doesn’t care much what kind of data is stored in its parts, but if it is « HTML-as-well-formed-XML », some additional features are provided:
o link-extraction is performed, which allows to search for referers of a document.
o a summary (first 300 characters) is extracted to display in search results
o (these features could potentially be supported for other formats also)

There is no such thing in Plone. Maybe in Silva ? Plone’s reference engine allows you to define associations between objects. These associations are indexed by Plone’s search engine (« catalog ») and can be searched.
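The two Daisy features quoted above (link extraction and a 300-character summary for HTML that is well-formed XML) are easy to picture. Here is a minimal sketch using Python’s standard `html.parser`. The 300-character limit comes from the quote; the helper names, the sample link target and the whitespace normalization are my own assumptions, not Daisy’s implementation.

```python
from html.parser import HTMLParser
from typing import List, Tuple

class LinkAndTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Collect the target of every <a href="..."> element.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)

def extract_links_and_summary(html: str, limit: int = 300) -> Tuple[List[str], str]:
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so the summary reads as plain running text.
    text = " ".join("".join(parser.text_parts).split())
    return parser.links, text[:limit]

links, summary = extract_links_and_summary(
    '<html><body><p>See <a href="/docs/42">document 42</a> for details.</p></body></html>'
)
print(links, summary)  # ['/docs/42'] See document 42 for details.
```

Storing each document’s extracted links in a reverse index (target to referring documents) is what would make a « who links here? » search cheap.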

* all documents are stored in one « big bag », there are no directories.

Physically, the ZODB repository can take many forms (RDBMS, …). The default ZODB storage is a single flat file, Data.fs, which can get quite big.

Each document is identified by a unique ID (an ever-increasing sequence number starting at 1), and has a name (which does not need to be unique).

Each object has an ID, but it is not globally unique at the moment: objects are, unfortunately, stored in a hierarchical structure (Zope’s tree), and an ID is only unique within its folder. Some Zope/Plone developers wish « placeless content » were implemented. Daisy is probably still superior to Plone in that respect.

Hierarchical structure is provided by the frontend by the possibility to create hierarchical navigation trees.

Zope’s tree is the most important structure for objects in a Plone site; arguably too important. You can still create navigation trees with shortcuts. But in fact, the usual way to get maximum flexibility in navigation trees is to use the « Topic » content type. Topics are folder-like objects that contain a dynamic list of links to the objects matching the Topic’s predefined query. Topics are like persistent searches displayed as folders. For example, a Topic may display the list of all the « Photo »-typed objects that are in the « draft » state in a specific part (tree branch) of the site, etc.
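A Topic, as described above, stores a query rather than a list of items, and evaluates it each time it is displayed. Here is a minimal sketch of that idea over an in-memory list of objects, reproducing the « draft Photos in one branch » example. The object fields, paths and the `Topic` class are hypothetical and do not reproduce Plone’s catalog or Topic API.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ContentObject:
    path: str
    portal_type: str
    review_state: str

@dataclass
class Topic:
    # The stored query: exact-match criteria plus a path prefix (tree branch).
    criteria: Dict[str, str]
    path_prefix: str = "/"

    def results(self, site: List[ContentObject]) -> List[ContentObject]:
        # Re-evaluated on every call, so new matching objects show up automatically.
        return [
            obj for obj in site
            if obj.path.startswith(self.path_prefix)
            and all(getattr(obj, key) == value for key, value in self.criteria.items())
        ]

site = [
    ContentObject("/travel/photos/paris.jpg", "Photo", "draft"),
    ContentObject("/travel/photos/rome.jpg", "Photo", "published"),
    ContentObject("/news/photo.jpg", "Photo", "draft"),
]
draft_photos = Topic({"portal_type": "Photo", "review_state": "draft"}, "/travel/")
print([obj.path for obj in draft_photos.results(site)])  # ['/travel/photos/paris.jpg']
```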

* Documents can be combined in so-called « collections ». Collections are sets of the documents. One document can belong to multiple collections, in other words, collections can overlap.

Topics too? I regret that Plone does not easily offer a default way to display a whole set of objects in a single page. For example, I would have liked to display a « book » of all the contents of my Plone site as if it were one single object (so that I could print it…). But some additional Plone products (extensions) support similar functionality. I often use « Content Panels » to build a page by defining its global layout (columns and rows) and filling it with « views » of existing Plone objects (especially Topics). Content Panels combined with Topics give your site a lot of flexibility. But this flexibility has its limits too.

* possibility to take exclusive locks on documents for a limited or unlimited time. Checking for concurrent modifications (optimistic locking) happens automatically.

See versioning above.
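Optimistic locking, as mentioned in the quoted feature, blocks no one while editing: a save is simply refused if someone else saved in the meantime. Here is a minimal sketch, assuming each document carries a revision counter that the editor remembers when opening it; the names are invented for illustration.

```python
from dataclasses import dataclass

class ConcurrentModificationError(Exception):
    pass

@dataclass
class Document:
    content: str
    revision: int = 0

def save(doc: Document, new_content: str, revision_seen: int) -> None:
    """Save only if nobody else saved since the editor loaded revision_seen."""
    if doc.revision != revision_seen:
        raise ConcurrentModificationError(
            f"document is at revision {doc.revision}, edit was based on {revision_seen}"
        )
    doc.content = new_content
    doc.revision += 1

doc = Document("hello")
alice_rev = bob_rev = doc.revision    # both open the same revision
save(doc, "hello, world", alice_rev)  # Alice saves first: accepted
try:
    save(doc, "hello there", bob_rev)  # Bob's edit is based on a stale revision
except ConcurrentModificationError as err:
    print("refused:", err)
```

In a real server, the check and the increment would have to happen atomically (for instance inside one database transaction), otherwise two saves could still slip through together.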

* documents are automatically full-text indexed (Jakarta Lucene based). Currently supports plain text, XML, PDF (through PDFBox), MS-Word, Excel and Powerpoint (through Jakarta POI), and OpenOffice Writer.

Same for Plone except that Plone’s search engine is not Lucene and I don’t know if Plone can read OpenOffice Writer documents. Note that you will require additional modules depending on your platform in order to read Microsoft files.

* repository data is stored in a relational database. Our main development happens on MySQL/InnoDB, but the provisions are there to add support for new databases; for example, PostgreSQL support is now included.

Everything is in the ZODB, by default stored as a single file, but it can also be stored in a relational database (though this is rarely needed). You can also transparently mix several repositories in the same Plone instance. Furthermore, instead of having Plone write directly to the ZODB file, you can configure a ZEO client-server setup so that several Plone instances share a common database (load balancing). Even better, a commercial product, ZRS, lets you transparently replicate ZODBs so that several Plone instances set up with ZEO can use several redundant ZODBs (no single point of failure).

The part content is stored in normal files on the file system (to offload the database). The usage of these familiar, open technologies, combined with the fact that the daisywiki frontend stores plain HTML, makes that your valuable content is easily accessible with minimal « vendor » lock-in.

Everything’s in the ZODB. This can be seen as lock-in, but not really, because 1/ the product is open source and you can script a full export in Python with minimal effort, and 2/ the default WebDAV and FTP services can be combined with Plone’s Marshall extension (soon to be included in Plone’s default distribution) to export the content of your Plone site. Even better, you can also upload your structured semantic content with Marshall plus a few additional hacks, as I mentioned elsewhere.

* a high-level, sql-like query language provides flexible querying without knowing the details of the underlying SQL database schema. The query language also allows to combine full-text (Lucene) and metadata (SQL) searches. Search results are filtered to only contain documents the user is allowed to access (see also access control). The content of parts (if HTML-as-well-formed-XML) can also be selected as part of a query, which is useful to retrieve eg the content of an « abstract » part of a set of documents.

No such thing in Plone as far as I know. You may have to Pythonize, my friend… Except that Plone’s tree gives a URL to every object, so you can access any part of the site, though not with a granularity similar to what Daisy apparently offers. See Silva for more document orientation.

* Access control: instead of attaching an ACL to each individual document, there is a global ACL which allows to specify the access rules for sets of documents by selecting those documents based on expressions. This allows for example to define access control rules for all documents of a certain type, or for all documents in a certain collection.

Access control is based on Plone’s tree, with inheritance (somewhat similar to the Windows security model). I suppose Plone’s access control is more sophisticated and maintainable than Daisy’s, but explaining why would require more investigation.
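The difference between the two access-control models can be sketched in a few lines. A global ACL evaluates rules that each select documents by an expression, while a tree model looks up settings along the object’s path, inheriting from parent folders. Both functions below are simplified illustrations under my own assumptions (first matching rule wins in the global ACL, nearest explicit setting wins in the tree); neither reproduces the real rule semantics of Daisy or Zope.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]

# Global-ACL style: an ordered list of (selector expression, role, allowed).
GlobalAcl = List[Tuple[Callable[[Doc], bool], str, bool]]

def global_acl_allows(acl: GlobalAcl, doc: Doc, role: str) -> bool:
    for selects, rule_role, allowed in acl:
        if rule_role == role and selects(doc):
            return allowed  # first matching rule decides (assumption)
    return False  # deny by default

def tree_allows(settings: Dict[str, Dict[str, bool]], path: str, role: str) -> bool:
    # Tree style: walk from the object up to the root; nearest explicit setting wins.
    parts = path.strip("/").split("/")
    for depth in range(len(parts), -1, -1):
        node = "/" + "/".join(parts[:depth])
        if role in settings.get(node, {}):
            return settings[node][role]
    return False

# One rule covers every invoice, wherever it lives in the site...
acl: GlobalAcl = [
    (lambda d: d["type"] == "Invoice", "Employee", False),
    (lambda d: True, "Employee", True),
]
invoice = {"type": "Invoice", "path": "/sales/2005/invoice-1"}
print(global_acl_allows(acl, invoice, "Employee"))  # False

# ...whereas a tree setting covers everything below one folder.
settings = {"/": {"Employee": True}, "/sales": {"Employee": False}}
print(tree_allows(settings, invoice["path"], "Employee"))  # False
print(tree_allows(settings, "/news/item-1", "Employee"))   # True
```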

* The full functionality of the repository is available via an HTTP+XML protocol, thus providing language and platform independent access. The documentation of the HTTP interface includes examples on how the repository can be updated using command-line tools like wget and curl.

Unfortunately, Plone is not RESTful enough at the moment. But there is some hope the situation will change with Zope 3 (Zope’s next major release, coming soon). Note that Zope (and therefore Plone) supports XML-RPC over HTTP as a generic web service protocol. But this is nowhere near real RESTful web services…

* A high-level, easy to use Java API, available both as an « in-JVM » implementation for embedded scenarios or services running in the daisy server VM, as well as an implementation that communicates transparently using the HTTP+XML protocol.

Say Python and XML/RPC here.
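Since Zope can publish objects over XML-RPC, a Python script can call methods on a Plone object with the standard `xmlrpc.client` module. The sketch below only builds and decodes the XML payloads, so it runs without a server. The object URL in the comment and the `Title` method name are assumptions about a hypothetical site; which methods you can actually call depends on the object and on your permissions.

```python
import xmlrpc.client

# Build the XML-RPC request body for a call to the hypothetical method "Title"
# with no arguments. Against a live site, a client would do something like:
#   proxy = xmlrpc.client.ServerProxy("http://localhost:8080/site/front-page")
#   proxy.Title()
payload = xmlrpc.client.dumps((), methodname="Title")
print(payload)

# Responses are decoded the same way: loads() returns (params, method_name).
response = xmlrpc.client.dumps(("Welcome to my site",), methodresponse=True)
params, method = xmlrpc.client.loads(response)
print(params[0])  # Welcome to my site
```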

* For various repository events, such as document creation and update, events are broadcasted via JMS (currently we include OpenJMS). The content of the events are XML messages. Internally, this is used for updating the full-text index, notification-mail sending and clearing of remote caches. Logging all JMS events gives a full audit log of all updates that happened to the repository.

No such mechanism as far as I know. But Plone of course offers fully detailed audit logs of any of its events.

* Repository extensions can provide additional services, included are:
o a notification email sender (which also includes the management of the subscriptions), allowing subscribing to individual documents, collections of documents or all documents.

No such generic feature in Plone by default. You can add scripts that send notifications on any workflow transition, but you need to write one or two lines of Python, and subscription management is not implemented by default. However, folder-like objects support RSS syndication, so you can aggregate a Plone site’s new objects in your favourite news aggregator.
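The missing subscription management is essentially an observer registry: users subscribe to a document, to a folder, or to everything, and each workflow transition notifies the matching subscribers. Here is a minimal sketch under my own assumptions: subscriptions are keyed by path, a folder subscription covers everything below it, and notifications are queued in a list instead of being e-mailed. None of these names come from Plone.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

class SubscriptionManager:
    def __init__(self) -> None:
        # Path -> e-mail addresses; "/" means "all documents".
        self.subscriptions: DefaultDict[str, Set[str]] = defaultdict(set)
        self.outbox: List[Tuple[str, str]] = []  # (recipient, message) pairs

    def subscribe(self, email: str, path: str = "/") -> None:
        self.subscriptions[path].add(email)

    def recipients_for(self, path: str) -> Set[str]:
        # A subscription to a folder also covers everything below it.
        result: Set[str] = set()
        for prefix, emails in self.subscriptions.items():
            if prefix == "/" or path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                result |= emails
        return result

    def on_transition(self, path: str, transition: str) -> None:
        # Would be hooked to each workflow transition; here we just queue messages.
        for email in sorted(self.recipients_for(path)):
            self.outbox.append((email, f"{path}: {transition}"))

manager = SubscriptionManager()
manager.subscribe("alice@example.org", "/news")
manager.subscribe("bob@example.org")  # everything
manager.subscribe("carol@example.org", "/docs/report")
manager.on_transition("/news/item-1", "publish")
print(manager.outbox)
```

Matching on `prefix + "/"` rather than a bare prefix keeps a subscription to `/news` from leaking into an unrelated sibling such as `/newsletter`.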

o a navigation tree management component and a publisher component, which plays hand-in-hand with our frontend (see further on)

I’ll see further on… :)

* A JMX console allows some monitoring and maintenance operations, such as optimization or rebuilding of the fulltext index, monitoring memory usage, document cache size, or database connection pool status.

You have several places to look at for this monitoring within Zope/Plone (no centralized monitoring). An additional Plone product helps in centralizing maintenance operations. Still some ground for progress here.

The « Daisywiki » frontend
The frontend is called the « Daisywiki » because, just like wikis, it provides a mixed browsing/editing environment with a low entry barrier. However, it also differs hugely from the original wikis, in that it uses wysiwyg editing, has a powerful navigation component, and inherits all the features of the underlying daisy repository such as different document types and powerful querying.

Well, then we could say the same for Plone and rename its skins the « Plonewiki » frontend… It supports WYSIWYG editing too, with a customizable navigation tree, etc.

* wysiwyg HTML editing
o supports recent Internet Explorer and Mozilla/Firefox (gecko) browsers, with fallback to a textarea on other browsers. The editor is a customized version of HTMLArea (through plugins, not a fork).

Same for Plone (except it is not an extension of HTMLArea but of a similar product).

o We don’t allow for arbitrary HTML, but limit it to a small, structural subset of HTML, so that it’s future-safe, output medium independent, secure and easily transformable. It is possible to have special paragraph types such as ‘note’ or ‘warning’. The stored HTML is always well-formed XML, and nicely laid-out. Thanks to a powerful (server-side) cleanup engine, the stored HTML is exactly the same whether edited with IE or Mozilla, allowing to do source-based diffs.

No such validity control in Plone. In fact, the structure of a Plone object is always valid because Plone manages it according to a specific object model. But a given object may contain an HTML part (a document’s body, for example) that may not be valid. If your documents are meant to have a recurring inner structure, you are encouraged to turn this structure into an extension of an object class, so that it is no longer handled as a document structure. See what I mean?

o insertion of images by browsing the repository or upload of new images (images are also stored as documents in the repository, so can also be versioned, have metadata, access control, etc)

Same with Plone, except for versioning. Note that Plone’s Photo content type supports automatic server-side resizing of images.

o easy insertion of document links by searching for a document

Sometimes yes, sometimes no. It depends on the type of link you are creating.

o a heartbeat keeps the session alive while editing

I don’t know how Plone handles this.

o an exclusive lock is automatically taken on the document, with an expiry time of 15 minutes, and the lock is automatically refreshed by the heartbeat

I have never tried the Plone extension for versioning, so I can’t say. I know that you can use the WebDAV interface to edit a Plone object with your favorite editor if you want, and I suppose this interface properly manages this kind of issue. But I have never tried.
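Daisy's locking scheme (exclusive lock, 15-minute expiry, refreshed by the editor's heartbeat) is simple to model. A minimal in-memory sketch: `EditLock` and its methods are hypothetical names, not Daisy's or Plone's API, and the clock is injectable so expiry can be tested without waiting.

```python
import time


class EditLock:
    """Hypothetical exclusive edit lock that expires unless refreshed."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.owner = None
        self.expires_at = 0.0

    def _expired(self):
        return self.clock() >= self.expires_at

    def acquire(self, user):
        """Take the lock if it is free, expired, or already held by `user`."""
        if self.owner is None or self.owner == user or self._expired():
            self.owner = user
            self.expires_at = self.clock() + self.ttl
            return True
        return False

    def heartbeat(self, user):
        """Called periodically by the editing page: extends the lock's lifetime."""
        if self.owner == user and not self._expired():
            self.expires_at = self.clock() + self.ttl
            return True
        return False  # lock lost: expired or taken by someone else

    def release(self, user):
        if self.owner == user:
            self.owner = None
```

In a real editor, the page would call `heartbeat()` every few minutes through a background request; if it returns False, the edit session has lost its lock and should warn the user before saving.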

o editing screens are built dynamically for the document type of the document being edited.

Of course.

* Version overview page, from which the state of versions can be changed (between published and draft), and diffs can be requested.
* Nice version diffs, including highlighting of actual changes in changed lines (ignoring re-wrapping).

You can easily move any object through its associated workflow (from one state to another, through transitions). But there is no versioning. Note that you can use Plone’s wiki extension, which supports diffs and some versioning features. But this is not available for every Plone content type.
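Diffs that ignore re-wrapping can be approximated by comparing word sequences instead of lines. A sketch using the standard `difflib` module; `word_diff` is a hypothetical helper, and Daisy's actual diff algorithm is not described here:

```python
import difflib


def word_diff(old_text, new_text):
    """Return (tag, old_words, new_words) for each change, ignoring line wrapping."""
    old_words = old_text.split()  # split() treats newlines like any other whitespace
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes
```

A paragraph that was merely re-wrapped produces no changes at all, while a one-word edit shows up as a single replacement instead of two rewritten lines.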

* Support for includes, i.e. the inclusion of one document in the other (includes are handled recursively).

No.

* Support for embedding queries in pages.

You can use Topics (persistent queries). You can embed them in Content Panels.

* A hierarchical navigation tree manager. As many navigation trees as you want can be created.

One and only one navigation tree by default. But Topics can be nested, so you can have one main navigation tree plus one or more alternatives built with Topics (though these alternatives have some limitations).

Navigation trees are defined as XML and stored in the repository as documents, so access control (for authoring them; read access is public), versioning, etc. apply. One navigation tree can import another one. The nodes in the navigation tree can be listed explicitly, but also inserted dynamically using queries. When a navigation tree is generated, the nodes are filtered according to the access control rules for the requesting user. Navigation trees can be requested in « full » or « contextualized » mode, the latter meaning that only the nodes leading to a certain document are expanded. The navigation tree manager produces XML; the visual rendering is up to XSL stylesheets.

This is nice. Plone cannot do that easily. But whatever Plone can do, it does with respect for its security model and access control, of course.
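Daisy's « full » and « contextualized » trees with access filtering can be modeled roughly as follows. This is a minimal in-memory sketch: the `NavNode` class, the `can_read` callback and the document ids are hypothetical, and Daisy itself stores trees as XML and leaves rendering to XSL stylesheets, which this does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class NavNode:
    doc_id: str
    label: str
    children: list = field(default_factory=list)


def contains(node, target_id):
    """True if target_id is this node or one of its descendants."""
    return node.doc_id == target_id or any(contains(c, target_id) for c in node.children)


def render_tree(nodes, can_read, active_id=None, depth=0):
    """Return indented labels for the nodes the user may read.

    Without active_id the tree is rendered in full; with it, only the
    branches leading to the active document are expanded.
    """
    lines = []
    for node in nodes:
        if not can_read(node.doc_id):
            continue  # access control hides the node and its whole subtree
        lines.append("  " * depth + node.label)
        if active_id is None or contains(node, active_id):
            lines.extend(render_tree(node.children, can_read, active_id, depth + 1))
    return lines
```

Calling `render_tree(tree, can_read)` gives the full tree; passing `active_id` collapses every branch that does not lead to the active document.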

* A navigation tree editor widget allows easy editing of the navigation trees without knowledge of XML. The navigation tree editor works entirely client-side (Mozilla/Firefox and Internet Explorer), without annoying server-side roundtrips to move nodes around, and full undo support.

Yummy.

* Powerful document-publishing engine, supporting:
o processing of includes (works recursively, with detection of recursive includes)
o processing of embedded queries
o document type specific styling (XSLT-based), also works nicely combined with includes, i.e. each included document will be styled with its own stylesheet depending on its document type.
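Recursive include processing with cycle detection boils down to expanding includes depth-first while remembering the chain of documents currently being expanded. A sketch, assuming a made-up `{{include:doc-id}}` marker and an in-memory dict of documents (Daisy's real include syntax is not shown here):

```python
import re

# Hypothetical include marker; not Daisy's real syntax.
INCLUDE = re.compile(r"\{\{include:([\w-]+)\}\}")


class RecursiveIncludeError(Exception):
    pass


def expand(doc_id, documents, _chain=()):
    """Expand include markers depth-first, failing on include cycles."""
    if doc_id in _chain:
        cycle = " -> ".join(_chain + (doc_id,))
        raise RecursiveIncludeError(f"recursive include: {cycle}")
    chain = _chain + (doc_id,)
    return INCLUDE.sub(lambda m: expand(m.group(1), documents, chain), documents[doc_id])
```

Tracking the current chain rather than every visited document means the same document may legitimately be included twice side by side; only a document that ends up including itself is rejected.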

* PDF publishing (using Apache FOP), with all the same features as the HTML publishing, thus also document type specific styling.

Plone’s document-like content types offer PDF views too.

* search pages:
o fulltext search
o searching using Daisy’s query language
o display of referers (« incoming links »)

Fulltext search is available. There is no query language for the user. Display of referers is only available for content types that are either wiki pages or have been given the ability to include references from other objects.

* Multiple-site support, allows to have multiple perspectives on top of the same daisy repository. Each site can have a different navigation tree, and is associated with a default collection. Newly created documents are automatically added to this default collection, and searches are limited to this default collection (unless requested otherwise).

It might be possible with Plone but I am not sure when this would be useful.

* XSLT-based skinning, with reusable ‘common’ stylesheets (in most cases you’ll only need to adjust one ‘layout’ xslt, unless you want to customise heavily). Skins are configurable on a per-site basis.

Plone’s skins use the Zope Page Templates technology, a very nice and simple HTML templating technology. Plone’s skins make extensive use of CSS; in fact, most of the layout and look-and-feel of a site now lives in CSS objects. These skins are managed as objects, with inheritance, skin overriding and other sophisticated mechanisms to configure them.

* User self-registration (with the possibility to configure which roles are assigned to users after self-registration) and password reminder.

The same is available in Plone.

* Comments can be added to documents.

Available too.

* Internationalization: the whole front-end is localizable through resource bundles.

Idem.

* Management pages for managing:
o the repository schema (the document types)
o the users
o the collections
o access control

Idem.

* The frontend currently doesn’t perform any caching: all pages are published dynamically, since this also depends on the access rights of the current user. For publishing high-traffic, public (i.e. all public access happens as the same user), read-only sites, it is probably best to develop a custom publishing application.

Zope includes caching mechanisms that take access rights into account. For very high-traffic public sites, a Squid frontend is usually recommended.

* Built on top of Apache Cocoon (an XML-oriented web publishing and application framework), using Cocoon Forms, Apples (for stateful flow scenarios), and the repository client API.

By default, Zope uses its own embedded web server. But the usual setup for production-grade sites is to put an Apache reverse-proxy in front of it.

My conclusion: Daisy looks like a nice product when you have a very document-oriented project, with complex documents whose structure varies a lot from one document to another; its equivalent in the Zope world would be Silva. But Plone is much more appropriate for everyday CMS sites. Its object orientation offers both great flexibility for the developer and more ease of use for the Joe Sixpack webmaster. Plone still lacks some technical features that are important for its future, namely RESTful web service interfaces and the placeless content paradigm. Versioning is expected soon.

This article was written in one go, late at night, and reviewed only once (thanks to Gouri). It may be wrong or badly lacking information on some points, so your comments are very welcome!

Semantic Web reports for corporate social responsibility

With that many buzzwords in the title, I must be ringing some warning bells in your minds. You would be right to be cautious about what I am going to say here, because this is pure speculation. I would like to imagine what annual (quarterly?) corporate reports should look like in the near future.

In my opinion, they should continue the current trend of emphasizing corporate social responsibility. To do so, they should both embrace innovative reporting standards and methodologies and support these methodologies by implementing them with « semantic web »-like technologies. In such a future, financial analysts (and eventually stakeholders) would be able to browse specialized web sites that aggregate meaningful data published in these corporate reports. On such sites, investors would be able to compare like-for-like data, marks and ratings for their favorite corporations. They would be given functionality like that found in multidimensional analysis tools (business intelligence), even if simplified as in interactive purchase guides [via Fred]. In such a future, I would be able to subscribe to such a web service and set my preferences and filters in financial, social and environmental terms. This service would give me a snapshot of how the selected corporations compare with one another according to my preferences and filters. Moreover, I would receive an alert via an RSS feed whenever a new report is published or when the corporations I monitor reach certain performance thresholds.

Some technological issues still stand in the way of such a future, though they are fading away. But a huge number of methodological and political issues stand in the way too… What if such technologies come to maturity? Would they push corporations, rating agencies, analysts and stakeholders to change their minds and go in the right direction?

It’s the beginning of the end.

Gouri, as a prophet of doom worthy of Tintin’s The Shooting Star (L’Étoile mystérieuse), explains it to you with plenty of diagrams and statistics: today, 25 May 2005, is the big boom, the one that marks the beginning of the end of the world. Are you still there?

The Internet de rue project: a call to our Romanian friends!

Are there any Romanians from Romania in the room? Here in France, some of them live on the street; they do not necessarily have much money, but they have the nerve to try out new technologies. [I am adding afterwards that these are Romanian Roma, because a Romanian reader felt offended that this was not specified; see the discussion below.] They would love to communicate over the Internet with the family they left back home, to exchange a few photos of the son they have not seen for two years… The problem: back home, who could give (lend?) this family Internet access? Pass this call on to your contacts in Romania; that would be nice. And while you’re at it, discover the wonderful Internet de rue project.

A target in your pocket

Upcoming versions of US passports may contain an RFID chip. This chip could be read remotely by a malicious person (a terrorist?), who would thus have a very good American detector. Naturally, such a prospect is worrying!

Consultant-in-a-box

I am currently working with consultants from a very large Indian IT services company on the offshore outsourcing of an IT project. The consultant I work with on site has been practicing yoga for many years. He admitted to me that he cannot fit into a box yet. But the concept of a « consultant-in-a-box », or « Commercial-Off-The-Shelf » consultant, is quite appealing! Just imagine: you go to your favorite e-business site, something like consultantinabox.com, and tick the skills of the Indian consultant you need: a bit of J2EE here, a bit of security middleware integration there, etc. You approve your quote online and pay with your corporate credit card. Then UPS delivers, within 24 hours, by air, a box with your yogi consultant inside. Not bad, huh? Of course, at the end of the assignment, the consultant repackages himself and goes back to Bangalore to continue the assignment remotely (offshore, it costs much less than on site).

Joking aside, here are a few random notes gathered during a discussion with Pivolis, a company that specializes in supporting European companies that want to work with Indian teams (see also this older article quoting Mr Pivolis, aka Vincent Massol).

Support companies such as Pivolis can play several roles:

logistics: infrastructure (VPN, …), travel and visits, arrivals and departures (identity management, provisioning of newcomers to the project)
mediator/facilitator/organizer of communications between the French and Indian teams
methodological advice (and tools)

Typically, for a project representing 100 full-time equivalents (50 French and 50 Indian), the coordination workload handled by this kind of company could represent 2 full-time equivalents (i.e. 2%). In addition, some people in the French team are also dedicated to supporting the Indian team, i.e. to facilitating knowledge transfer. In the first phase of a project, this can amount to one person for every 6 Indian team members. Afterwards, things become more routine and the ratio stabilizes at one French person supporting the offshore team for every 10 Indian team members.

Contrary to what one might expect, you should not try too hard to specialize the French and Indian teams. It is better to organize and maintain some redundancy of roles on site and offshore, at least during a (long) period of skills transfer. Communication then takes place at multiple levels: from French project leader to Indian project leader, from French lead developer to Indian lead developer. This similarity of roles and the resulting multiplicity of communication channels are key success factors for Franco-Indian cooperation on an IT project. The key success factor, according to Pivolis, is to strive to keep people happy on both sides. The second main success factor lies in the rigor and discipline of the teams, especially in their use of coordination tools (mainly an issue tracker, a build management tool and a wiki).

Many tools are worth using to get remote teams to collaborate. They can be analyzed according to their frequency of use and their relative effectiveness. Regarding frequency, for a 100-person project, you will find for example:

Travel (from France to India AND vice versa): 3 people every 6 weeks
Exchange of office documents: one important document written per month
Telephone: 1.5 conference calls per week per team of 10 people
Wiki: 1 edit per day per team of 10 people
Emails: several per day per person
Instant messaging: continuous and ubiquitous

Roughly speaking, all these tools could be grouped into 4 categories:

frequently used and well suited to this kind of collaboration: issue tracker, mailing list, wiki
frequently used but less effective: individual emails, instant messaging, conference calls
more rarely used but very rich and effective: documentation, on-site visits
rarely used and relatively ineffective: video conferences

Three cultural differences between Indians and French people to take into account to foster mutual understanding:

Indians have a very deep respect for the people they deal with, a respectful attitude that does not necessarily fit the French propensity to get into cockfights and attempt a French revolution at every meeting; the French may feel that they are working with people who lack assertiveness or frankness, while the Indians may feel that they are working with people who are rude, negative or aggressive
Indians have a strong culture of hospitality and welcome, and particular eating habits (vegetarians…); are the French ready to host their Indian counterparts with their families over the weekend, or to spend an entire weekend showing them around the Palace of Versailles? How do you get a real vegetarian meal in a Sodexho canteen?
Indians reportedly appreciate formal congratulations and thanks, and signs of gratitude (or of welcome, such as gifts); the French may find this superfluous or « chi-chi », and come across as indifferent or even ungrateful

These last few weeks of working with Indian consultants have already allowed me to confirm some of the advice and observations shared by Pivolis. And I feel it would be more than wise, in order to go further, to call on the services of this kind of company for serious long-term support.

Unique identifiers get Febrl

When dealing with identity management, one really appreciates having unique identifiers to describe individual identities. Having a common unique identifier available across a whole information system is a terrific asset for managing IT security. The problem is that information systems usually don’t have such a global naming convention, or their naming conventions are too weak to ensure the uniqueness and permanence of these identities. The usual solution is to define a smarter naming convention, invent a new unique identifier and associate all individual data with it, so that people’s identities get managed.

But then deploying this unique identifier raises a new problem: how can you guarantee that a given data record describing a person really relates to a person you already know under a unique identifier? You have to decide on matches and non-matches across your data.
The biomedical community calls the art of making such decisions « record linkage », because this is a common issue in health information systems, for example for epidemiological studies.

This community has therefore developed several approaches (deterministic or probabilistic) to record linkage that can also be applied to the field of identity management. Febrl is a very nice open source package that implements state-of-the-art methods for record linkage and that may be applied to the deployment of unique identifiers in IT security systems.
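To give an idea of the probabilistic approach, here is a sketch of Fellegi-Sunter style scoring: each compared field contributes a log-weight for agreement or disagreement, and the total is compared with two thresholds. The m/u probabilities and thresholds below are invented for illustration (in practice they are estimated from data), and this is not Febrl's API.

```python
import math

# Hypothetical m/u probabilities per field (illustrative only):
#   m = P(values agree | both records describe the same person)
#   u = P(values agree | the records describe different people)
FIELDS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_date": (0.97, 0.003),
}
UPPER, LOWER = 8.0, 0.0  # hypothetical decision thresholds


def normalize(value):
    return " ".join(value.split()).lower()


def match_weight(rec_a, rec_b):
    """Sum of Fellegi-Sunter style log2 weights over the compared fields."""
    total = 0.0
    for name, (m, u) in FIELDS.items():
        a = normalize(rec_a.get(name, ""))
        b = normalize(rec_b.get(name, ""))
        if not a or not b:
            continue  # missing value: no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (> 0)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (< 0)
    return total


def classify(rec_a, rec_b):
    weight = match_weight(rec_a, rec_b)
    if weight >= UPPER:
        return "match"
    if weight <= LOWER:
        return "non-match"
    return "possible match"  # left for clerical review
```

A deterministic approach would instead require exact agreement on a chosen set of fields; the probabilistic one leaves a middle band of « possible matches » for a human to review.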

From OWL to Plone

I found a working path to transform an OWL ontology into a working Plone content-type. Here is my recipe :

Choose any existing OWL ontology
With Protege equipped with its OWL plugin, create a new project from your OWL file.
Still within Protege, with the help of its UML plugin, convert your OWL-Protege project into a UML classes project. You get an XMI file.
Load this XMI file into a UML project with Poseidon. Save this project in Poseidon’s .zuml format.
From Poseidon, export your classes to a new XMI file. It will be Plone-friendly.
With a text editor, delete any accented characters that Poseidon might have added to your file (for example, the French version of Poseidon adds a badly encoded « Modele sans titre » attribute to your XMI), because the next step won’t like them
python Archgenxml.py -o YourProduct yourprojectfile.xmi turns your XMI file into a valid Plone product. This requires the latest stable versions of Plone and Archetypes (see doc), plus the ArchGenXML head from the Subversion repository.
Launch your Plone instance and install YourProduct as a new product from your Plone control panel. Enjoy YourProduct !
Optionally, populate it with an appropriate marshaller.
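The manual clean-up of accented characters in the recipe above could be scripted. A minimal sketch, assuming the XMI file is UTF-8 and that replacing accented letters with their unaccented equivalents (and dropping any other non-ASCII character) is acceptable for your model; `clean_xmi` is a hypothetical helper, so keep a backup of the original file:

```python
import unicodedata


def to_ascii(text):
    """Fold accented letters to ASCII (è -> e) and drop any other non-ASCII character."""
    decomposed = unicodedata.normalize("NFKD", text)  # è -> e + combining accent
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_xmi(src_path, dst_path, encoding="utf-8"):
    """Write an ASCII-only copy of an XMI file (hypothetical helper)."""
    with open(src_path, encoding=encoding) as src:
        content = src.read()
    with open(dst_path, "w", encoding="ascii") as dst:
        dst.write(to_ascii(content))
```

Since ASCII is a subset of UTF-8, the cleaned file stays consistent with a UTF-8 encoding declaration in the XMI header.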

Now you are not far from using Plone as a semantic aggregator.

Social software for skyscrapers

(Via Designing for Civil Society). iSociety was exploring the idea of using social software in local contexts, specifically in a local residential area (a set of skyscrapers). They see the potential of social software in its ability to

facilitate better face-to-face [communication] : create introductions between people who recognise their shared interests and want to meet

circumvent face-to-face [communication] : enable weak norms of cooperation between people who don’t want to meet, or can’t, but still have shared interests (which they may not even be aware of)

I would call this last case « loosely coupled communication » in the same way the blogosphere enables distributed conversation.

They identified three fields of use for local social software :

infrastructure : transforming your local facility manager into a blogger so that residents get involved in managing shared facilities (elevators, shared areas, …)
tasks : facilitating the scheduling of activities such as sport, local trade or childcare with an online reputation system and group forming features
culture: for people interested in linkage with neighbours for its own sake

They think the greatest potential is in the « task » field because

studies show that activities such as these which require cooperation have a better impact on social capital than projects such as community centres, which promote cooperation.

In other words, as they say :

Social capital is best pursued obliquely

Their conclusion is that local residential areas may need task-oriented rather than generic social software.

This reminds me of a community project I ran when I was younger: the volunteer team I was part of wanted to socialize with some young people who lived in nearby slums, because we were curious about what it was like to live in such poor districts. The best way we found to get into this distant social context was to first identify a very concrete project that would require us to meet these other teenagers. We heard that a local association in one of these slums was training volunteers in improvised acting. My team was poor at acting but strong in video technical skills. So we had a reason to go to this association and ask for help to complete our task/project: making a short fiction video with other young volunteers. We made this short film together (it took a year of work on weekends) and it was a lot of fun! Moreover, this project was a success in building local social capital because it was task-oriented and its success required strong cooperation.

Identification and naming practices

States have been precursors in building registers of persons. Here are some national practices and projects in civil registration systems, vital statistics and other administrative identity systems :

You are building an international directory of persons, you know that you will record names, surnames, given names. But what does « surname » mean ? Will you be understood when you ask a foreigner his given name ? Here come culture, society and naming practices :

The Wikipedia article on family names
Naming in the Kashmiri Pandit Community : Gotras, surnames and nicknames
Latin American Surnames
Norwegian Naming Practices
Vietnamese names
Arabic Naming Practices And Period Names List
Chinese personal names
Campaigns of Opposition to ID Card Schemes, How the UK population considers civil registration (see the question « what contribution should civil registration make to proving identity and how ? »)
Official translations of the names of the elements making up a person’s identity in various European countries: prénoms/vornamen/nombre propio/nome/onoma/…, nom/name/appelidos/geslachtsnaam/…

You’ve got existing databases that you want to link to your fresh new directory of persons. But how to build that link ? How to match records coming from different databases when they don’t share a common unique identifier ? This is the art of record linkage :

Google for « retrospective linkage », « retrospective linkages »
Record linkage in an Information Age society, Linking Health Records : Human rights concerns, Record linkage techniques
High performance computing techniques for record linkage, see also their website and febrl, their open source prototype software (powered by Python !)
Research papers by an expert from the U.S. Bureau of the Census : Matching and record linkage, Frequency-based matching in fellegi-sunter, record linkage software and methods for merging administrative lists

Transliteration

I wanted to give Alban and others pointers to my resources on transliteration, but I can’t find my transliteration documents any more! Anyway, my experience is that transliteration is a tough problem. After thinking a little about it, we decided not to automate the transliteration of individual names, but to let people enter their names according to their own habits, in a more or less transliterated form. It would have been great to be able to automate transliteration (maybe with the help of a virtual Unicode keyboard?). The main advantage of standardized transliteration is that it is supposed to give you a standardized representation of a person’s name. You might then rely on these standardized naming elements to build a unique identifier. But the problem is that transliteration is not standardized for many languages, and the standards that do exist change too often. A Greek colleague of mine told me that his name had been transliterated many times with many different outputs (creating problems at the airport, as you can imagine). Transliteration definitely remains a problem for strong identity management. For now, you should just try to work around it until transliteration standards are more robust and widely adopted…
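To illustrate why transliterated names make a shaky basis for identifiers, here are two hypothetical, deliberately tiny Greek-to-Latin conventions applied to the same name. The rule tables are invented for the example and follow no official standard:

```python
import unicodedata

# Two hypothetical, deliberately tiny Greek-to-Latin conventions (not ELOT 743
# or any official standard). Multi-letter rules come first so they win.
CONVENTION_A = [("γι", "gi"), ("γ", "g"), ("ι", "i"), ("ω", "o"),
                ("ο", "o"), ("ρ", "r"), ("ς", "s"), ("σ", "s")]
CONVENTION_B = [("γι", "yi"), ("γ", "g"), ("ι", "i"), ("ω", "o"),
                ("ο", "o"), ("ρ", "r"), ("ς", "s"), ("σ", "s")]


def strip_accents(text):
    """Remove diacritics such as the Greek tonos (ώ becomes ω)."""
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(c))


def transliterate(name, rules):
    """Greedy left-to-right transliteration; unknown characters are kept."""
    text = strip_accents(name).lower()
    out, i = [], 0
    while i < len(text):
        for src, dst in rules:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out).capitalize()
```

The two conventions give « Giorgos » and « Yiorgos » for the same person, so any identifier derived from the transliterated form would differ depending on which convention a given system applied.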

Anyway, here are some pointers on how to process non-Latin documents and maybe transliterate them:

Jean Millerat's bytes for good

Innovate, Serve, Be Entrepreneurial!