Project Objectives and Methods

The primary objective of the project is to develop and test novel methodologies for finding and matching open source software so that it can be composed more easily, thereby easing the reuse of existing code.

Reusing Open Source Code

Open source code (OSC) packages will be used to create a database of (loosely defined) components, that is, pieces of code that implement a specific functionality. The project does not target only the market for strictly defined components [Mau00]; it seeks a methodology general enough to find the software that best matches a set of desired characteristics. Software is therefore treated as a white box, which simplifies the reuse of the very large number of existing open source projects and significantly increases the number of elements that can be reused.
During the project, a methodology and a tool will be devised to significantly reduce the work needed to identify, among a large number of components, those that can be used to implement a desired functionality. This will help the reuse of existing code and the uptake of open source software by guiding the search process and by making lesser-known packages easier to use. The project aims break down into the following more specific points:

Finding an ontology that targets source code software packages. These are understood not only as libraries, but also as well-defined, valuable routines and capabilities inside existing source code. This ontology will be used to describe in a portable and coherent way the properties of packages and, if present, of their subcomponents.
Creating a dictionary of terms and attributes to describe the software packages and their properties (designed to match the ontology).
Finding a compact, complete, and portable representation for the ontology.
Implementing a search and matching engine that automatically generates a list of software pieces to be assembled, starting from a high-level description of the developer's needs. This search engine will use heuristics to guide the search as directly as possible toward a solution, and will take into account metrics that express the developer's preferences. These heuristics and metrics will also be developed and assessed as part of this project.
Implementing a suitable persistent database, capable of holding several hundred code descriptions and interfaced to the matching engine.
Filling in the database with a large enough number of descriptions to be able to test the matching engine with non-trivial problems.
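The interplay between descriptions, heuristics, and preference metrics can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project itself plans to build the engine in Ciao Prolog), and every name in it (Description, preference, select, the package names, and the capability terms) is a hypothetical assumption rather than part of the project design. The heuristic shown is a simple greedy covering strategy, chosen only as one possible example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical description record for one package or subcomponent."""
    name: str
    provides: frozenset      # capability terms drawn from the dictionary
    attributes: tuple = ()   # (attribute, value) pairs, e.g. ("license", "BSD")


def preference(desc, wanted):
    """Metric: how many of the developer's preferred attribute values match."""
    attrs = dict(desc.attributes)
    return sum(1 for key, value in wanted.items() if attrs.get(key) == value)


def select(needs, database, wanted):
    """Greedy heuristic: repeatedly pick the description covering the most
    still-missing capabilities, breaking ties with the preference metric.
    Returns a list of descriptions, or None if the needs cannot be covered."""
    missing = set(needs)
    chosen = []
    while missing:
        best = max(
            database,
            key=lambda d: (len(d.provides & missing), preference(d, wanted)),
            default=None,
        )
        if best is None or not (best.provides & missing):
            return None      # nothing in the database covers what is left
        chosen.append(best)
        missing -= best.provides
    return chosen


# Illustrative data only; these packages do not exist.
database = [
    Description("zipkit", frozenset({"compress", "archive"}), (("license", "GPL"),)),
    Description("packlib", frozenset({"compress", "archive"}), (("license", "BSD"),)),
    Description("netfetch", frozenset({"http-get"}), (("license", "BSD"),)),
]

result = select({"compress", "http-get"}, database, {"license": "BSD"})
names = [d.name for d in result]
```

In this toy run the two compression packages cover the same need, so the preference metric (here, a BSD license) decides between them. A real engine would need backtracking and richer metrics; the greedy choice above can miss solutions that a complete search would find.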

Selecting Code

The selection of code to assemble from the database will be performed by a specialized matching engine, built using a Logic Programming environment (Ciao Prolog [HBC+00, HBC+99]). Ciao Prolog has been developed over recent years by one of the project partners (UPM) within the framework of several ESPRIT projects, and is itself distributed under an open source license.
The database of software components will be initially extracted (and later extended) from the internal archive developed by one of the partners (Conecta s.r.l.). This archive currently holds descriptions of more than 14,000 open source and publicly available software packages, most of them not listed on commonly used software search sites such as FreshMeat or SourceForge. Both the matching tool and the database will be open-sourced, in order to guarantee the best possible dissemination of the results.
A live demonstration system using a selected database will be created within the project. It will differ from more realistic cases only in its size, so the database schema and query engine should be directly applicable to problems of any size.
The ontology will be flexible enough to admit changes (specialization or abstraction of some of its parts) that tailor it to specific, local needs (e.g., to match tools and components created inside a single company or a group of affiliated companies).
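As a rough illustration of what tailoring the ontology to local needs could look like, the following Python sketch models the ontology as a term hierarchy and adds a company-specific term beneath an existing one. The representation (a mapping from each term to its more abstract parent), the function names specialize and subsumes, and all the terms are assumptions made for this example only; the project does not prescribe them.

```python
# Hypothetical ontology fragment: each term maps to its more abstract parent.
BASE_ONTOLOGY = {
    "data-processing": None,
    "compression": "data-processing",
    "lossless-compression": "compression",
}


def specialize(ontology, additions):
    """Return a tailored copy with extra local terms; the base is not modified.
    New terms must be fresh and must hang under an already known term."""
    tailored = dict(ontology)
    for term, parent in additions.items():
        if term in tailored:
            raise ValueError(f"term already defined: {term}")
        if parent not in tailored:
            raise ValueError(f"unknown parent term: {parent}")
        tailored[term] = parent
    return tailored


def subsumes(ontology, general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    term = specific
    while term is not None:
        if term == general:
            return True
        term = ontology.get(term)
    return False


# A company adds its own, more specific term under an existing one.
company = specialize(
    BASE_ONTOLOGY, {"acme-log-compression": "lossless-compression"}
)
```

With this structure, a query for the general term "compression" also matches components described with the local term "acme-log-compression", while the shared base ontology remains unchanged for other users.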

References

[BLHL01] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, May 2001. Available from http://www.sciam.com/2001/0501issue/0501berners-lee.html.
[dRG95] Maria del Rosario Girardi. Classification and Retrieval of Software through their Description in Natural Language. PhD thesis, Computer Science Department, University of Geneva, 1995.
[GI95] M. R. Girardi and B. Ibrahim. Using English to Retrieve Software. The Journal of Systems and Software, 30(3):249-270, September 1995.
[HBC+99] M. Hermenegildo, F. Bueno, D. Cabeza, M. Carro, M. García de la Banda, P. López-García, and G. Puebla. The CIAO Multi-Dialect Compiler and System: An Experimentation Workbench for Future (C)LP Systems. In Parallelism and Implementation of Logic and Constraint Logic Programming, pages 65-85. Nova Science, Commack, NY, USA, April 1999.
[HBC+00] M. Hermenegildo, F. Bueno, D. Cabeza, M. Carro, M. García de la Banda, P. López-García, and G. Puebla. The Ciao Logic Programming Environment. In International Conference on Computational Logic, CL2000, July 2000.
[Mau00] P.M. Maurer. Components: What if they gave a revolution and nobody came? IEEE Computer, pages 28-34, June 2000.
[MMM95] H. Mili, F. Mili, and A. Mili. Reusing Software: Issues and research directions. IEEE Transactions on Software Engineering, 1995.
[SWCP] The Semantic Web Community Portal. Markup Languages and Ontologies. Available from http://www.semanticweb.org/knowmarkup.html.