The primary objective of the project is to develop and test novel
methodologies for finding and matching open source software, so that
existing code can be composed and reused more easily.
Reusing Open Source Code
Open source code (OSC) packages will be used to create a database of
(loosely defined) components, that is, pieces of code that implement a
specific functionality. The project does not target only the market
for strictly defined components [Mau00]; it seeks a methodology
general enough to find the software that best matches some desired
characteristics. Software is therefore treated as a white box, which
will simplify the reuse of the enormous number of open source projects
in existence and significantly increase the number of elements that
can be reused.
During the project, a methodology and a tool will be devised which can
significantly reduce the work needed to identify, from a large number
of components, those which can be used to implement a desired
functionality. This will greatly help the reuse of existing code and
the uptake of open source software, by guiding the search process and
by making lesser-known packages easier to use.
The project's aims can be broken down into the following more precise
points:
- Finding an ontology that targets source code software packages.
These are understood not only as libraries, but also as well-defined,
valuable routines and capabilities inside existing source code. This
ontology will be used to describe in a portable and coherent way the
properties of packages and, if present, of their subcomponents.
- Creating a dictionary of terms and attributes to describe the
software packages and their properties (designed to match the
ontology).
- Finding a compact, complete, and portable representation for the
ontology.
- Implementing a search and matching engine to allow the
automatic generation of a list of software pieces to be assembled,
starting from a high level description of the needs of the
developer. This search engine will use heuristics to guide the search
as directly as possible toward a solution and will take into account
metrics to express the preferences of the developer. These heuristics
and metrics will also be developed and assessed as part of this
project.
- Implementing a suitable, persistent database, capable of holding
several hundred code descriptions and interfaced to the matching
engine.
- Filling in the database with a large enough number of descriptions
to be able to test the matching engine with non-trivial
problems.
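To make the matching step more concrete, the following is a minimal
sketch and not the project's engine, which is to be built in Ciao
Prolog. It assumes that each description lists the capabilities a
component provides as plain terms, and that developer preference can be
reduced to a single positive cost. The names Component,
select_components, provides and cost, as well as the sample catalogue,
are hypothetical. The heuristic shown is an ordinary greedy cover,
chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one package or subcomponent."""
    name: str
    provides: frozenset  # dictionary terms naming the functionality offered
    cost: float = 1.0    # stand-in preference metric (must be > 0); lower is preferred


def select_components(required, catalogue):
    """Greedy heuristic: repeatedly pick the component that covers the most
    still-missing capabilities per unit of cost.

    Returns (chosen components, capabilities left uncovered)."""
    missing = set(required)
    chosen = []
    while missing:
        best = max(
            catalogue,
            key=lambda c: len(c.provides & missing) / c.cost,
            default=None,
        )
        if best is None or not (best.provides & missing):
            break  # nothing in the catalogue helps any further
        chosen.append(best)
        missing -= best.provides
    return chosen, missing


# Illustrative catalogue; these packages are invented for the example.
catalogue = [
    Component("imgcodec", frozenset({"read_png", "write_png"}), cost=1.0),
    Component("netfetch", frozenset({"http_get"}), cost=1.0),
    Component("kitchen_sink",
              frozenset({"read_png", "write_png", "http_get"}), cost=5.0),
]

chosen, missing = select_components({"read_png", "http_get"}, catalogue)
print([c.name for c in chosen], missing)
```

With these sample costs the two small packages are preferred over the
single large one. A greedy choice is fast but not guaranteed to be
optimal, which is one reason the heuristics and metrics themselves need
to be assessed.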
Selecting Code
The selection of code to assemble from the database will be performed
through a specialized matching engine, built with a Logic Programming
environment (Ciao Prolog [HBC+00, HBC+99]). This environment has been
developed over recent years by one of the project partners (UPM)
within the framework of several ESPRIT projects, and is itself
distributed under an open source license. The database of software
components will initially be extracted (and later extended) from the
internal archive developed by one of the partners (Conecta s.r.l.),
which currently holds descriptions of more than 14,000 open source and
publicly available software packages, most of them not listed on
commonly used software search sites such as FreshMeat or SourceForge.
Both the matching tool and the database will be open-sourced, in order
to guarantee the best possible dissemination of the results. A live
demonstration system using a selected database will be created within
the project. It will differ from more realistic cases only in its
size, and the database schema and query engine should therefore be
immediately applicable to problems of any size. The ontology will be
flexible enough to admit changes (specialization or abstraction of
some of its parts) that tailor it to specific, local needs (e.g., to
match tools and components created inside a single company or a group
of affiliated companies).
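As an illustration of what local specialization of the ontology could
look like, the sketch below models the ontology as a simple is-a
hierarchy and extends a copy of it with a company-specific term. This
is only an assumption about the representation, which the project has
yet to define. The terms, the BASE_ONTOLOGY table and the functions
specialize and is_a are all hypothetical.

```python
# Hypothetical is-a links: each term maps to its more general parent term.
BASE_ONTOLOGY = {
    "compression": "data_processing",
    "image_codec": "data_processing",
    "png_codec": "image_codec",
}


def specialize(ontology, local_terms):
    """Return a copy of the ontology extended with local, more specific terms.

    The shared ontology is left untouched, so local changes stay local."""
    extended = dict(ontology)
    for term, parent in local_terms.items():
        if parent not in extended and parent not in extended.values():
            raise ValueError(f"unknown parent term: {parent}")
        extended[term] = parent
    return extended


def is_a(ontology, term, ancestor):
    """True if `term` equals `ancestor` or descends from it.

    Assumes the hierarchy has no cycles."""
    while term is not None:
        if term == ancestor:
            return True
        term = ontology.get(term)
    return False


# A company adds its own, more specific kind of PNG codec.
company = specialize(BASE_ONTOLOGY, {"acme_png_codec": "png_codec"})
print(is_a(company, "acme_png_codec", "image_codec"))
print(is_a(BASE_ONTOLOGY, "acme_png_codec", "image_codec"))
```

A query phrased in terms of the general ontology (here, anything that
is an image_codec) would then also match components described with the
local term, while the shared ontology itself does not change.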
References
[BLHL01] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic
Web. Scientific American, May 2001. Available
from http://www.sciam.com/2001/0501issue/0501berners-lee.html.
[dRG95] Maria del Rosario Girardi. Classification and Retrieval of
Software through their Description in Natural Language. PhD thesis,
Computer Science Department, University of Geneva, 1995.
[GI95] M. R. Girardi and B. Ibrahim. Using
English to Retrieve Software. The Journal of Systems and Software,
30(3):249-270, September 1995.
[HBC+99] M. Hermenegildo, F. Bueno, D. Cabeza, M. Carro, M. García de
la Banda, P. López-García, and G. Puebla. The CIAO Multi-Dialect
Compiler and System: An Experimentation Workbench for Future (C)LP
Systems. In Parallelism and Implementation of Logic and Constraint
Logic Programming, pages 65-85. Nova Science, Commack, NY, USA,
April 1999.
[HBC+00] M. Hermenegildo, F. Bueno, D. Cabeza, M. Carro, M. García de
la Banda, P. López-García, and G. Puebla. The Ciao Logic Programming
Environment. In International Conference on Computational Logic,
CL2000, July 2000.
[Mau00] P.M. Maurer. Components: What if
they gave a revolution and nobody came? IEEE Computer, pages 28-34,
June 2000.
[MMM95] H. Mili, F. Mili, and A. Mili.
Reusing Software: Issues and research directions. IEEE Transactions on
Software Engineering, 1995.
[SWCP] The Semantic Web Community Portal.
Markup Languages and Ontologies. Available from http://www.semanticweb.org/knowmarkup.html.