CLARIN: Common Language Resources and Technologies Infrastructure

Start date: 01-01-2008
End date: 31-12-2010
Funded by: ICT (FP7) - Research Infrastructures
Project leader: Stelios Piperidis

CLARIN aims to build and operate a shared distributed infrastructure that makes language resources (LRs) and relevant technology available to the humanities and social sciences research communities. The preparatory phase is intended to bring the project to the level of legal, organisational and financial maturity required for successful implementation. To this end, work along the following dimensions is required to pave the way for implementation:

  • Funding and governance. The aim here is to bring together the funding agencies in the participating countries and to work out a ready-to-sign draft agreement about governance, financing, construction and operation of the infrastructure.
  • Technical specifications of the technologies involved (existing, emerging or off-the-shelf blueprints). A detailed specification of the infrastructure and an agreement on data and interoperability standards will be provided, and a validated running prototype based on these specifications will be developed. The validation should cover all technical, linguistic, and user aspects.
  • Language specifications. The integrated prototype will have to be populated with a selection of LRs and technologies for all participating languages so that validation of the infrastructure specifications and the proposed standards becomes feasible. To this end, existing resources will be adapted to the CLARIN requirements and integrated, and new ones will be created where necessary. The objective is to deliver a sufficiently populated and thoroughly tested prototype that demonstrates the adequacy of the approach for all participating languages and that can be used to bootstrap the construction phase.
  • User dimension. The objective is to demonstrate that the infrastructure serves humanities and social sciences users, and to create a joint, informed community that is capable of exploiting and further developing the infrastructure. In order to fully exploit the potential of language resources and technology (LRT), a number of actions have to be undertaken: (i) an analysis of current practice in the use of language technology in the humanities will help to ensure that the specifications take into account the needs of the humanities; (ii) the execution of a number of typical humanities projects will help to validate the prototype and its specifications; (iii) the humanities and social sciences communities have to be made aware of the potential of LRT to improve or even innovate their research; (iv) the humanities and language technology communities have to be brought together in order to ensure lasting synergies between them.

