The programme is now available. We are happy to feature invited talks by:
Benchmark libraries and competitions are two popular approaches to comparative empirical evaluation of reasoning systems. It has become accepted wisdom that regular comparative evaluation of reasoning systems helps focus research, identify relevant problems, bolster development, and advance the field in general. The number of competitions has been rapidly increasing lately. At the moment, we are aware of about a dozen benchmark collections and two dozen competitions for reasoning systems of different kinds. We feel that it is time to compare notes.
What are the proper empirical approaches and criteria for effective comparative evaluation of reasoning systems? What are the appropriate hardware and software environments? How should the usability of reasoning systems be assessed? How should benchmarks and problem collections be designed, acquired, structured, published, and used?
The issue has acquired a new dimension recently, as systems with a strong interactive aspect have entered the arena. Three competitions dedicated to deductive software verification within the last two years demonstrate a dire need for comparative empirical evaluation in this area. At the same time, it remains largely unclear how to systematically evaluate the performance and usability of such systems, and how to separate technical aspects from the human factor.
The workshop aims to advance comparative empirical evaluation by bringing together current and future competition organizers and participants, maintainers of benchmark collections, as well as practitioners and the general scientific public interested in the topic.
Furthermore, the workshop intends to reach out to researchers specializing in empirical studies in computer science outside of automated reasoning. We expect such connections to be fruitful but unlikely to come about on their own.
The scope of the workshop includes (but is not limited to) topics such as the following. All topics apply to the comparative evaluation of reasoning systems; reports on evaluating a single system (such as case studies carried out with a particular system) are out of scope for the workshop.
Papers can be submitted either as regular papers (6-15 pages in LNCS style) or as discussion papers (2-4 pages). Regular papers should present previously unpublished work (completed or in progress), including proposals for system evaluation, descriptions of benchmarks, and experience reports. Discussion papers are intended to initiate discussions, should address controversial issues, and may include provocative statements.
All submitted papers will be refereed by the programme committee and selected on the basis of the referee reports. The collection of accepted papers will be distributed at the workshop and will also be published in the CEUR Workshop Proceedings series.
Submission upload is now closed.