Welcome to GenSIE 2026
GenSIE (General-purpose Schema-guided Information Extraction) is a shared task at IberLEF 2026 focusing on the ability of systems to extract nested, structured information (JSON) from general-domain Spanish texts.
Read the full task description, including scoring metrics and detailed constraints.
The task challenges participants to use Small Language Models (SLMs) and inference-time techniques to handle Zero-Shot Schemas—where the extraction target is defined dynamically at runtime.
- Zero-Shot Schema: Extract data using schemas seen only at inference time. No fixed ontology.
- General Domain: From legal contracts to medical reports and news.
- Inference-Time Focus: Focus on prompting, RAG, and constrained decoding. No massive fine-tuning.
- Structured Output: Strict adherence to JSON Schema and complex semantic constraints.
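To make the zero-shot schema and structured output requirements concrete, here is a minimal sketch of checking a system's JSON output against a schema that is only known at runtime. Everything in it is an assumption for illustration: the `SCHEMA` contents, the `check` helper, and the sample output are hypothetical and not taken from the task data, and the checker covers only a small subset of JSON Schema (`type`, `required`, `properties`, `items`). The official evaluation may apply different and stricter rules; see the task description.

```python
import json

# Hypothetical schema, supplied at inference time (not from the GenSIE data).
SCHEMA = {
    "type": "object",
    "required": ["parties", "signed_on"],
    "properties": {
        "signed_on": {"type": "string"},
        "parties": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name"],
                "properties": {
                    "name": {"type": "string"},
                    "role": {"type": "string"},
                },
            },
        },
    },
}

# Mapping from JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def check(value, schema, path="$"):
    """Return a list of violations for a small JSON Schema subset."""
    expected = schema.get("type")
    if expected is not None:
        wrong_type = not isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("number", "integer")
        if wrong_type or bool_as_number:
            return [f"{path}: expected {expected}"]

    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub, f"{path}.{key}"))
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors


# Hypothetical model output: the second party lacks the required "name".
raw = (
    '{"signed_on": "2026-03-06", "parties": ['
    '{"name": "ACME S.A.", "role": "proveedor"}, {"role": "cliente"}]}'
)
errors = check(json.loads(raw), SCHEMA)
print(errors)
```

A check like this can sit at the end of an extraction pipeline to trigger a retry or repair step when the output is malformed. It only verifies structure, not whether the extracted values are correct for the source text.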
Resources
Explore our technical documentation to get started:
- 🚀 Starter Kit: Baselines, local evaluation, and example data.
- 📂 Submission Guidelines: Official technical requirements for your entry.
- 📝 Submit Entry: Open an issue to register your team.
- 📊 Task Description: Detailed metrics and constraints.
Schedule
| Date | Event |
|---|---|
| ✅ March 06, 2026 | 🚀 Starter Kit Released (View Guide) |
| April 01, 2026 | 📂 Full Development Set (Remaining 170 examples) |
| May 08, 2026 | 🛑 Submission Deadline (Docker containers) |
| May 09, 2026 | 🔓 Test Set Release (For local error analysis) |
| May 09–30, 2026 | ⚙️ Evaluation Period (Hosted execution) |
| May 31, 2026 | 🏆 Results Announcement |
| June 07, 2026 | 📝 Paper Submission Deadline |
| Sept 22, 2026 | 🎤 IberLEF Workshop (León, Spain) |
News & Updates
- Jan 26, 2026: Website launched.
- March 01, 2026: Delays in preparing the starter kit have forced us to push its release date back to March 09 at the latest.
Motivation
The rise of Agentic Workflows has created a massive demand for systems that can communicate via structured protocols. To identify user intent, invoke external tools, or exchange information, an AI must output rigid, error-free structured data.
While massive proprietary models (like GPT-5) solve this through scale, GenSIE targets the innovation gap in Small Language Models (under 14B parameters). We aim to show that with clever engineering (Chain-of-Thought, ReAct, Constrained Decoding), models running on commodity hardware can perform complex structured extraction reliably.
Furthermore, we aim to prioritize efficiency and sustainability to ensure that high-performance extraction pipelines remain deployable in real-world scenarios. By focusing on models that run on consumer-grade hardware, we promote sustainable AI and cost-effective solutions that are accessible to smaller research groups and industry practitioners.
Organizing Committee
The GenSIE task is jointly organized by the Research Group on Artificial Intelligence and Data Science (GIA-UH) at the University of Havana and the Research Group in Natural Language Processing and Information Systems (GPLSI) at the University of Alicante.
This team brings together expertise in both Computer Science (Generative AI, Large Language Models) and Linguistics (Corpus Annotation, Semantic Evaluation).
Members
| Name | Affiliation | Role |
|---|---|---|
| Yudivian Almeida Cruz | University of Havana | PhD, Professor |
| Suilan Estévez Velarde | University of Havana | PhD, Professor |
| Alejandro Piad Morffis | University of Havana | PhD, Professor |
| Isabel Espinosa Zaragoza | University of Alicante | PhD, Assistant Professor |
| María Miró Maestre | University of Alicante | PhD, Postdoc Researcher |
| Lucía Sevilla Requena | University of Alicante | PhD Student, Assoc. Prof. |
| Alba Pérez Montero | University of Alicante | PhD Student |
| Ernesto Estevanell Valladares | University of Havana | PhD Student |
Contact
For questions regarding the task, dataset, or evaluation, please contact the corresponding author, Alejandro Piad Morffis.