Dr. Francisco Guzmán

Arabic Language Technologies
+(974) 4454 1224
I was excited to take part in the founding of this new Institute. Helping to create something from scratch appealed to me. I already knew some of the talented people at QCRI, and could tell that something grand was brewing. Living in the intriguing Middle East, learning a new language, and interacting with a new culture were a plus as well.

Research Focus at QCRI

Dr. Francisco Javier Guzmán Herrera is a Scientist in the Arabic Language Technologies department at the Qatar Computing Research Institute (QCRI). His work sits at the intersection of Language Technologies and Machine Learning. He has extensive experience in Machine Translation, a field in which he has worked since 2006. His research has been published in top-tier venues such as the Association for Computational Linguistics (ACL), the Conference on Empirical Methods in Natural Language Processing (EMNLP), and the International Conference on Computational Linguistics (COLING), among others. Together with his team, he has participated in several international machine translation competitions, including the Seventh Workshop on Statistical Machine Translation (WMT) 2012, the International Workshop on Spoken Language Translation (IWSLT) 2013, and the National Institute of Standards and Technology (NIST) Open MT Evaluation 2015, consistently obtaining top rankings for the Arabic-English and Spanish-English language pairs. In 2014, he obtained the “Best in show” award at the BBC’s NewsHack III – Language Tech event in London, U.K., where international teams competed to build the best language-technology application for news. His research has advanced machine translation evaluation through the use of discourse information. In 2014, his team's metric (DiscoTK) won the WMT2014 metrics evaluation campaign.

Dr. Guzmán enjoys coaching young minds. In the summer of 2012, he and Dr. Stephan Vogel started the first “Hot Summer/Cool research” internship program to introduce undergraduate students from local universities to the research world. Since then, he has mentored more than a dozen students, challenging them with research questions related to language technologies.


Previous Experience

Before joining QCRI, he collaborated in teams dealing with Speech Technology (Tecnológico de Monterrey, 2011) and Machine Translation (Carnegie Mellon University, 2008-2011). He obtained his PhD from the Instituto Tecnológico y de Estudios Superiores de Monterrey, in Mexico, and was a visiting scholar at the Language Technologies Institute at Carnegie Mellon University from 2008 to 2009, where he took part in DARPA’s GALE evaluation program under the Rosetta (IBM) consortium.

Professional Experience

Professional Associations and Awards

  • Best in show award. 2014. We won the "Best in show" award for our Speech-to-Speech Translation demo at the BBC NewsHack III Hackathon in London, UK.
  • Best metric in competition. 2014. Our Machine Translation evaluation metric DiscoTK won first place in the WMT2014 Metrics task.
  • Best system for Arabic-English. 2013. Our Arabic-English and English-Arabic systems won first place in the IWSLT2013 translation tasks according to official metrics.
  • Best unconstrained system Spanish-English. 2012. Our Spanish-English system was the best-performing unconstrained system according to human evaluation metrics at WMT 2012.
  • Best poster award. 2007. For the CICLing paper "Using Translation Paraphrases from Trilingual Corpora to Improve Phrase-Based Statistical Machine Translation: A Preliminary Report", with Leonardo Garrido.
  • EPF- Special Jury Mention. 2004.
  • The Washington Center for Internships and Seminars. NAFTA Leader's Program Scholarship. 2004.
  • General Electric Foundation Scholarship. 2000 – 2004.


Education

  • PhD in Information Technology and Communications, Tecnológico de Monterrey, Mexico. 2011.
  • Master's Degree in Engineering (DDI-Double Diplôme), EPF-École d’Ingénieurs, France. 2004.
  • Physics Engineering, Tecnológico de Monterrey, Mexico. 2004.

Selected Research

  • Learning to Differentiate Better from Worse Translations; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, Alessandro Moschitti, Preslav Nakov, and Massimo Nicosia. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 214-220, 2014.
  • Using Discourse Structure Improves Machine Translation Evaluation; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, and Preslav Nakov. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14), pages 687-698, 2014.
  • The AMARA Corpus: Building Resources for Translating the Web's Educational Content; Francisco Guzmán, Hassan Sajjad, Ahmed Abdelali, and Stephan Vogel. In Proceedings of the 10th International Workshop on Spoken Language Translation (IWSLT'13), 2013.
  • A Tale about PRO and Monsters; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), pages 12-17, 2013.
  • Optimizing for Sentence-Level BLEU+1 Yields Short Translations; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1979-1994, 2012.
  • Word Alignment Revisited; Francisco Guzmán, Qin Gao, Jan Niehues, and Stephan Vogel. In Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, pages 164-175. Joseph Olive, Caitlin Christianson, and John McCary (Eds.). Springer Science & Business Media, 2011.
  • Reassessment of the Role of Phrase Extraction in PBSMT; Francisco Guzmán, Qin Gao, and Stephan Vogel. In Machine Translation Summit XII, Ottawa, Canada, August 2009.
  • QCRI at WMT12: Experiments in Spanish-English and German-English Machine Translation of News Text; Francisco Guzmán, Preslav Nakov, Ahmed Thabet, and Stephan Vogel. In Proceedings of the Seventh Workshop on Statistical Machine Translation (WMT), 2012.
  • EMDC: A Semi-supervised Approach for Word Alignment; Qin Gao, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 349-357, August 2010.
  • Translation Paraphrases for Phrase-Based Statistical Machine Translation; Francisco Guzmán and Leonardo Garrido. In Computational Linguistics and Intelligent Text Processing, 9th International Conference (CICLing 2008), Haifa, Israel, February 2008.

