Dr. Francisco Guzmán

Arabic Language Technologies
+(974) 4454 1224
I was excited to take part in the founding of this new Institute. Helping to create something from scratch appealed to me. I already knew some of the talented people at QCRI, and could tell that something grand was brewing. Living in the intriguing Middle East, learning a new language, and interacting with a new culture were a plus as well.

Research Focus at QCRI

Dr. Francisco Javier Guzmán Herrera is a Scientist in the Arabic Language Technologies department at the Qatar Computing Research Institute (QCRI). His work lies at the intersection of Language Technologies and Machine Learning. He has extensive experience in Machine Translation, a field in which he has worked since 2006. His research has been published in top-tier venues such as the Annual Meeting of the Association for Computational Linguistics (ACL), the Conference on Empirical Methods in Natural Language Processing (EMNLP), and the International Conference on Computational Linguistics (COLING), among others. Together with his team, he has participated in several international machine translation competitions, including the Seventh Workshop on Statistical Machine Translation (WMT) 2012, the International Workshop on Spoken Language Translation (IWSLT) 2013, and the National Institute of Standards and Technology (NIST) Open MT Evaluation 2015, consistently obtaining top rankings for the Arabic-English and Spanish-English language pairs. In 2014, he obtained the “Best in show” award at the BBC’s NewsHack III – Language Tech event in London, U.K., where international teams competed to build the best language-technology application for news. His research has pushed the boundaries of machine translation evaluation using discourse information. In 2014, his team’s metric (discoTK) won the WMT2014 metrics evaluation campaign.

Dr. Guzmán enjoys coaching young minds. In the summer of 2012, he and Dr. Stephan Vogel started the first “Hot Summer/Cool research” internship program to introduce undergraduate students from local universities to the research world. Since then, he has mentored more than a dozen students, challenging them with research questions related to language technologies.


Previous Experience

Before joining QCRI, he collaborated in teams dealing with Speech Technology (Tecnológico de Monterrey, 2011) and Machine Translation (Carnegie Mellon University, 2008-2011). He obtained his PhD from the Instituto Tecnológico y de Estudios Superiores de Monterrey, in Mexico, and was a visiting scholar at the Language Technologies Institute at Carnegie Mellon University from 2008 to 2009, where he took part in DARPA’s GALE evaluation program under the Rosetta (IBM) consortium.

Professional Associations and Awards

  • Best in show award. 2014. We won the "Best in show" award for our Speech-to-Speech Translation demo at the BBC NewsHack III Hackathon in London, UK.
  • Best metric in competition. 2014. Our Machine Translation evaluation metric discoTK won first place in the WMT2014 Metrics task.
  • Best system for Arabic-English. 2013. Our Arabic-English and English-Arabic systems won first place in the IWSLT2013 translation tasks according to official metrics.
  • Best unconstrained system Spanish-English. 2012. Our Spanish-English system was the best-performing unconstrained system according to human evaluation metrics at WMT 2012.
  • Best poster award. 2007. For the CICLing paper "Using Translation Paraphrases from Trilingual Corpora to Improve Phrase-Based Statistical Machine Translation: A Preliminary Report", with Leonardo Garrido.
  • EPF- Special Jury Mention. 2004.
  • The Washington Center for Internships and Seminars. NAFTA Leader's Program Scholarship. 2004.
  • General Electric Foundation Scholarship. 2000 – 2004.


Education

  • PhD in Information Technology and Communications, Tecnológico de Monterrey, Mexico. 2011.
  • Master's Degree in Engineering (DDI-Double Diplôme), EPF-École d’Ingénieurs, France. 2004.
  • Physics Engineering, Tecnológico de Monterrey, Mexico. 2004.

Selected Research

  • Learning to Differentiate Better from Worse Translations; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, Alessandro Moschitti, Preslav Nakov, and Massimo Nicosia. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 214-220, 2014.
  • Using Discourse Structure Improves Machine Translation Evaluation; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, and Preslav Nakov. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14), pages 687-698, 2014.
  • The AMARA Corpus: Building Resources for Translating the Web's Educational Content; Francisco Guzmán, Hassan Sajjad, Ahmed Abdelali, and Stephan Vogel. In Proceedings of the 10th International Workshop on Spoken Language Translation (IWSLT'13), 2013.
  • A Tale about PRO and Monsters; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), pages 12-17, 2013.
  • Optimizing for Sentence-Level BLEU+1 Yields Short Translations; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1979-1994, 2012.
  • Word Alignment Revisited; Francisco Guzmán, Qin Gao, Jan Niehues, and Stephan Vogel. In Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, pages 164-175. Joseph Olive, Caitlin Christianson, and John McCary (Eds). Springer Science & Business Media. 2011.
  • Reassessment of the Role of Phrase Extraction in PBSMT; Francisco Guzmán, Qin Gao, and Stephan Vogel. In Machine Translation Summit XII, Ottawa, Canada, August 2009.
  • QCRI at WMT12: Experiments in Spanish-English and German-English Machine Translation of News Text; Francisco Guzmán, Preslav Nakov, Ahmed Thabet, and Stephan Vogel. In Proceedings of the Seventh Workshop on Statistical Machine Translation (WMT), 2012.
  • EMDC: A Semi-supervised Approach for Word Alignment; Qin Gao, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 349-357, August 2010.
  • Translation Paraphrases for Phrase-Based Statistical Machine Translation; Francisco Guzmán and Leonardo Garrido. In Computational Linguistics and Intelligent Text Processing: 9th International Conference (CICLing 2008), Haifa, Israel, February 2008.

