What is an example of an EM algorithm


Programming basics

Many approaches to natural language processing (nlp) are based on probabilities estimated from sample data. An important estimation method for this is the expectation maximization algorithm (EM algorithm) [DLR77]. With it it is possible to calculate probabilities based on incomplete data to train.

In the context of nlp, for example, it is important to be able to determine rule probabilities of probabilistic context-free grammars (pcfg) on ​​the basis of data that only contain terminal strings. The derivatives with which the terminal strings were generated are unknown, so that the rule probabilities cannot be calculated directly; the data are therefore incomplete in this sense. With the help of the EM algorithm, reasonable control probabilities can nevertheless be estimated.

The EM algorithm is a very general technique and is applicable to many other problems. In this way, various specific instances of the EM algorithm can be derived from the general version, e.g. training methods for

  1. probabilistic context-free grammars [LY90; NS08] (further: [Bak79; Pre01a; Pre01b]),
  2. tree transducers [GK04; GKM08],
  3. IBM Model 1 [Bro + 93],
  4. synchronous tree-adjoining grammars [BVN14],
  5. Yamada-Knight model [YK01],
  6. extended multi bottom-up tree transducers [ELM09],
  7. interpreted regular tree grammars [KK11],
  8. synchronous context-free grammars [Chi07].

Some of these instances will be derived in the lecture.


  • Thursdays, 5th DS (2: 50–4: 8 pm), APB / E009: Lecture
  • Tuesdays, 1st DS (7:30 am - 9:00 am), APB / E010: Exercise

Lecture material

  • Overview article on statistical machine translation [Lop08].
  • EM algorithms in literature slides
  • ⦇P⦈cb is nondecreasing proof
  • Algorithmic skeleton for EM, Comparison and Relationship of EM algorithms folios
  • ⦇Κ⦈sc = ⦇P⦈cbproof
  • Overview on EM algorithms
  • ⦇Μ⦈ok = ⦇Μscproof
  • Generic inside-outside EM algorithm slide
  • Training of PCFG




J. K. Baker. Trainable grammars for speech recognition. The Journal of the Acoustical Society of America 65, S1 (1979), S132-S132. doi: 10.1121 / 1.2017061.

[Bro + 93]

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19.2 (June 1993), 263-311. issn: 0891-2017.


Matthias Büchse, Heiko Vogler and Mark-Jan Nederhof. Tree parsing for tree-adjoining machine translation. Journal of Logic and Computation 24.2 (2014), 351-373. doi: 10.1093 / logcom / exs050.


David Chiang. Hierarchical Phrase-Based Translation. Computational Linguistics 33.2 (June 2007), 201-228. issn: 0891-2017. doi: 10.1162 / coli.2007.33.2.201.


A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39.1: 1-38 (1977). issn: 00359246.


Joost Engelfriet, Eric Lilin and Andreas Maletti. Extended multi bottom-up tree transducers. Acta Informatica 46.8 (2009): 561-590. issn: 0001-5903. doi: 10.1007 / s00236-009-0105-8.


Jonathan Graehl and Kevin Knight. Training Tree Transducers. HLT-NAACL 2004: Main Proceedings. Edited by Susan Dumais, Daniel Marcu and Salim Roukos. Boston, Massachusetts, USA: Association for Computational Linguistics, May 2004, 105-112.


Jonathan Graehl, Kevin Knight and Jonathan May. Training Tree Transducers. Computational Linguistics 34.3 (June 2008), 391-427. issn: 0891-2017. doi: 10.1162 / coli.2008.07-051-R2-03-57.


Alexander Koller and Marco Kuhlmann. A generalized view on parsing and translation. Proceedings of the 12th International Conference on Parsing Technologies. IWPT '11. Dublin, Ireland: Association for Computational Linguistics, 2011, 2–13. isbn: 9781932432046.


Adam Lopez. Statistical Machine Translation. ACM Comput. Surv. 40.3 (Aug 2008), 8: 1-8: 49. issn: 0360-0300. doi: 10.1145 / 1380584.1380586.


Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the Inside-Outside algorithm. Computer Speech & Language 4.1 (Jan 1990), 35-56. issn: 0885-2308. doi: 10.1016 / 0885-2308 (90) 90022-X.


Mark-Jan Nederhof and Giorgio Satta. Probabilistic parsing. New Developments in Formal Languages ​​and Applications. Edited by Gemma Bel-Enguix, M.Dolores Jiménez-López and Carlos Martín-Vide. Vol. 113. Studies in Computational Intelligence. Springer Berlin Heidelberg, 2008, 229-258. isbn: 9783540782902. doi: 10.1007 / 978-3-540-78291-9_7.


Detlef Prescher. Inside-Outside Estimation Meets Dynamic EM. Proceedings of the Seventh International Workshop on Parsing Technologies (IWPT-2001), October 17-19, 2001, Beijing, China. Tsinghua University Press, 2001. isbn: 7302049254.


Detlef Prescher. Inside-outside estimation meets dynamic EM: gold. Techn. Ber. P.O. Box 151141, 66041 Saarbrücken: Saarland University and State Library, 2001.


Kenji Yamada and Kevin Knight. A syntax-based statistical translation model. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics. ACL ’01. Toulouse, France: Association for Computational Linguistics, 2001, 523-530. doi: 10.3115 / 1073012.1073079.

Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged
Heiko Vogler

Tel .: +49 (0) 351 463-38232
Fax: +49 (0) 351 463-37959
Email contact form

Kilian Gebhardt

Tel .: +49 (0) 351 463-38237
Fax: +49 (0) 351 463-37959
Email contact form