Study of the complexity of Spanish for textual simplification

Randall Araya Camposa; Paula Estrella; José Arguedas Castillo; Walter Alvarez Grijalba

doi:10.18845/tm.v33i7.5478

Published: Nov 11, 2020

Keywords:

Readability, complexity metrics, textual simplification, evaluation

Randall Araya Camposa

Escuela de Ingeniería en Computación. Centro Académico de Alajuela. Instituto Tecnológico de Costa Rica

Paula Estrella

Facultad de Lenguas y FaMAF. Universidad Nacional de Córdoba, Córdoba

José Arguedas Castillo

Escuela de Ingeniería en Computación. Centro Académico de Alajuela. Instituto Tecnológico de Costa Rica

Walter Alvarez Grijalba

Escuela de Ingeniería en Computación. Centro Académico de Alajuela. Instituto Tecnológico de Costa Rica

Abstract

Most work on textual simplification targets English, which has more linguistic
resources, and focuses on the journalistic genre. Given our context, however, this work
concentrates on studying and automating the existing metrics for measuring lexical
complexity in Spanish, as a preliminary step toward identifying complex sentences and
subsequently simplifying them. Another novel aspect of this work is the use of corpora
related to human rights, specifically from the United Nations and the United Nations
High Commissioner for Refugees. The most significant contributions are an open-source
tool that generates a report on the complexity of a given text, to support anyone
interested in simplifying that text, and the proposal of a new metric that measures
complexity in a multifaceted way. The results obtained in the experiments are promising
and in many cases confirm the hypotheses raised.

How to Cite

Araya Camposa, R., Estrella, P., Arguedas Castillo, J., & Alvarez Grijalba, W. (2020). Study of the complexity of Spanish for textual simplification. Tecnología En Marcha Journal, 33(7), pp. 45–63. https://doi.org/10.18845/tm.v33i7.5478

Issue

2020: Vol. 33, special issue. Movilidad Estudiantil 8

Section

Scientific article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The authors retain copyright and grant the journal the right of first publication, along with the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing the Editorial Tecnológica de Costa Rica from liability. In addition, the authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided that they clearly indicate that the work was first published in this journal.
