"Search and classify topics in a corpus of text using the latent dirichlet allocation model"

dc.contributor.author Iparraguirre-Villanueva, Orlando es_ES
dc.contributor.author Sierra-Liñan, Fernando es_ES
dc.contributor.author Herrera Salazar, Jose Luis es_ES
dc.contributor.author Beltozar-Clemente, Saul es_ES
dc.contributor.author Pucuhuayla-Revatta, Félix es_ES
dc.contributor.author Zapata-Paulin, Joselyn es_ES
dc.contributor.author Cabanillas-Carbonell, Michael es_ES
dc.date.accessioned 2023-03-16T16:48:29Z
dc.date.available 2023-03-16T16:48:29Z
dc.date.issued 2022-11-18
dc.identifier.uri https://hdl.handle.net/20.500.13053/8119
dc.description.abstract "This work aims at discovering topics in a text corpus and classifying the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model performance. For processing, a total of 10,322 'curriculum' documents related to data science were collected from the web during 2018-2022. The latent Dirichlet allocation (LDA) model was used for the analysis and structure of the subjects. After processing, 12 themes were generated, which allowed ranking the most relevant terms to identify the skills of each of the candidates. This work concludes that candidates interested in data science must have skills in the following topics: first, they must be technical, with mastery of structured query language, of programming languages such as R, Python, and Java, and of data management, among other tools associated with the technology." es_ES
dc.format application/pdf es_ES
dc.language.iso eng es_ES
dc.publisher Institute of Advanced Engineering and Science es_ES
dc.rights info:eu-repo/semantics/openAccess es_ES
dc.rights.uri https://creativecommons.org/licenses/by/4.0/ es_ES
dc.subject "Classify; Discovering; Latent Dirichlet allocation; Text corpus; Topics" es_ES
dc.title "Search and classify topics in a corpus of text using the latent dirichlet allocation model" es_ES
dc.type info:eu-repo/semantics/article es_ES
dc.identifier.doi 10.11591/ijeecs.v30.i1.pp246-256 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.publisher.country ID es_ES
dc.subject.ocde http://purl.org/pe-repo/ocde/ford#1.02.01 es_ES
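
The abstract describes fitting an LDA model to a corpus and ranking the most relevant terms per topic. The record does not say which implementation the authors used, so the following is only a minimal sketch of one common way to fit LDA (collapsed Gibbs sampling) using the Python standard library. The function names `fit_lda` and `top_terms`, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for illustration; they are not taken from the article.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    alpha and beta are assumed symmetric Dirichlet priors (illustrative values).
    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_terms(topic_word, n=3):
    """Return up to n highest-count terms for each topic."""
    return [
        [w for w, c in counts.most_common(n) if c > 0]
        for counts in topic_word
    ]


# Hypothetical toy corpus; the article's corpus had 10,322 documents and 12 topics.
toy_docs = [
    ["python", "sql", "pandas", "python"],
    ["sql", "database", "query", "sql"],
    ["r", "statistics", "regression", "r"],
    ["python", "statistics", "model", "regression"],
]
doc_topic, topic_word = fit_lda(toy_docs, num_topics=2, iterations=100)
for k, terms in enumerate(top_terms(topic_word)):
    print(f"topic {k}: {terms}")
```

On a real corpus, a maintained library implementation would normally be preferable to this sketch, and the number of topics (12 in the article) would be chosen through model evaluation rather than fixed in advance.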


Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess
