“Search and classify topics in a corpus of text using the latent dirichlet allocation model”

Iparraguirre-Villanueva, Orlando; Sierra-Liñan, Fernando; Herrera Salazar, Jose Luis; Beltozar-Clemente, Saul; Pucuhuayla-Revatta, Félix; Zapata-Paulin, Joselyn; Cabanillas-Carbonell, Michael

doi:10.11591/ijeecs.v30.i1.pp246-256

Authors

Iparraguirre-Villanueva, Orlando

Sierra-Liñan, Fernando

Herrera Salazar, Jose Luis

Beltozar-Clemente, Saul

Pucuhuayla-Revatta, Félix

Zapata-Paulin, Joselyn

Cabanillas-Carbonell, Michael

Publisher

Institute of Advanced Engineering and Science

Material type

info:eu-repo/semantics/article

Date

2022-11-18

Subjects

Classify; Discovering; Latent Dirichlet allocation; Text corpus; Topics

Abstract

This work aims to discover topics in a text corpus and to classify the most relevant terms for each discovered topic. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model performance. For processing, a total of 10,322 "curriculum" documents related to data science were collected from the web during 2018-2022. The latent Dirichlet allocation (LDA) model was used to analyze and structure the subjects. After processing, 12 themes were generated, which allowed ranking the most relevant terms to identify the skills of each candidate. This work concludes that candidates interested in data science must have technical skills, including mastery of Structured Query Language (SQL), programming languages such as R, Python, and Java, and data management, among other tools associated with the technology.
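
The core idea described in the abstract (fit LDA to tokenized documents, then rank the most relevant terms per topic) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which library or inference method they used, so the example assumes collapsed Gibbs sampling written in plain Python. The function names `fit_lda` and `top_terms`, the hyperparameters `alpha` and `beta`, and the tiny toy corpus are all hypothetical and chosen only for illustration.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative, not optimized).

    docs is a list of token lists. Returns the vocabulary plus the
    document-topic and topic-word count matrices.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_terms(vocab, topic_word, n=3):
    """Return the n highest-count terms for each topic."""
    ranked = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        ranked.append([vocab[i] for i in order[:n]])
    return ranked


# Hypothetical toy corpus standing in for preprocessed curriculum documents.
docs = [
    ["python", "sql", "statistics", "python"],
    ["sql", "database", "python", "etl"],
    ["marketing", "sales", "communication"],
    ["sales", "communication", "negotiation", "marketing"],
]

vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2, iterations=100)
for k, terms in enumerate(top_terms(vocab, topic_word)):
    print(f"topic {k}: {terms}")
```

With a corpus this small the topics are only indicative. On real data, the number of topics (12 in the study) and the hyperparameters would need to be tuned and evaluated, as the fourth step in the abstract suggests.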

URI

https://hdl.handle.net/20.500.13053/8119

DOI identifier

10.11591/ijeecs.v30.i1.pp246-256

Collections

SCOPUS
