Search and classify topics in a corpus of text using the latent dirichlet allocation model

Iparraguirre-Villanueva, Orlando; Sierra-Liñan, Fernando; Herrera Salazar, Jose Luis; Beltozar-Clemente, Saul; Pucuhuayla-Revatta, Félix; Zapata-Paulin, Joselyn; Cabanillas-Carbonell, Michael

doi:10.11591/ijeecs.v30.i1.pp246-256

Publication:
Search and classify topics in a corpus of text using the latent dirichlet allocation model

dc.contributor.author	Iparraguirre-Villanueva, Orlando
dc.contributor.author	Sierra-Liñan, Fernando
dc.contributor.author	Herrera Salazar, Jose Luis
dc.contributor.author	Beltozar-Clemente, Saul
dc.contributor.author	Pucuhuayla-Revatta, Félix
dc.contributor.author	Zapata-Paulin, Joselyn
dc.contributor.author	Cabanillas-Carbonell, Michael
dc.date.accessioned	2023-03-16T16:48:29Z
dc.date.available	2023-03-16T16:48:29Z
dc.date.issued	2022-11-18
dc.description.abstract	This work aims to discover topics in a text corpus and to classify the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model performance. For processing, a total of 10,322 “curriculum” documents related to data science were collected from the web during 2018-2022. The latent Dirichlet allocation (LDA) model was used for the analysis and structure of the subjects. After processing, 12 themes were generated, which allowed ranking the most relevant terms to identify the skills of each of the candidates. This work concludes that candidates interested in data science must have skills in the following topics: first, they must be technical, with mastery of structured query language, of programming languages such as R, Python, and Java, and of data management, among other tools associated with the technology.	es_ES
dc.format	application/pdf
dc.identifier.doi	10.11591/ijeecs.v30.i1.pp246-256	es_ES
dc.identifier.uri	https://hdl.handle.net/20.500.13053/8119
dc.language.iso	eng	es_ES
dc.publisher	Institute of Advanced Engineering and Science	es_ES
dc.publisher.country	ID	es_ES
dc.rights	http://purl.org/coar/access_right/c_abf2
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Classify; Discovering; Latent dirichlet allocation; Text corpus; Topics	es_ES
dc.subject.ocde	http://purl.org/pe-repo/ocde/ford#1.02.01
dc.title	Search and classify topics in a corpus of text using the latent dirichlet allocation model	es_ES
dc.type	http://purl.org/coar/resource_type/c_6501
dc.type.driver	info:eu-repo/semantics/article
dc.type.local	info:eu-repo/semantics/publishedVersion
dc.type.version	http://purl.org/coar/version/c_970fb48d4fbd8a85
dspace.entity.type	Publication