Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.
Updated Jun 26, 2024 (Python)
A high-performance matcher that supports multiple matching functions and handles matching against massive numbers of sensitive words.
Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.
NLP toolkit for those nonsensical ontologies
Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.
Projects I worked on during my Bachelor's degree
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including mailboxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, and document analysis. Offline processing is possible for security and confidentiality.
Add-on to the Orange3 data mining toolkit with text processing widgets from the project Navigating Stories
Conveniently access the REST API of Averbis products using Python
The artificial intelligence part of the MiMFa Framework project.
This repository contains a script that downloads PDFs from a specified URL, converts them to text, and performs text analysis: identifying the language, removing stopwords, and counting word and phrase frequencies. The script can analyze texts in multiple languages.
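The stopword-removal and frequency-counting steps described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the tiny `STOPWORDS` set and the function names are hypothetical, "phrases" are assumed to mean two-word sequences (bigrams), and language identification and PDF conversion are not covered.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stopword list; a real script would
# load a fuller list for each supported language.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def word_and_phrase_frequencies(text: str, top_n: int = 10):
    """Remove stopwords, then count single words and two-word phrases (bigrams)."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    words = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return words.most_common(top_n), bigrams.most_common(top_n)

if __name__ == "__main__":
    sample = "Text mining is the analysis of text. Text mining turns text into data."
    top_words, top_phrases = word_and_phrase_frequencies(sample, top_n=3)
    print(top_words)
    print(top_phrases)
```

Because bigrams are built after stopwords are dropped, a "phrase" here can span a removed stopword or a sentence boundary; counting bigrams before filtering is an equally reasonable choice.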
This solution converts unstructured text data into meaningful insights, using ML and AI for sentiment analysis, topic modelling, text classification, and net promoter score analysis.
Scientific Document Insight Q/A
A parser for generating an XML-TEI corpus of the 20th legislative period of the Hessian Landtag (state parliament of Hesse) and for computing a topic model.
A complete sentiment analysis of the subtitles of the Netflix TV series "Hospital Playlist" using NLTK's VADER and CardiffNLP's RoBERTa model from Hugging Face Transformers.
Web scraping and sentiment analysis using multiprocessing, VADER, and CardiffNLP's RoBERTa model from Hugging Face Transformers.
Across three tasks, we study the proximity between three groups of texts taken from different press sections, covering text mining, data cleaning, vector representation of the texts using various methods, and several NLP tasks.
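One simple way to measure proximity between groups of texts is to represent each group as a bag-of-words count vector and compare groups by cosine similarity. The sketch below assumes that approach for illustration only; the project itself mentions "various methods", which this does not reproduce, and the section names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(texts: list[str]) -> Counter:
    """Represent a group of texts as a single term-count vector (bag of words)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    # Hypothetical mini "press sections"; real data would be full articles.
    sections = {
        "sports": ["The team won the match", "A late goal decided the match"],
        "economy": ["Markets fell as rates rose", "The central bank raised rates"],
        "football": ["The match ended with a late goal"],
    }
    vectors = {name: bag_of_words(texts) for name, texts in sections.items()}
    names = list(vectors)
    # Compare every pair of sections once.
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(f"{x} vs {y}: {cosine_similarity(vectors[x], vectors[y]):.3f}")
```

Raw counts let frequent function words such as "the" dominate the score; in practice, stopword removal or TF-IDF weighting usually gives a more meaningful notion of proximity.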
The codebase for the front end of nocodefunctions.com
Extension of the SentenceSimplification project
Examining gender representation in American diplomacy