cleanNLP - A Tidy Data Model for Natural Language Processing
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of the 'udpipe' back end with no external dependencies, or a Python back end with 'spaCy' <https://spacy.io>. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, and dependency parsing.
Last updated 6 months ago
corenlp natural-language-processing spacy
8.85 score 212 stars 221 scripts 728 downloads
genlasso - Path Algorithm for Generalized Lasso Problems
Computes the solution path for generalized lasso problems. Important use cases are the fused lasso over an arbitrary graph, and trend fitting of any given polynomial order. Specialized implementations for these two subproblems are given to improve stability and speed. See Taylor Arnold and Ryan Tibshirani (2016) <doi:10.1080/10618600.2015.1008638>.
Last updated 2 years ago
7.68 score 32 stars 6 packages 166 scripts 455 downloads
ggimg - Graphics Layers for Plotting Image Data with 'ggplot2'
Provides two new layer types for displaying image data as layers within the Grammar of Graphics framework. Displays images using either a rectangle interface, with a fixed bounding box, or a point interface using a central point and general size parameter. Images can be given as local JPEG or PNG files, external resources, or as a list column containing raster image data.
Last updated 1 year ago
ggplot2-geom image-analysis
5.70 score 53 stars 19 scripts 248 downloads
tif - Text Interchange Format
Provides validation functions for common interchange formats for representing text data in R. Includes formats for corpus objects, document term matrices, and tokens. Other annotations can be stored by overloading the tokens structure.
Last updated 12 months ago
corpus natural-language-processing term-frequency text-processing tokenizer
3.83 score 35 stars 13 scripts
ctrialsgov - Query Data from U.S. National Library of Medicine's Clinical Trials Database
Tools to create and query a database from the U.S. National Library of Medicine's Clinical Trials database <https://clinicaltrials.gov/>. Functions provide access to a variety of techniques for searching the data, including range queries, categorical filtering, and full-text keyword search. Minimal graphical tools are also provided for interactively exploring the constructed data.
Last updated 3 years ago
3.46 score 29 scripts 207 downloads
coreNLP - Wrappers Around Stanford CoreNLP Tools
Provides a minimal interface for applying annotators from the 'Stanford CoreNLP' Java library. Methods are provided for tasks such as tokenisation, part-of-speech tagging, lemmatisation, named entity recognition, coreference detection, and sentiment analysis.
Last updated 2 years ago
3.02 score 1 star 52 scripts 503 downloads