Python 100%

Repository files (latest commit first)
Filename	Latest commit message	Latest commit date
François Pelletier 6cce0eb9fa chore: add project URLs with repository link to pyproject.toml		2026-06-15 22:06:01 -04:00
lcm_nlp	feat: initial release v1.2.0 of lcm-nlp NLP library	2026-06-15 22:01:06 -04:00
.gitignore	feat: initial release v1.2.0 of lcm-nlp NLP library	2026-06-15 22:01:06 -04:00
LICENSE	feat: initial release v1.2.0 of lcm-nlp NLP library	2026-06-15 22:01:06 -04:00
pyproject.toml	chore: add project URLs with repository link to pyproject.toml	2026-06-15 22:06:01 -04:00
README.md	feat: initial release v1.2.0 of lcm-nlp NLP library	2026-06-15 22:01:06 -04:00

README.md

lcm-nlp — Libère tes chaînes de mots

Educational Python library for natural language processing (NLP), designed to accompany the Libère tes chaînes de mots training course.

Installation

pip install lcm-nlp

Modules

Module	Description
`regex_utils`	Regular expressions and automata (DFA)
`preprocessing`	Tokenization, normalization, stemming, edit distance
`ngrams`	N-gram language models (Laplace smoothing, interpolation)
`classification`	Naive Bayes classification, bag of words, TF-IDF
`evaluation`	Evaluation metrics (precision, recall, F1, cross-validation)
`ner`	Named entity recognition (rules, IOB)
`embeddings`	Word embeddings (co-occurrence, SVD, cosine similarity)
`search`	Full-text search engine with Typesense
`corpus_loader`	Loader for the Pleine Confiance corpus (cybersecurity)
`emoji_analysis`	Emoji analysis in text
`sentence_analysis`	Sentence analysis (POS, readability, complexity)
`text_reuse`	Content reuse, LDA, key phrases
`linkedin`	Loading and analysis of LinkedIn data
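
Two of the techniques named in the table, edit distance and cosine similarity, are small enough to sketch in plain Python. The block below is an independent illustration using only the standard library. It does not show lcm-nlp's own code, and the function names `edit_distance` and `cosine_similarity` are hypothetical, chosen for this example only.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute.

    Hypothetical helper for illustration; not the lcm-nlp API.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


print(edit_distance("kitten", "sitting"))
print(cosine_similarity(Counter(["chat", "noir"]), Counter(["chat", "blanc"])))
```

Keeping only two rows of the dynamic-programming table is a common memory optimization; a full matrix would be needed to recover the actual sequence of edits.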

Quick start

from lcm_nlp.preprocessing import tokenize, remove_stopwords
from lcm_nlp.classification import NaiveBayesClassifier

# Tokenization
tokens = tokenize("Le traitement du langage naturel est fascinant.", method="words_only")
tokens = remove_stopwords(tokens)
print(tokens)
# → ['traitement', 'langage', 'naturel', 'fascinant']

# Classification
clf = NaiveBayesClassifier()
clf.train([
    (["excellent", "film"], "positif"),
    (["mauvais", "nul"], "négatif"),
])
print(clf.predict(["superbe", "film"]))  # → "positif"
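
To see why a classifier of this kind can label `["superbe", "film"]` as "positif" even though "superbe" never appears in the training data, here is a minimal multinomial Naive Bayes sketch with Laplace (add-one) smoothing. It is a standalone illustration under stated assumptions: the class `TinyNaiveBayes`, its `alpha` parameter, and its internals are hypothetical and are not taken from lcm-nlp, whose `NaiveBayesClassifier` may work differently.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes; not the lcm-nlp class."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = Counter()           # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()                      # all tokens seen in training

    def train(self, examples):
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokens:
                count = self.word_counts[label][tok]  # 0 for unseen tokens
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = TinyNaiveBayes()
model.train([
    (["excellent", "film"], "positif"),
    (["mauvais", "nul"], "négatif"),
])
print(model.predict(["superbe", "film"]))  # → positif
```

In this sketch the unseen token "superbe" receives the same smoothed probability (1/6) under both classes, so the decision rests on "film", which has probability 2/6 under "positif" and 1/6 under "négatif".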

License

MIT