Installation
Allyanonimiser can be installed using pip with various installation options depending on your needs.
Prerequisites
- Python 3.12 or 3.13 (spaCy does not yet ship cp314 wheels)
- A spaCy language model (recommended)
Basic Installation
Install the core package from PyPI:
pip install allyanonimiser==3.3.0
Installation Options
Allyanonimiser offers several installation options to meet different needs:
With Stream Processing Support
For processing very large files with memory-efficient streaming:
pip install "allyanonimiser[stream]==3.3.0"
With LLM Integration
For advanced pattern generation using language models:
pip install "allyanonimiser[llm]==3.3.0"
Complete Installation
To install all optional dependencies:
pip install "allyanonimiser[stream,llm]==3.3.0"
Installing a spaCy Language Model
Allyanonimiser uses spaCy for NER (PERSON, LOCATION, ORG). Two models are supported; pattern-based detection (TFN, ABN, MEDICARE, AU_PHONE, EMAIL, dates, etc.) is identical with either one.
Default: small model
python -m spacy download en_core_web_sm # 44 MB, fast
This is what create_allyanonimiser() loads by default in v3.3+. It's
the right choice for serverless deployments (Azure Functions, Lambda)
and for pipelines where pattern PII (TFN, ABN, etc.) is the primary
concern.
Optional: large model (higher NER accuracy)
python -m spacy download en_core_web_lg # 587 MB, ~1.5 GB resident
Pass it explicitly when you need higher recall on names, places, and organisations:
from allyanonimiser import create_allyanonimiser, SPACY_MODEL_ACCURATE
ally = create_allyanonimiser(spacy_model=SPACY_MODEL_ACCURATE)
Choosing a spaCy model
SPACY_MODEL_FAST (en_core_web_sm) |
SPACY_MODEL_ACCURATE (en_core_web_lg) |
|
|---|---|---|
| Default in v3.3+? | yes | no |
| Download / disk | 44 MB | 587 MB |
| Resident memory | ~200 MB | ~1.5 GB |
| Cold start | ~0.5s | 2 – 5s |
| Pattern detection | identical | identical |
PERSON recall |
~80% on insurance text | ~92% |
LOCATION, ORG recall |
noticeably worse | high |
Pass spacy_model=None to disable spaCy entirely — pattern detection
keeps working.
Development Installation
If you're contributing to the project, install in development mode:
# Clone the repository
git clone https://github.com/srepho/Allyanonimiser.git
cd Allyanonimiser
# Install in development mode
pip install -e .
# Install development dependencies
pip install -e ".[dev]"
Verifying Installation
To verify that Allyanonimiser was installed correctly, run:
import allyanonimiser
print(allyanonimiser.__version__)
This should print the current version of Allyanonimiser (e.g., 3.3.0).