# Main API Reference
This page is the flat reference for the public API at the package root. For task-oriented guides, start with Analyzing Text or Anonymizing Text instead.
## Top-level exports

```python
from allyanonimiser import (
    # Main facade
    Allyanonimiser,
    # Configuration dataclasses
    AnalysisConfig,
    AnonymizationConfig,
    # Low-level components (rarely needed directly)
    EnhancedAnalyzer,
    EnhancedAnonymizer,
    CustomPatternDefinition,
    PatternManager,
    PatternRegistry,
    # I/O processors
    DataFrameProcessor,
    StreamProcessor,  # None if polars isn't installed
    POLARS_AVAILABLE,
    # Factories
    create_allyanonimiser,
    create_analyzer,
    create_pattern_from_examples,
    # spaCy model presets
    SPACY_MODEL_FAST,      # "en_core_web_sm"
    SPACY_MODEL_ACCURATE,  # "en_core_web_lg"
)
```
## `create_allyanonimiser(...)`
Preferred entry point. Returns a fully configured Allyanonimiser
instance with all built-in Australian, general, and insurance patterns
loaded.
```python
create_allyanonimiser(
    pattern_filepath: str | None = None,
    settings_path: str | None = None,
    enable_caching: bool = True,
    max_cache_size: int = 10_000,
    spacy_model: str | None = "en_core_web_sm",
) -> Allyanonimiser
```
### Arguments

| Name | Type | Default | Description |
|---|---|---|---|
| `pattern_filepath` | `str \| None` | `None` | Optional JSON file with extra `CustomPatternDefinition`s. |
| `settings_path` | `str \| None` | `None` | Optional YAML/JSON settings file. |
| `enable_caching` | `bool` | `True` | Cache `analyze()` results by text hash. |
| `max_cache_size` | `int` | `10_000` | Max cached entries. |
| `spacy_model` | `str \| None` | `"en_core_web_sm"` | spaCy model name. Pass `SPACY_MODEL_ACCURATE` for `en_core_web_lg`. Pass `None` to disable spaCy (pattern-only mode). |
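`enable_caching` and `max_cache_size` imply that repeat analyses of identical text are served from a bounded cache keyed by a hash of the input. The library's internal cache is not shown here; the sketch below (the `AnalysisCache` class and `fake_analyze` function are illustrative stand-ins, not library API) shows the general idea:

```python
import hashlib
from collections import OrderedDict

class AnalysisCache:
    """Bounded LRU-style cache keyed by a hash of the input text (illustrative only)."""

    def __init__(self, max_size=10_000):
        self.max_size = max_size
        self._store = OrderedDict()

    def key(self, text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compute(self, text, compute):
        k = self.key(text)
        if k in self._store:
            self._store.move_to_end(k)  # refresh recency on a hit
            return self._store[k]
        result = compute(text)
        self._store[k] = result
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the oldest entry
        return result

cache = AnalysisCache(max_size=2)
calls = []

def fake_analyze(text):
    calls.append(text)
    return [text.upper()]

cache.get_or_compute("a", fake_analyze)
cache.get_or_compute("a", fake_analyze)  # second call is served from cache
```

With caching enabled, identical texts in a batch cost one analysis instead of many, at the price of `max_cache_size` entries of memory.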
## class `Allyanonimiser`
The main facade. Composes an analyzer, anonymizer, pattern registry, text preprocessor, and settings manager.
### Core detection and anonymization
```python
analyze(
    text: str,
    language: str = "en",
    active_entity_types: list[str] | None = None,
    score_adjustment: dict[str, float] | None = None,
    min_score_threshold: float | None = None,
    expand_acronyms: bool = False,
    config: AnalysisConfig | None = None,
) -> list[RecognizerResult]
```

Detects PII in `text`. Returns a list of `RecognizerResult` objects with `entity_type`, `text`, `start`, `end`, and `score` fields.
```python
anonymize(
    text: str,
    operators: dict[str, str] | None = None,
    language: str = "en",
    active_entity_types: list[str] | None = None,
    expand_acronyms: bool = False,
    age_bracket_size: int = 5,
    keep_postcode: bool = True,
    config: AnonymizationConfig | None = None,
    document_id: str | None = None,
    report: bool = True,
) -> dict[str, Any]
```

Detects then rewrites text. Returns `{"text": anonymized_text, "items": [...]}`. See Anonymizing Text for the full operator catalogue.
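Conceptually, rewriting works span by span: each detected entity's character range is replaced by the operator's output. Applying replacements from the end of the text backwards keeps earlier offsets valid. A simplified sketch of that mechanic (not the library's implementation):

```python
def apply_replacements(text, items):
    """Replace (start, end, replacement) spans, working right-to-left so
    earlier character offsets are not invalidated by length changes."""
    for start, end, replacement in sorted(items, key=lambda t: t[0], reverse=True):
        text = text[:start] + replacement + text[end:]
    return text

out = apply_replacements(
    "Call John on 0412 345 678.",
    [(5, 9, "<PERSON>"), (13, 25, "<PHONE_NUMBER>")],
)
# out == "Call <PERSON> on <PHONE_NUMBER>."
```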
```python
process(text, ...) -> dict[str, Any]
```

Combined `analyze` + `anonymize` in one call, returning both the detected entities and the anonymized text.
```python
batch_process(
    texts: list[str],
    content_types: list[str] | None = None,
    analysis_config: AnalysisConfig | None = None,
    anonymization_config: AnonymizationConfig | None = None,
    **kwargs,
) -> list[dict[str, Any]]
```

Runs the full pipeline over each input text, returning one result dict per text in input order.
### Pattern management

| Method | Purpose |
|---|---|
| `add_pattern(pattern_definition)` | Register a single `CustomPatternDefinition`. |
| `create_pattern_from_examples(entity_type, examples, context=None, name=None, generalization_level="medium")` | Build a pattern from example strings and register it. |
| `load_patterns(filepath)` | Load patterns from a JSON file. |
| `save_patterns(filepath)` | Dump registered patterns to JSON. |
| `import_patterns_from_csv(csv_path, ...)` | Bulk-import from a CSV. |
| `get_available_entity_types()` | Dict of `entity_type -> {count, patterns}`. |
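`create_pattern_from_examples` infers a regex from example strings. The real generalisation algorithm is not documented here; the toy helper below (the `generalize` function is hypothetical, not library API) illustrates one plausible "medium" strategy: keep literal characters, collapse digit runs into `\d{n}`:

```python
import re

def generalize(example):
    """Toy generalisation: escape literals, replace each digit run with \\d{n}."""
    out, i = [], 0
    while i < len(example):
        if example[i].isdigit():
            j = i
            while j < len(example) and example[j].isdigit():
                j += 1
            out.append(r"\d{%d}" % (j - i))  # collapse the digit run
            i = j
        else:
            out.append(re.escape(example[i]))
            i += 1
    return "".join(out)

pattern = generalize("EMP12345")
# pattern == r"EMP\d{5}", which also matches other IDs like "EMP99999"
```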
### Acronym handling

| Method | Purpose |
|---|---|
| `set_acronyms(acronym_dict, case_sensitive=False)` | Replace the acronym dictionary. |
| `add_acronyms(acronym_dict)` | Merge into the existing dictionary. |
| `remove_acronyms(acronyms)` | Delete by key list. |
| `get_acronyms()` | Return the current dictionary. |
| `import_acronyms_from_csv(csv_path, ...)` | Bulk-import from a CSV. |

`set_acronym_dictionary(...)` is a legacy alias for `set_acronyms`.
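Acronym expansion (the `expand_acronyms` flag on `analyze`/`anonymize`) substitutes known short forms before detection runs. A minimal whole-word sketch of the idea, not the library's implementation:

```python
import re

def expand_acronyms(text, acronyms, case_sensitive=False):
    """Replace whole-word acronyms with their expansions (simplified sketch)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for short, long_form in acronyms.items():
        text = re.sub(rf"\b{re.escape(short)}\b", long_form, text, flags=flags)
    return text

expanded = expand_acronyms(
    "The TPD claim was lodged.",
    {"TPD": "Total and Permanent Disability"},
)
# expanded == "The Total and Permanent Disability claim was lodged."
```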
### DataFrames and files

| Method | Purpose |
|---|---|
| `process_dataframe(df, text_columns=...)` | Full detect+anonymize across one or more columns. See Working with DataFrames. |
| `anonymize_dataframe(df, column, **kwargs)` | Anonymize a single column. |
| `detect_pii_in_dataframe(df, column, **kwargs)` | Entity DataFrame only. |
| `detect_pii_columns(data, ...)` | Infer which columns likely contain PII (for schema discovery). |
| `process_csv_file(input_file, output_file=None, ...)` | Full-file CSV pipeline. |
| `process_csv_directory(input_dir, output_dir=None, ...)` | Recursively process every CSV under a directory. |
| `preview_csv_changes(input_file, ...)` | Dry-run on the first N rows. |
| `stream_process_csv(input_file, output_file, columns, ...)` | Chunked row-by-row processing for files too big to load. Requires polars (install with `pip install "allyanonimiser[stream]"`). |
| `process_files(file_paths, ...)` | Batch plain-text files. |
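The chunked pattern behind `stream_process_csv` can be sketched with the stdlib `csv` module: read rows, transform the target columns, and flush in small batches so memory stays bounded regardless of file size. Here `stream_rows` and the redacting `transform` are illustrative stand-ins, not library API (the real method uses polars):

```python
import csv
import io

def stream_rows(reader, writer, columns, transform, chunk_size=2):
    """Read, transform the named columns, and write rows in small batches."""
    header = next(reader)
    writer.writerow(header)
    idx = [header.index(c) for c in columns]
    chunk = []
    for row in reader:
        for i in idx:
            row[i] = transform(row[i])
        chunk.append(row)
        if len(chunk) >= chunk_size:
            writer.writerows(chunk)  # flush a full batch
            chunk = []
    writer.writerows(chunk)  # flush any trailing partial batch

src = io.StringIO("id,notes\n1,John rang\n2,Mary emailed\n")
dst = io.StringIO()
stream_rows(csv.reader(src), csv.writer(dst), ["notes"], lambda s: "<REDACTED>")
```

In the real pipeline the `transform` step is the anonymizer, and `src`/`dst` are file handles rather than in-memory buffers.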
### Configuration and persistence

| Method | Purpose |
|---|---|
| `load_settings(settings_path)` | Load a YAML/JSON settings bundle. |
| `save_settings(settings_path)` | Dump current settings. |
| `export_config(config_path, include_metadata=True)` | Export patterns, acronyms, and settings in one bundle. |
### Reporting

| Method | Purpose |
|---|---|
| `start_new_report(session_id=None)` | Begin tracking a batch run. |
| `get_report(session_id=None)` | Retrieve a report. |
| `finalize_report(output_path=None, format="html")` | Render and optionally save. |
| `display_report_in_notebook(session_id=None)` | Jupyter-native rendering. |
### Diagnostics

| Method | Purpose |
|---|---|
| `check_spacy_status()` | Dict with `is_loaded`, `model_name`, `has_ner`, `entity_types`, `recommendation`. Use this to diagnose model-install issues. |
| `explain_entity(text, entity)` | Return a dict explaining why a specific detection fired. |
## class `AnalysisConfig`
Reusable analysis settings.
```python
AnalysisConfig(
    language: str = "en",
    active_entity_types: list[str] | None = None,
    score_adjustment: dict[str, float] | None = None,
    min_score_threshold: float | None = None,
    expand_acronyms: bool = False,
)
```
Pass via `ally.analyze(text, config=cfg)`.
## class `AnonymizationConfig`
Reusable anonymization settings.
```python
AnonymizationConfig(
    operators: dict[str, str] | None = None,
    language: str = "en",
    active_entity_types: list[str] | None = None,
    expand_acronyms: bool = False,
    age_bracket_size: int = 5,
    keep_postcode: bool = True,
)
```
Pass via `ally.anonymize(text, config=cfg)`.
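`age_bracket_size` implies exact ages are reported as ranges rather than precise values, which reduces re-identification risk while keeping data useful. A minimal sketch of that bucketing, assuming brackets are aligned to multiples of the bracket size (the `age_bracket` helper is illustrative, not library API):

```python
def age_bracket(age, bracket_size=5):
    """Round an age down to its bracket, e.g. 37 -> '35-39' with size 5."""
    low = (age // bracket_size) * bracket_size
    return f"{low}-{low + bracket_size - 1}"

label = age_bracket(37)
# label == "35-39"
```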
## `RecognizerResult`

Returned by `analyze`. Imported from `allyanonimiser.core.recognizer_result`:
```python
@dataclass
class RecognizerResult:
    entity_type: str
    start: int
    end: int
    score: float
    text: str | None = None
```
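Given that dataclass, post-hoc filtering by confidence (the same effect as `min_score_threshold` on `analyze`) is a one-liner. The snippet redefines the dataclass locally so it stands alone, and the example results are fabricated for illustration:

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class RecognizerResult:
    entity_type: str
    start: int
    end: int
    score: float
    text: str | None = None

# Fabricated detections for illustration
results = [
    RecognizerResult("PERSON", 0, 4, 0.92, "John"),
    RecognizerResult("DATE", 10, 20, 0.41),
]

# Keep only detections at or above a chosen confidence threshold
confident = [r for r in results if r.score >= 0.6]
```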
## class `CustomPatternDefinition`

Spec for registering a new entity type. Build via `create_pattern_from_examples(...)` or pass directly:
```python
from allyanonimiser import CustomPatternDefinition

pattern = CustomPatternDefinition(
    entity_type="EMPLOYEE_ID",
    patterns=[r"EMP\d{5}"],
    context=["employee", "staff", "id"],
    name="Employee ID",
    score=0.85,
)
ally.add_pattern(pattern)
```
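Before registering a regex like the `EMP\d{5}` above, it is worth sanity-checking it with plain `re` against strings you expect to match and miss:

```python
import re

employee_id = re.compile(r"EMP\d{5}")
matches = employee_id.findall(
    "Assigned to EMP00042 and EMP98765; EMP12 does not match."
)
# matches == ["EMP00042", "EMP98765"]  ("EMP12" has too few digits)
```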
## Further reading
- Analyzing Text — detection walkthrough
- Anonymizing Text — operator catalogue
- Working with DataFrames — pandas pipeline
- Custom Patterns — extending detection
- Anonymization Operators — per-operator reference
- Source on GitHub