This repository contains my MSc Artificial Intelligence project on weakly supervised video anomaly detection. The project adapts the UMIL framework to pre-extracted I3D video features and extends the binary anomaly detector with frozen CLIP text embeddings for semantic anomaly prediction.
- Reproduced a UMIL baseline on UCF-Crime using weak video-level supervision.
- Built FeatureUMIL, a lightweight temporal MIL model for XD-Violence I3D feature sequences.
- Added a CLIP-based semantic head using frozen text embeddings for anomaly type prediction.
- Evaluated anomaly detection with AUC/AP and semantic prediction with mAP/top-1 accuracy.
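
As a rough illustration of how a semantic head built on frozen text embeddings can work, the sketch below scores one video embedding against a fixed set of class text embeddings using cosine similarity followed by a softmax. This is a minimal standalone sketch, not the code in `UMIL-main/models/feature_umil.py`: the function names, the temperature value, the class names, and the toy 3-dimensional vectors are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def semantic_scores(video_embedding, text_embeddings, temperature=0.07):
    """Softmax over cosine similarities between a video embedding and
    frozen class text embeddings. The temperature is an assumed value."""
    v = l2_normalize(video_embedding)
    logits = []
    for text_vec in text_embeddings:
        t = l2_normalize(text_vec)
        cosine = sum(a * b for a, b in zip(v, t))
        logits.append(cosine / temperature)
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class names and toy "text embeddings" (real CLIP vectors
# are much higher dimensional and come from the frozen text encoder).
CLASS_NAMES = ["fighting", "shooting", "explosion"]
TEXT_EMBEDDINGS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

video_embedding = [0.2, 0.9, 0.1]  # stand-in for a projected video feature
probs = semantic_scores(video_embedding, TEXT_EMBEDDINGS)
predicted = CLASS_NAMES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because the text embeddings stay frozen in this setup, only the projection that produces `video_embedding` would need training.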
| Model | Dataset | Metric | Result |
|---|---|---|---|
| UMIL baseline | UCF-Crime | AUC@all | 83.28% |
| UMIL baseline | UCF-Crime | AUC@anomaly-only | 62.72% |
| FeatureUMIL | XD-Violence | coarse segment AP | 94.29% |
| FeatureUMIL | XD-Violence | coarse segment AUC | 94.47% |
| FeatureUMIL + CLIP text | XD-Violence | coarse segment AP | 94.19% |
| FeatureUMIL + CLIP text | XD-Violence | semantic mAP | 65.50% |
| FeatureUMIL + CLIP text | XD-Violence | semantic top-1 | 76.00% |
The XD-Violence metrics are coarse segment-level project metrics produced from pre-extracted I3D features and repeated video-level labels. They should not be compared directly with official frame-level XD-Violence leaderboard metrics.
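
To make the "repeated video-level labels" caveat concrete, here is a small sketch of how such a coarse segment-level AP could be computed. It is an illustration under assumptions, not the repository's evaluation code: the helper names and the toy scores are hypothetical, and AP is taken as the mean precision at the rank of each positive segment.

```python
def repeat_video_labels(video_labels, segments_per_video):
    """Copy each video-level label onto every segment of that video."""
    labels = []
    for label, count in zip(video_labels, segments_per_video):
        labels.extend([label] * count)
    return labels


def average_precision(labels, scores):
    """Mean of precision values at the rank of each positive segment."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy data: one anomalous video with 3 segments, one normal video with 2.
video_labels = [1, 0]
segments_per_video = [3, 2]
segment_labels = repeat_video_labels(video_labels, segments_per_video)

# Hypothetical per-segment anomaly scores from a model.
segment_scores = [0.9, 0.2, 0.8, 0.3, 0.1]

ap = average_precision(segment_labels, segment_scores)
print(segment_labels, round(ap, 4))
```

In this toy case the second segment of the anomalous video scores low but still carries a positive label, so it counts against AP even if nothing anomalous happens in it. Frame-level annotation would not penalize that segment in the same way, which is one reason the two kinds of metric are not interchangeable.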
- `UMIL-main/`: training and evaluation code.
- `UMIL-main/main_umil.py`: UCF-Crime UMIL baseline workflow.
- `UMIL-main/main_umil_features.py`: XD-Violence I3D FeatureUMIL workflow.
- `UMIL-main/models/feature_umil.py`: temporal feature model and CLIP semantic projection.
- `UMIL-main/datasets/feature_dataset.py`: XD-Violence I3D feature dataset loader.
- `UMIL-main/tools/`: dataset preparation, validation, and plotting utilities.
- `docs/`: final report and architecture figure.
- `results/summary/`: compact verified result summaries.

Datasets and model checkpoints are not included. To run the code, place the required datasets/features locally and pass paths through command-line arguments or environment variables.
Expected external assets:
- UCF-Crime frames for the UMIL baseline.
- XD-Violence pre-extracted I3D RGB features.
- `k400_32_8.pth` or equivalent pretrained video backbone weights.
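
One common way to accept asset paths from either command-line arguments or environment variables is to use the environment value as the argument default. The sketch below shows that pattern with `argparse`. The environment variable names `FEATURE_ROOT`, `DATA_ROOT`, and `PRETRAINED` match the ones used by the launch scripts, but the flag names and the `build_parser` helper are hypothetical and may not match the actual scripts.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults fall back to environment variables.

    Flag names here are illustrative assumptions, not the repository's CLI.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Locate external assets.")
    parser.add_argument("--feature-root", default=env.get("FEATURE_ROOT"),
                        help="Directory of pre-extracted I3D features.")
    parser.add_argument("--data-root", default=env.get("DATA_ROOT"),
                        help="Directory of UCF-Crime frames.")
    parser.add_argument("--pretrained", default=env.get("PRETRAINED"),
                        help="Path to pretrained video backbone weights.")
    return parser


# Example: FEATURE_ROOT comes from the (simulated) environment,
# while --pretrained is given explicitly on the command line.
args = build_parser(env={"FEATURE_ROOT": "data/i3d-features"}).parse_args(
    ["--pretrained", "k400_32_8.pth"]
)
print(args.feature_root, args.data_root, args.pretrained)
```

With this arrangement an explicit flag always overrides the environment, and a path that is set in neither place comes through as `None`, which a script can check before training starts.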
Example commands:

    cd UMIL-main
    # XD-Violence FeatureUMIL + CLIP text labels
    FEATURE_ROOT=data/i3d-features bash tools/run_xd_i3d_text_umil_tmux.sh
    # UCF-Crime UMIL baseline
    DATA_ROOT=data/UCF PRETRAINED=k400_32_8.pth bash tools/run_umil_ucf.sh

This repository is prepared as a clean public research artifact. Raw datasets, full checkpoints, raw pickle outputs, and remote directories are intentionally excluded.