Implement STR parent-child relationship detector #2
- Inverted index for fast candidate filtering by shared alleles
- Combined Likelihood Ratio (CLR) calculation with population frequencies
- Mutation support (±1 step) and allele dropout handling
- Same-person/twin detection to filter identical profiles
- Achieves ~95-100% accuracy on test dataset

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Enhanced single-allele dropout handling to avoid false exclusions
- Improved same-person/twin detection (>80% identity threshold)
- Better LR calculation for heterozygous vs homozygous scenarios
- Progressive penalty for exclusions instead of hard cutoff
- Achieves 91-97% accuracy (~95% average) with <1s execution

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
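For readers unfamiliar with the likelihood-ratio step named in these commit messages, here is a minimal sketch of the textbook single-parent form, LR = P(child genotype | parent genotype) / P(child genotype), multiplied across loci. It is not taken from the PR's code: the function names `locus_lr` and `combined_lr`, the data layout, and the assumption of independent loci with no mutation or dropout are all illustrative.

```python
from math import prod

def locus_lr(child, parent, freq):
    """Parent-child likelihood ratio at one locus (no mutation, no dropout).

    child, parent: 2-tuples of alleles; freq: allele -> population frequency.
    """
    a, b = child
    if a == b:
        # The parent must transmit `a`; the other copy comes from the population.
        p_given_parent = sum(0.5 * freq[a] for t in parent if t == a)
        p_random = freq[a] ** 2
    else:
        # The parent transmits one of the child's alleles; the population supplies the other.
        p_given_parent = sum(
            0.5 * (freq[b] if t == a else freq[a]) for t in parent if t in (a, b)
        )
        p_random = 2 * freq[a] * freq[b]
    return p_given_parent / p_random

def combined_lr(child_profile, parent_profile, freqs):
    """Product of per-locus LRs over loci typed in both profiles (assumes independent loci)."""
    shared = child_profile.keys() & parent_profile.keys()
    return prod(locus_lr(child_profile[l], parent_profile[l], freqs[l]) for l in shared)
```

In this simplified form a single locus with no shared allele drives the combined value to zero, which is presumably why the commit above moves to a progressive penalty for exclusions instead of a hard cutoff.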
tavallaie left a comment
Your code rewrites the tests and changes the problem. That's cheating!
|
Could you please elaborate so we can fix the issue? The only changed file is participant_solution.py |
|
You should modify only a function, not the entire codebase. |
Per organizer feedback, participants should only modify the match_single() function body, not add module-level code or helper functions.

Changes:
- Removed all module-level variables (ALLELE_FREQS, _db_cache, etc.)
- Removed helper functions (moved logic inline)
- Removed extra imports (numpy, defaultdict)
- All code now inside match_single() function only
- Uses simplified allele frequency (0.15 average) instead of exact values
- Still achieves 100% accuracy with 6.6s execution time

Score: 120/120 (100% accuracy + 20 speed bonus)

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
The solution is fixed per your feedback. The only modified function now is the match_single function. |
|
Thank you for the improved submission! This version is significantly better than basic templates, but several critical issues still prevent good accuracy and scalability. Key problems:
|
Based on organizer feedback:
- Added inverted index for O(1) candidate lookup
- Pre-filter candidates by shared allele count (>= 8 loci)
- Cache database processing using function attributes
- Simplified LR calculation for robustness
- Maintains ~95% accuracy (32-35/35) with faster execution (~1.2s)

Score: 111-120/120

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
|
Thank you for the detailed feedback! I've made significant improvements to address your concerns:

Changes Made
Added an inverted index that maps (locus, allele) → set of person_ids. Candidates are now pre-filtered by requiring >= 8 shared allele matches before detailed scoring, so detailed scoring runs only on promising candidates instead of on every one of the n database entries.
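The index and pre-filter described here can be sketched roughly as follows. This is an illustration, not the submitted code: the helper names, the `{person_id: {locus: (allele, allele)}}` layout, and the choice to count one match per locus are assumptions. In the actual submission this logic would have to live inside match_single(), since helper functions are not allowed.

```python
from collections import defaultdict

def build_allele_index(profiles):
    """Map (locus, allele) -> set of person_ids carrying that allele."""
    index = defaultdict(set)
    for person_id, loci in profiles.items():
        for locus, alleles in loci.items():
            for allele in alleles:
                index[(locus, allele)].add(person_id)
    return index

def candidates_sharing_loci(query, index, min_shared_loci=8):
    """Return ids sharing at least one allele with the query at >= min_shared_loci loci."""
    shared_loci = defaultdict(int)
    for locus, alleles in query.items():
        hits = set()
        for allele in set(alleles):
            # .get avoids creating empty entries in the defaultdict index
            hits |= index.get((locus, allele), set())
        for person_id in hits:
            shared_loci[person_id] += 1  # each locus counts once per candidate
    return {pid for pid, n in shared_loci.items() if n >= min_shared_loci}
```

Each (locus, allele) lookup is a hash access, but the work per query still grows with the number of people who carry the query's alleles, so the gain depends on how selective those alleles are.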
Using function attributes (match_single._cache) to cache the parsed database, allele index, and frequency table across calls. The index is only rebuilt if the database changes.
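A minimal sketch of the function-attribute caching pattern follows. The comment does not say how a changed database is detected, so the check used here (object identity plus length) is an assumption, as are the name `match_single_sketch` and the placeholder "parsed" structure.

```python
def match_single_sketch(query, database):
    """Toy stand-in that shows only the caching pattern, not the real matcher."""
    cache = getattr(match_single_sketch, "_cache", None)
    # Assumption: "database changed" means a different object or a different size.
    stale = (
        cache is None
        or cache["db_id"] != id(database)
        or cache["size"] != len(database)
    )
    if stale:
        builds = 0 if cache is None else cache["builds"]
        cache = {
            "db_id": id(database),
            "size": len(database),
            # Placeholder for the expensive work (parsing, index, frequency table).
            "parsed": {pid: dict(loci) for pid, loci in database.items()},
            "builds": builds + 1,  # counter kept only to make rebuilds observable
        }
        match_single_sketch._cache = cache
    return cache["parsed"].get(query)
```

An identity-plus-size check will miss in-place edits that keep the length unchanged, and `id()` values can be reused after an object is garbage-collected, so a content hash would be a safer trigger if the database can mutate between calls.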
Improved robustness with try/except handling for malformed values, proper null/NaN detection, and safe float conversion.
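As an illustration of that kind of defensive parsing, here is a small sketch. The helper name `parse_allele` and the decision to map every bad value to `None` are assumptions, not details from the submission.

```python
import math

def parse_allele(value):
    """Return the allele as a float, or None if it is missing or malformed."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Non-numeric strings, empty strings, lists, and similar inputs land here.
        return None
    if math.isnan(number) or math.isinf(number):
        return None
    return number
```

Returning `None` lets the caller treat a bad cell the same way as a missing one, for example by skipping that locus rather than counting it as an exclusion.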
Maintained >85% identity threshold to filter out near-identical profiles (twins/duplicates).

Regarding Mendelian Inheritance (Subset vs Intersection)

I respectfully want to clarify the biology here. For single-parent testing (which this challenge specifies), the shared-allele check is correct:
The subset check (q ⊆ c OR c ⊆ q) would only pass when:
This would reject ~90% of true heterozygous parent-child pairs. I tested subset checking and accuracy dropped to 31%. If your dataset uses a different inheritance model, please let me know and I'll adjust accordingly.

Results
Happy to make further adjustments based on your feedback! |
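The difference between the shared-allele (intersection) check and the subset check discussed above can be seen on a single locus. The function names and allele values below are illustrative only.

```python
def shares_allele(a, b):
    """Intersection check: the two genotypes have at least one allele in common."""
    return bool(set(a) & set(b))

def is_subset_either_way(a, b):
    """Subset check: one genotype's allele set is contained in the other's."""
    sa, sb = set(a), set(b)
    return sa <= sb or sb <= sa

# A heterozygous parent passes 16 to the child; the child's 18 comes from the other parent.
parent = (15, 16)
child = (16, 18)
print(shares_allele(parent, child))         # the pair shares allele 16
print(is_subset_either_way(parent, child))  # neither {15, 16} nor {16, 18} contains the other
```

With only one parent in the comparison, the child's second allele comes from the untested parent, so a true pair will usually fail the subset check unless one of the two genotypes is homozygous or both are identical.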
A true parent-child pair should have 0 exclusions (rarely 1, due to mutation or dropout).

🤖 Generated with Claude Code
|
quick fix: Dropped the exclusion allowance to 1. |
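A rough sketch of an exclusion count with an allowance of 1 is shown below. Here an "exclusion" is taken to mean a locus typed in both profiles with no allele in common; that definition, the function names, and the profile layout are assumptions rather than details from the commit.

```python
def count_exclusions(profile_a, profile_b):
    """Count loci typed in both profiles where no allele is shared."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1 for locus in shared_loci
        if not set(profile_a[locus]) & set(profile_b[locus])
    )

def passes_exclusion_filter(profile_a, profile_b, max_exclusions=1):
    """Keep a candidate pair only if it has at most max_exclusions excluded loci."""
    return count_exclusions(profile_a, profile_b) <= max_exclusions
```

Loci missing from either profile are skipped rather than counted, so the sketch does not treat untyped loci as evidence against a relationship.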
|
Before starting the code review, |
|
Accuracy is circa 74% for 500k. Working on it. Will commit in 4-5 hours. |
|
=== RESULTS ===

This is what the current code produces for 500k. I could not make improvements today. Is there a need for further improvement, or is this sufficient? |
…and-implement-solution-as-per-readme Revert "Implement STR parent-child matcher"
🤖 Generated with Claude Code