
GenomeHouse

Next-Gen Tools for Next-Genomics

Modular bioinformatics toolkit for sequence analysis, parsing, ML, and visualization

pip install genomehouse
Latest: v1.2.0
Python ≥3.8
MIT License

Powerful Features

Everything you need for bioinformatics research

Sequence Analysis

Comprehensive tools for DNA/RNA sequence manipulation including reverse complement, motif search, GC content calculation, and translation.

  • Reverse complement generation
  • Motif pattern searching
  • GC content analysis
  • Sequence translation
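
To make the operations listed above concrete, here is a minimal plain-Python sketch of three of them. It is not GenomeHouse's implementation: the function names, the 0-based motif positions, and the choice to report overlapping hits are assumptions made for illustration.

```python
from typing import List

# Illustrative only: standard-library versions of three operations,
# not the code that ships in GenomeHouse.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of motif, including overlapping hits."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


seq = "ATGCGTACGGCTA"
print(round(gc_content(seq), 2))   # 53.85
print(reverse_complement(seq))     # TAGCCGTACGCAT
print(find_motif(seq, "GC"))       # [2, 9]
```

Whether the library itself reports positions as 0-based or 1-based is not stated on this page, so check its documentation before comparing results.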

Data Parsing

Robust parsers for standard bioinformatics file formats with efficient memory usage and error handling.

  • FASTA/FASTQ parsing
  • VCF file processing
  • GFF/GTF annotation
  • Memory-efficient streaming
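
Memory-efficient streaming usually means yielding one record at a time instead of loading the whole file. The sketch below shows the idea for FASTA with a generator; the name `iter_fasta` is hypothetical and this is not how GenomeHouse's parsers are necessarily written.

```python
import io
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, holding only one record in memory."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the finished record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across rows
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# An in-memory file stands in for open("data.fasta") here.
handle = io.StringIO(">seq1 demo\nATGC\nGTAC\n>seq2\nGGCC\n")
for name, sequence in iter_fasta(handle):
    print(name, len(sequence))
```

Because the function accepts any iterable of lines, the same code works on an open file handle, a gzip stream opened in text mode, or a list of strings in a test.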

Machine Learning

Pre-built ML pipelines optimized for biological data analysis with feature extraction and model training.

  • Feature extraction pipelines
  • Classification models
  • Clustering algorithms
  • Cross-validation tools
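
This page does not say which features the pipelines extract. One common choice for sequence data is k-mer counting, sketched below with the standard library only; `kmer_counts` and its defaults are assumptions for illustration, not part of the GenomeHouse API.

```python
from itertools import product
from typing import List


def kmer_counts(seq: str, k: int = 2, alphabet: str = "ACGT") -> List[int]:
    """Count overlapping k-mers, one column per k-mer in lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    upper = seq.upper()
    for i in range(len(upper) - k + 1):
        column = index.get(upper[i:i + k])
        if column is not None:  # skip k-mers with ambiguous bases such as N
            counts[column] += 1
    return counts


sequences = ["ATGCGT", "GCATGC", "TGCATG"]
feature_matrix = [kmer_counts(s) for s in sequences]
print(len(feature_matrix), len(feature_matrix[0]))  # 3 16
```

Every sequence maps to a vector of the same length (4**k columns for DNA), which is the shape most classifiers and clustering algorithms expect.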

Visualization

Publication-quality plots and charts specifically designed for genomic and biological data presentation.

  • Sequence alignment plots
  • Phylogenetic trees
  • Statistical charts
  • Interactive visualizations

Statistical Analysis

Comprehensive statistical tools for hypothesis testing, correlation analysis, and data exploration.

  • Hypothesis testing
  • Correlation analysis
  • Distribution fitting
  • Significance testing

Extensible API

User-friendly, modular design that allows easy extension and customization for specific research needs.

  • Modular architecture
  • Plugin system
  • Custom workflows
  • Easy integration

Quick Installation

Get started in seconds

Terminal
$ pip install genomehouse

Requirements

  • Python ≥3.8
  • NumPy, Pandas
  • Matplotlib, Seaborn
  • Scikit-learn

Verification

>>> import genomehouse
>>> genomehouse.__version__
'1.2.0'
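
If you would rather check the installation from a script without importing the package, the standard library's `importlib.metadata` (available since Python 3.8) can report the installed version. This is a minimal sketch; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string when the package is installed, otherwise None.
print(installed_version("genomehouse"))
```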

Code Examples

See GenomeHouse in action

Sequence Analysis

from genomehouse import sequence_tools
# Calculate GC content
seq = "ATGCGTACGGCTA"
gc_content = sequence_tools.gc_content(seq)
print(f"GC Content: {gc_content}%")
# Get reverse complement
rev_comp = sequence_tools.reverse_complement(seq)
print(f"Reverse Complement: {rev_comp}")
# Find motifs
motifs = sequence_tools.find_motifs(seq, "GC")
print(f"GC motifs found at: {motifs}")

File Parsing

from genomehouse import genomic_parsers, sequence_tools
# Parse FASTA file
for header, sequence in genomic_parsers.parse_fasta("data.fasta"):
    print(f">{header}")
    print(f"Length: {len(sequence)}")
    print(f"GC%: {sequence_tools.gc_content(sequence)}")
# Parse FASTQ with quality scores
for record in genomic_parsers.parse_fastq("reads.fastq"):
    print(f"ID: {record.id}")
    print(f"Quality: {record.quality_score}")

Machine Learning

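from genomehouse import ml_tools
# Feature extraction from sequences
sequences = ["ATGCGT", "GCATGC", "TGCATG"]
labels = [0, 1, 0]  # placeholder class labels, one per sequence
features = ml_tools.extract_features(sequences)
# Train classification model
model = ml_tools.SequenceClassifier()
model.fit(features, labels)
# Predict new sequences (assumes predict expects the same features as fit)
new_sequences = ["ATGCAT", "GCGCAT"]  # placeholder query sequences
new_features = ml_tools.extract_features(new_sequences)
predictions = model.predict(new_features)
print(f"Predictions: {predictions}")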

CLI Usage

# Parse FASTA file
$ genomehouse-cli parse-fasta data/sample.fasta
# Calculate GC content
$ genomehouse-cli gc-content ATGCGTAC
GC Content: 50.0%
# Convert FASTQ to FASTA
$ genomehouse-cli convert reads.fastq output.fasta
# Generate sequence statistics
$ genomehouse-cli stats genome.fasta
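
For readers curious what a FASTQ to FASTA conversion involves, the sketch below does it in plain Python. It assumes the common four-line FASTQ layout (no wrapped sequence lines) and is not the code behind `genomehouse-cli convert`; the function name is hypothetical.

```python
from typing import Iterable, Iterator, List


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines (header, sequence) for each four-line FASTQ record."""
    record: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence, separator, _quality = record
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield ">" + header[1:]  # swap the '@' marker for '>'
            yield sequence          # quality scores are dropped
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


reads = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
print("\n".join(fastq_to_fasta(reads)))
```

The conversion is lossy by design: FASTA has no place for per-base quality scores, so they cannot be recovered from the output.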

Documentation & Resources