Senior Data Engineer
Engineering
Full-Time
Hybrid: San Francisco Bay Area or New York Metropolitan Area
$190,000 - $250,000
Who We Are
PraxisPro, a data intelligence company, is on a mission to heal the fractured state of Life Sciences commercial data by surfacing undocumented and previously inaccessible datasets to drive novel commercial intelligence and improve patient outcomes. We have begun doing so with the industry’s first purpose-built Learning Experience Platform (LXP).
By serving as the commercial intelligence backbone across commercial, medical, and compliance functions, our industry-specific AI models for therapeutic areas and disease states form the foundation for a new standard of commercial intelligence: one that enables disciplined execution at scale while freeing Life Sciences organizations to focus on what matters most, advancing patient outcomes.
The Role
We’re looking for a Senior Data Engineer (5–7 years) who is fluent in both streaming and batch paradigms on AWS or GCP. You’ll design and operate data platforms that power analytics, personalization, and recommendation use cases—partnering closely with ML engineers to move models from notebooks to production.
What You’ll Do
Design & build pipelines: Low-latency streaming and reliable batch ETL/ELT for multi-tenant datasets across AWS or GCP.
Own data quality: Implement contracts, validation, observability, lineage, backfills, and SLAs/SLOs.
Operationalize ML: Productionize features, embeddings, and model I/O for personalization/recommendation (feature stores, real-time inference paths, batch retraining).
Model the warehouse/lake: Create well-governed schemas (e.g., medallion/lakehouse patterns) to support BI and experimentation.
Harden & scale: Optimize cost/perf, implement autoscaling, partitioning, compaction, and tiered storage; champion reliability and incident response.
Security & compliance: Build with least-privilege IAM, encryption, PII handling, and auditability aligned to SOC 2 and healthcare data expectations.
Collaborate: Partner with product, ML, and app teams; contribute to data platform roadmap and coding standards.
Required Qualifications
5–7 years building and running production streaming + batch data pipelines.
Cloud: Expertise in AWS (Kinesis/MSK, Glue/EMR, Lambda, S3, Redshift) or GCP (Pub/Sub, Dataflow/Dataproc, GCS, BigQuery).
Polyglot engineering: Strong hands-on in Python plus one or more of Scala/Java/Go/TypeScript.
Distributed processing: Solid with Spark/Flink/Beam and related performance tuning (checkpointing, state, watermarking).
Orchestration & ELT: Airflow/Dagster and dbt or equivalent; CI/CD for data (tests, contracts).
ML-adjacent experience: Shipping data features for personalization/recs (e.g., candidate generation, ranking features, user/item embeddings, offline/online consistency).
Data foundations: Schema design, partitioning, CDC, late/duplicate data handling, idempotency, backfills.
Reliability: Monitoring/alerting, on-call familiarity, cost/perf optimization.
Communication: Clear written/spoken communication across engineering and product stakeholders.
Nice to Have
Feature stores (e.g., Feast), vector DBs (Qdrant, Pinecone, FAISS), or real-time retrieval for recs.
Event bus & contract tooling (Kafka + Protobuf/Avro, schema registry).
Data governance/lineage (OpenLineage, DataHub, Collibra or similar).
MLOps/model serving (Vertex AI, SageMaker, Ray Serve, Triton, custom microservices).
Infra-as-code (Terraform/CDK), containers (Docker/Kubernetes/ECS/GKE).
Experience with regulated data (HIPAA-adjacent), multi-tenant SaaS, and privacy-preserving analytics.
Experimentation/platform work for ranking systems (A/B testing, counterfactual logging).
Our Current Stack (illustrative)
AWS: S3, Kinesis/MSK, Glue/EMR, Lambda, Redshift; GCP: GCS, Pub/Sub, Dataflow/Dataproc, BigQuery
Processing: Spark, Flink, Beam; Transform: dbt
Orchestration: Airflow or Dagster; Contracts/Obs: Great Expectations/Deequ, OpenLineage
Serving: REST/gRPC services, Model inference endpoints; Storage: Postgres, Redis, vector stores (e.g., Qdrant)
Work Style & Hours
Remote-first U.S. team; preference for Pacific Time overlap (West Coast strongly preferred).
Collaboration via docs, async updates, and crisp incident/ops playbooks.
How to Apply
If you are excited by the intersection of AI research and real-world product building, we’d love to hear from you.
Click "Apply Now" on this page, or email us at careers@praxispro.tech. We welcome candidates from all backgrounds and identities who share our passion for innovation and continuous learning.
PraxisPro is proud to be an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
We are not currently providing visa sponsorship for this position.
Related Jobs
Lead Fullstack Software Engineer
Full-Time
Remote

Unlock a New Level of Life Sciences Commercial Team Performance
Copyright © 2026 Praxis Pro. All Rights Reserved.