---
title: "Quick Start Guide"
author: 
  - name: "Zaoqu Liu"
    email: "liuzaoqu@163.com"
    affiliation: "Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University"
    orcid: "0000-0002-0452-742X"
  - name: "Aimin Xie"
    email: "aiminyy1993@gmail.com"
    affiliation: "Original Author"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Quick Start Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  warning = FALSE,
  message = FALSE
)
```

# Introduction

**scPAS** (Single-Cell Phenotype-Associated Subpopulation identifier) is a computational tool designed to identify cell subpopulations associated with phenotypes by integrating single-cell RNA-seq data with bulk transcriptomics data.

## Key Features

- **Multi-modal Integration**: Combines single-cell and bulk RNA-seq data
- **Multiple Phenotype Types**: Supports continuous, binary, and survival phenotypes
- **Network Regularization**: Leverages gene-gene similarity networks
- **Statistical Rigor**: Permutation-based significance testing with FDR correction
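
The network regularization idea listed above can be illustrated with a generic graph-Laplacian penalty. This is a sketch under stated assumptions, not the package's internal code: the toy matrix, the correlation threshold of 0.3, and every variable name are hypothetical. It builds a gene-gene adjacency matrix from correlations, forms the Laplacian as degree matrix minus adjacency, and evaluates the quadratic penalty `t(beta) %*% L %*% beta` for two coefficient vectors.

```r
set.seed(1)

# Toy expression matrix: 6 genes x 40 cells (hypothetical data)
expr <- matrix(rnorm(6 * 40), nrow = 6,
               dimnames = list(paste0("Gene", 1:6), NULL))
# Make Gene2 and Gene3 track Gene1 so that some edges exist
expr[2, ] <- expr[1, ] + rnorm(40, sd = 0.3)
expr[3, ] <- expr[1, ] + rnorm(40, sd = 0.3)

# Gene-gene similarity: absolute Pearson correlation across cells
sim <- abs(cor(t(expr)))

# Adjacency matrix: keep edges above an assumed threshold, no self-loops
adj <- (sim > 0.3) * 1
diag(adj) <- 0

# Graph Laplacian L = D - A, where D holds the node degrees
lap <- diag(rowSums(adj)) - adj

# Laplacian penalty for a coefficient vector beta
laplacian_penalty <- function(beta, lap) {
  as.numeric(t(beta) %*% lap %*% beta)
}

beta_smooth <- c(1, 1, 1, 0, 0, 0)    # correlated genes share a coefficient
beta_rough  <- c(1, -1, 1, 0, 0, 0)   # correlated genes disagree

laplacian_penalty(beta_smooth, lap)
laplacian_penalty(beta_rough, lap)
```

The penalty equals the sum of squared coefficient differences over all edges, so `beta_rough` is penalized more heavily than `beta_smooth`. A penalty of this kind favors models in which similar genes receive similar weights.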

## Package Installation

```{r install, eval=FALSE}
# Install from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("Zaoqu-Liu/scPAS")

# Install dependencies if needed
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("preprocessCore")
```

# Quick Example

## Load Required Packages

```{r load-packages}
library(scPAS)
library(Matrix)
library(Seurat)
```

## Simulate Example Data

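For this quick start, we simulate data to demonstrate the workflow. The phenotype is drawn independently of the expression values, so the output illustrates the mechanics of the analysis rather than a real biological association: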

```{r simulate-data}
set.seed(42)

# Dimensions of the simulated data
n_genes <- 500
n_bulk_samples <- 50
n_cells <- 200

# Simulate bulk RNA-seq counts (500 genes x 50 samples)
bulk_data <- matrix(
  rpois(n_genes * n_bulk_samples, lambda = 100),
  nrow = n_genes,
  ncol = n_bulk_samples
)
rownames(bulk_data) <- paste0("Gene", seq_len(n_genes))
colnames(bulk_data) <- paste0("Sample", seq_len(n_bulk_samples))

# Log2-transform the counts (the pseudocount of 1 avoids log2(0))
bulk_data <- log2(bulk_data + 1)

# Simulate single-cell data (same genes x 200 cells)
sc_counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  ncol = n_cells
)
rownames(sc_counts) <- paste0("Gene", seq_len(n_genes))
colnames(sc_counts) <- paste0("Cell", seq_len(n_cells))

# Create Seurat object
sc_obj <- CreateSeuratObject(
  counts = sc_counts,
  project = "QuickStart"
)

# Add cell type labels
sc_obj$celltype <- sample(
  c("TypeA", "TypeB", "TypeC"),
  n_cells,
  replace = TRUE
)

# Simulate phenotype (continuous)
phenotype <- rnorm(n_bulk_samples, mean = 50, sd = 10)
names(phenotype) <- colnames(bulk_data)
```
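
Before running the analysis, it can help to confirm that the bulk and single-cell matrices share gene identifiers and that the phenotype lines up with the bulk samples. The following is a minimal base R sketch of such a check. The `toy_*` objects are small hypothetical stand-ins for `bulk_data`, `sc_counts`, and `phenotype`; whether `scPAS()` performs similar checks internally is not covered here.

```r
set.seed(1)

# Hypothetical stand-ins for the objects created above
toy_bulk <- matrix(rnorm(5 * 4), nrow = 5,
                   dimnames = list(paste0("Gene", 1:5), paste0("Sample", 1:4)))
toy_sc <- matrix(rpois(6 * 3, lambda = 5), nrow = 6,
                 dimnames = list(paste0("Gene", 2:7), paste0("Cell", 1:3)))
toy_pheno <- c(Sample3 = 48.1, Sample1 = 52.4, Sample4 = 61.0, Sample2 = 39.7)

# Genes present in both data sets
shared_genes <- intersect(rownames(toy_bulk), rownames(toy_sc))
length(shared_genes)

# Phenotype names should match the bulk sample names, without duplicates
stopifnot(
  setequal(names(toy_pheno), colnames(toy_bulk)),
  !anyDuplicated(names(toy_pheno))
)

# Reorder the phenotype to match the column order of the bulk matrix
toy_pheno <- toy_pheno[colnames(toy_bulk)]
identical(names(toy_pheno), colnames(toy_bulk))
```

In the simulated data above, both matrices use the same `Gene1` to `Gene500` identifiers and the phenotype is already named by sample, so the check passes trivially. With real data, mismatched gene symbols or sample IDs are worth ruling out first.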

## Preprocess Single-Cell Data

Use the built-in `run_Seurat()` function for standard preprocessing:

```{r preprocess}
# Standard Seurat preprocessing
sc_obj <- run_Seurat(sc_obj, verbose = FALSE)

# Check the result
sc_obj
```
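
`run_Seurat()` wraps the preprocessing steps, and its exact defaults are not listed in this guide. For orientation, the base R sketch below shows the log-normalization that Seurat's `NormalizeData()` applies by default: counts are divided by the per-cell total, multiplied by 10,000, and passed through `log1p()`. That `run_Seurat()` uses this normalization is an assumption here, and the toy matrix is hypothetical.

```r
set.seed(1)

# Hypothetical count matrix: 4 genes x 3 cells
toy_counts <- matrix(rpois(12, lambda = 5), nrow = 4,
                     dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:3)))

# Log-normalization: scale each cell to 10,000 total counts, then log1p
scale_factor <- 1e4
cell_totals <- colSums(toy_counts)
toy_norm <- log1p(sweep(toy_counts, 2, cell_totals, "/") * scale_factor)

# After undoing the log, every cell sums to the scale factor
colSums(expm1(toy_norm))
```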

## Run scPAS Analysis

```{r run-scpas}
# Run scPAS with Gaussian family (continuous phenotype)
result <- scPAS(
  bulk_dataset = bulk_data,
  sc_dataset = sc_obj,
  phenotype = phenotype,
  family = "gaussian",
  nfeature = 200,           # Use top 200 variable genes
  permutation_times = 100,  # Reduced for demo (use 1000+ in practice)
  do_imputation = FALSE,    # Skip imputation for speed
  n_cores = 1               # Single core
)
```
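
The `permutation_times` argument sets how many permutations feed the significance test. The sketch below is a generic permutation scheme in base R, not the package's internal procedure. It assumes a per-cell score defined as a weighted sum of gene expression, builds a null distribution by shuffling the gene weights, and converts the result to a Z-statistic, an empirical p-value, and a Benjamini-Hochberg FDR. The scoring rule and all names are assumptions made for illustration.

```r
set.seed(1)

n_toy_genes <- 50
n_toy_cells <- 20
n_perm <- 100

# Hypothetical scaled expression (genes x cells) and gene weights
toy_expr <- matrix(rnorm(n_toy_genes * n_toy_cells), nrow = n_toy_genes)
toy_weights <- rnorm(n_toy_genes)

# Observed score per cell: weighted sum of expression
observed <- as.numeric(crossprod(toy_expr, toy_weights))

# Null scores: shuffle the gene weights n_perm times (cells x permutations)
null_scores <- replicate(
  n_perm,
  as.numeric(crossprod(toy_expr, sample(toy_weights)))
)

# Z-statistic of each observed score against its own null distribution
null_mean <- rowMeans(null_scores)
null_sd <- apply(null_scores, 1, sd)
z_score <- (observed - null_mean) / null_sd

# Two-sided empirical p-value with the +1 correction, then BH adjustment
extreme <- rowSums(abs(null_scores - null_mean) >= abs(observed - null_mean))
p_value <- (extreme + 1) / (n_perm + 1)
fdr <- p.adjust(p_value, method = "BH")

# Smallest empirical p-value this many permutations can produce
1 / (n_perm + 1)
```

In an empirical scheme like this one, 100 permutations cannot yield a p-value below 1/101 (roughly 0.0099), which limits what can survive multiple-testing correction. This is one reason to raise the permutation count for real analyses.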

## Examine Results

```{r examine-results}
# View added metadata columns
head(result@meta.data[, c("scPAS_RS", "scPAS_NRS", "scPAS_Pvalue", "scPAS_FDR", "scPAS")])

# Summary of cell classifications
table(result$scPAS)

# Check significance
cat("Cells with FDR < 0.05:", sum(result$scPAS_FDR < 0.05, na.rm = TRUE), "\n")
cat("scPAS+ cells:", sum(result$scPAS == "scPAS+", na.rm = TRUE), "\n")
cat("scPAS- cells:", sum(result$scPAS == "scPAS-", na.rm = TRUE), "\n")
```
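
To see whether phenotype-associated cells concentrate in particular cell types, the classification can be cross-tabulated against the cell type labels. The sketch below uses a hypothetical data frame, `toy_meta`, as a stand-in for `result@meta.data`; the values are made up, and the same calls would apply to the real metadata.

```r
# Hypothetical stand-in for result@meta.data
toy_meta <- data.frame(
  celltype  = c("TypeA", "TypeA", "TypeA", "TypeB", "TypeB",
                "TypeC", "TypeC", "TypeC"),
  scPAS     = c("scPAS+", "scPAS+", "0", "0", "scPAS-",
                "0", "0", "scPAS-"),
  scPAS_NRS = c(2.9, 3.4, 0.4, -0.2, -3.1, 0.1, 0.8, -2.7)
)

# Counts of each classification within each cell type
counts <- table(toy_meta$celltype, toy_meta$scPAS)
counts

# Row-wise proportions: fraction of each cell type in each class
round(prop.table(counts, margin = 1), 2)

# Mean normalized risk score per cell type
tapply(toy_meta$scPAS_NRS, toy_meta$celltype, mean)
```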

## Basic Visualization

```{r basic-viz, fig.width=10, fig.height=4}
library(ggplot2)
library(patchwork)  # provides the `|` operator used to combine plots

# UMAP plot colored by cell type
p1 <- DimPlot(result, group.by = "celltype", label = TRUE) +
  ggtitle("Cell Types") +
  theme(legend.position = "bottom")

# UMAP plot colored by risk score
p2 <- FeaturePlot(result, features = "scPAS_NRS") +
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  ggtitle("Normalized Risk Score")

# Combine plots
p1 | p2
```

# Output Structure

The `scPAS()` function adds the following columns to the Seurat object's metadata:

| Column | Description |
|--------|-------------|
| `scPAS_RS` | Raw risk score |
| `scPAS_NRS` | Normalized risk score (Z-statistic) |
| `scPAS_Pvalue` | P-value from permutation test |
| `scPAS_FDR` | FDR-adjusted p-value |
| `scPAS` | Classification: "scPAS+", "scPAS-", or "0" |
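
One plausible reading of the table is that the `scPAS` label combines the sign of the normalized risk score with the FDR. The sketch below encodes that reading for five hypothetical cells. The 0.05 cutoff and the exact rule are assumptions, so the package documentation remains the reference for the thresholds actually applied.

```r
# Hypothetical scores for five cells
toy_nrs <- c(3.2, -2.8, 0.5, 2.1, -0.3)
toy_fdr <- c(0.01, 0.03, 0.60, 0.20, 0.90)
fdr_cutoff <- 0.05   # assumed threshold

# Significant and positive -> "scPAS+", significant and negative -> "scPAS-"
toy_label <- ifelse(
  toy_fdr < fdr_cutoff & toy_nrs > 0, "scPAS+",
  ifelse(toy_fdr < fdr_cutoff & toy_nrs < 0, "scPAS-", "0")
)
toy_label
```

Under this rule, the fourth cell has a high score but stays in the `"0"` class because its FDR exceeds the cutoff.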

# Three Phenotype Types

## 1. Continuous Phenotype (Gaussian)

For continuous outcomes such as age, BMI, or gene expression levels:

```{r gaussian-example, eval=FALSE}
# continuous_values is a placeholder for your own numeric phenotype vector
result <- scPAS(
  bulk_dataset = bulk_data,
  sc_dataset = sc_obj,
  phenotype = continuous_values,
  family = "gaussian"
)
```

## 2. Binary Phenotype (Binomial)

For case-control or responder/non-responder comparisons:
  
```{r binomial-example, eval=FALSE}
# Binary phenotype coded 0/1, one value per bulk sample
# (`...` is a placeholder for the remaining values)
binary_phenotype <- c(0, 1, 0, 1, 1, ...)

result <- scPAS(
  bulk_dataset = bulk_data,
  sc_dataset = sc_obj,
  phenotype = binary_phenotype,
  family = "binomial",
  tag = c("Control", "Case")  # Labels for 0 and 1
)
```
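
Clinical labels often arrive as text rather than as 0/1 codes. A small sketch of the recoding follows. The label vector and sample names are hypothetical, and the convention that 0 corresponds to the first `tag` entry follows the comment in the chunk above.

```r
# Hypothetical clinical labels, one per bulk sample
toy_status <- c(Sample1 = "Control", Sample2 = "Case", Sample3 = "Control",
                Sample4 = "Case", Sample5 = "Case")

# Encode as 0/1 so that 0 = "Control" and 1 = "Case"
toy_binary <- as.numeric(toy_status == "Case")
names(toy_binary) <- names(toy_status)
toy_binary

# Check the class balance before fitting
table(toy_binary)
```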

## 3. Survival Phenotype (Cox)

For time-to-event data:

```{r cox-example, eval=FALSE}
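# Create survival object from follow-up times and event indicators
# (survival_times and event_status are placeholders for your own vectors)
library(survival)
surv_phenotype <- Surv(time = survival_times, event = event_status)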

result <- scPAS(
  bulk_dataset = bulk_data,
  sc_dataset = sc_obj,
  phenotype = surv_phenotype,
  family = "cox"
)
```
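
The survival object pairs a follow-up time with an event indicator for each bulk sample. The self-contained sketch below uses made-up values to show what `Surv()` returns. The times (in arbitrary units) and the event codes (1 = event, 0 = censored) are hypothetical.

```r
library(survival)

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
toy_times  <- c(12.5, 30.0, 7.2, 45.1, 22.8)
toy_events <- c(1, 0, 1, 0, 1)

toy_surv <- Surv(time = toy_times, event = toy_events)

# Censored observations print with a trailing "+"
toy_surv

# The object is a two-column matrix: time and status
dim(toy_surv)
colnames(toy_surv)
```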

# Next Steps

- **Algorithm Details**: See `vignette("algorithm")` for the methodology
- **Visualization**: See `vignette("visualization")` for advanced plots
- **Case Studies**: See `vignette("case-survival")` for real-world examples
- **Full Tutorial**: See `vignette("scPAS_Tutorial")` for a comprehensive guide

# Session Information

```{r session-info}
sessionInfo()
```