作者,Evil Genius
很多人问我这些空转的分析内容都是怎么总结的,其实没啥秘密,天天做项目,看文献,被老师骂,总结出来的。
今天我们需要实现的目标
解析一下这个步骤,大家基础过关之后,想要达到发表文献的分析目标,就是这种极具个性化的分析内容。
第一步:定义每个细胞的邻域区域,图中每个细胞被指定为中心,其邻域由四个同心层定义,最外层是一个边长为72微米的正方形区域。通过量化每个细胞的局部细胞组成,生成了一个单细胞邻域矩阵(中心细胞 × 邻域细胞)。该矩阵的行编码了单个细胞空间邻域内的细胞类型比例。
第二步:对单细胞邻域矩阵应用了非负矩阵分解,识别和量化局部细胞特征。
第三步:使用CellCharter工具中实现的高斯混合模型对细胞进行聚类,识别稳定的空间结构。
我们来实现一下,全流程python脚本,我们逐步来分析。
第一步:构建单细胞邻域矩阵(中心细胞 × 邻域细胞)。
import numpy as np
import pandas as pd
import warnings
from tqdm import tqdm
warnings.filterwarnings("ignore")
def calculate_neighbor_composition(obs, ring, annotation_col='annotations'):
all_categories = obs[annotation_col].unique().tolist()
results = pd.DataFrame(
index=obs.index,
columns=all_categories,
dtype=float
).fillna(0.0)
sample_list = obs["sample"].unique().tolist()
for sample in tqdm(sample_list, desc="Processing samples"):
sample_obs = obs[obs["sample"] == sample]
spatial_index = sample_obs.set_index(['array_row', 'array_col'])
for idx, row in tqdm(sample_obs.iterrows(), total=len(sample_obs), desc=f"Cells in {sample}"):
r, c = row['array_row'], row['array_col']
min_r, max_r = r - ring, r + ring
min_c, max_c = c - ring, c + ring
neighbors = []
for rr in range(min_r, max_r + 1):
for cc in range(min_c, max_c + 1):
if (rr, cc) in spatial_index.index:
neighbors.append(spatial_index.loc[(rr, cc), annotation_col])
if neighbors:
neighbor_series = pd.Series(neighbors)
counts = neighbor_series.value_counts(normalize=True)
for cat, prop in counts.items():
results.at[idx, cat] = prop
return results
entire_obs = sc.read("HD.h5ad")
entire_obs = entire_obs.obs
result_df = calculate_neighbor_composition(
obs=entire_obs,
ring=4,
annotation_col='annotations'
)
result_df.to_csv("./neighboring_celltype_composition.csv")
第二步:非负矩阵分解
import pandas as pd
import warnings
from tqdm import tqdm
from sklearn.decomposition import NMF
from sklearn.neighbors import kneighbors_graph
import igraph as ig
import leidenalg
warnings.filterwarnings("ignore")
celltype_composition = pd.read_csv("./neighboring_celltype_composition.csv",index_col=0)
model = NMF(n_components=5, init='random', random_state=42)
W = model.fit_transform(celltype_composition.values)
H = model.components_
W_df = pd.DataFrame(W, index=celltype_composition.index, columns=[f"Factor_{i+1}" for i in range(W.shape[1])])
H_df = pd.DataFrame(H, columns=celltype_composition.columns, index=[f"Factor_{i+1}" for i in range(H.shape[0])])
W_df.to_csv("./W_values.csv")
H_df.to_csv("./H_values.csv")
第三步:使用CellCharter工具中实现的高斯混合模型对细胞进行聚类,识别稳定的空间结构(niche)。
import numpy as np
import pandas as pd
import scanpy as sc
import scvi
import cellcharter as cc
import warnings
warnings.filterwarnings("ignore")
adata = sc.read("./HD.h5ad")
adata._inplace_subset_var([])
W_value = pd.read_csv("./W_values.csv",index_col=0)
W_value_aligned = W_value.loc[adata.obs_names]
adata.obsm["X_neighboring"] = np.round(W_value_aligned.to_numpy(), 8).astype(np.float32)
autok = cc.tl.ClusterAutoK(
n_clusters=(5,16),
max_runs=5,
model_params=dict(
random_state=12345,
trainer_params=dict(accelerator='gpu', devices=1)
)
)
autok.fit(adata, use_rep='X_neighboring')
adata.obs['cluster'] = autok.predict(adata, use_rep='X_neighboring')
adata.obs['cellular_compartment'] = adata.obs['cluster'].copy()
proportion_table = pd.crosstab(adata.obs['sample'], adata.obs['cellular_compartment'], normalize='index')
proportion_table.to_csv("./compartment_composition_in_samples.csv")
def calculate_roe(mat: pd.DataFrame) -> pd.DataFrame:
mat = mat.copy()
row_sums = mat.sum(axis=1)
col_sums = mat.sum(axis=0)
total = mat.values.sum()
expected = np.outer(row_sums, col_sums) / total
observed = mat.values
roe = np.divide(observed, expected, out=np.zeros_like(observed, dtype=float), where=expected != 0)
roe_df = pd.DataFrame(roe, index=mat.index, columns=mat.columns)
return roe_df
cross_tab = pd.crosstab(adata.obs['cellular_compartment'], adata.obs['annotations'])
roe_df = calculate_roe(cross_tab)
roe_df = roe_df[['NK', 'T', 'B', 'DC', 'Mast', 'Neutrophil', 'Macrophage', 'Endothelial', 'Fibroblast', 'Epithelial', 'Others']]
roe_df.to_csv("./niche_celltype_roe.csv")
生活很好,有你更好。