Health Analytics Network

Recent Publications

Data Science – 2022

On Rao’s Weighted Distributions for Modeling the Dynamics of Wildfires and Air Pollution. Proceedings of Applied Linear Algebra, Probability and Statistics, Springer, 2022. In press
Food Insecurity in the Eastern Indo-Gangetic Plain: Taking a Closer Look. (PLOS ONE) (in press).
Re-scaling and Small Area Estimation of Behavioral Risk Survey guided by Social Vulnerability Data. (BMC Public Health) (in press).
Systematic mining of patterns of polysubstance use in a nationwide population survey. (Computers in Biology and Medicine).
Geostatistical Modeling and Heterogeneity Analysis of Tumor Molecular Landscape. (Cancers).
Measuring and Mapping Micro Level Earning Inequality for Addressing the Sustainable Development Goals: A Multivariate Small Area Modelling Approach. (Journal of Official Statistics).
Multivariate small area modelling of undernutrition prevalence among under-five children in Bangladesh. (The International Journal of Biostatistics).
Deep variational graph autoencoders for novel host-directed therapy options against COVID-19. (Artificial Intelligence in Medicine).
A deep integrated framework for predicting SARS-CoV2–Human protein-protein interaction. (IEEE Transactions on Emerging Topics in Computational Intelligence).
A regularized multi-task learning approach for cell type detection in single RNA sequencing data. (Frontiers in Genetics).
A copula based topology preserving graph convolution network for clustering of single-cell RNA seq data. (PLoS Computational Biology).
LSH-GAN enables in-silico generation of cells for small sample high dimensional scRNA-seq data. (Communications Biology).
An entropy guided robust feature selection for clustering of single-cell RNA-seq data. (Briefings in Bioinformatics).

Data Science - 2022

Food Insecurity in the Eastern Indo-Gangetic Plain: Taking a Closer Look.

Food security is an important policy issue in India. As India recently ranked 107th out of 121 countries in the 2022 Global Hunger Index, there is an urgent need to dissect, and gain insights into, such a major decline at the national level. However, the existing surveys, due to small sample sizes, cannot be used directly to produce reliable estimates at local administrative levels such as districts.

Systematic mining of patterns of polysubstance use in a nationwide population survey

We developed a new computational platform for PolySubstance Use data Mining for Associations and Transitions (PSUMAnT). It is based on computing the weighted support, a measure of popularity, for the use of every combination of one or more substances, termed a drugset, over a period of five decades (1965–2014) using NSDUH data.
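
As a rough illustration of the weighted support idea, the sketch below treats the support of a drugset as the survey-weighted share of respondents who report using every substance in that set. This definition, the function names, and the toy respondents are assumptions made for illustration only; they are not taken from PSUMAnT or from NSDUH.

```python
from itertools import combinations


def weighted_support(respondents, drugset):
    """Survey-weighted share of respondents who use every substance in `drugset`.

    `respondents` is a list of (substances_used, survey_weight) pairs.
    This definition is an assumption for illustration, not the PSUMAnT formula.
    """
    drugset = frozenset(drugset)
    total = sum(weight for _, weight in respondents)
    matched = sum(weight for used, weight in respondents if drugset <= used)
    return matched / total if total else 0.0


def all_supports(respondents, max_size=2):
    """Weighted support of every drugset with up to `max_size` substances."""
    substances = sorted(set().union(*(used for used, _ in respondents)))
    supports = {}
    for size in range(1, max_size + 1):
        for combo in combinations(substances, size):
            supports[combo] = weighted_support(respondents, combo)
    return supports


# Hypothetical respondents: (set of substances used, survey weight).
respondents = [
    (frozenset({"alcohol", "tobacco"}), 2.0),
    (frozenset({"alcohol"}), 1.0),
    (frozenset({"alcohol", "tobacco", "cannabis"}), 1.0),
    (frozenset(), 4.0),
]

supports = all_supports(respondents)
for drugset, value in sorted(supports.items(), key=lambda kv: -kv[1]):
    print(drugset, round(value, 3))
```

Repeating the same computation for each survey year would give one support value per drugset per year, which is one way such patterns could be followed over time.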

Geostatistical Modeling and Heterogeneity Analysis of Tumor Molecular Landscape

The present study introduces a new computational platform referred to as GATHER to conduct Geostatistical Analysis of Tumor Heterogeneity and Entropy in R. GATHER has several distinct advantages such as (a) a novel use of single-cell-specific spatial information for kriging to synthesize high-resolution and continuous gene expression landscapes of a given tumor sample.

Measuring and Mapping Micro Level Earning Inequality towards Addressing the Sustainable Development Goals – A Multivariate Small Area Modelling Approach

Earning inequality in India has obstructed the underprivileged from accessing elementary needs such as health and education. The Periodic Labour Force Survey conducted by the National Statistical Office of India generates estimates of earning status at the national and state levels, separately for the rural and urban sectors.

Multivariate small area modelling of undernutrition prevalence among under-five children in Bangladesh

District-representative data are rarely collected in the surveys for identifying localised disparities in Bangladesh, and so district-level estimates of undernutrition indicators - stunting, wasting and underweight - have remained largely unexplored. This study aims to estimate district-level prevalence of these indicators by employing a multivariate Fay-Herriot (MFH) model which accounts for the underlying correlation among the undernutrition indicators.
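
To give a feel for how a Fay-Herriot type model stabilises district estimates, here is a minimal sketch of the univariate case with known variance components. It is a simplification: in the multivariate model the scalar shrinkage factor becomes a matrix that also uses the correlation among indicators, and the variances are estimated rather than supplied. The function name and the numbers are hypothetical.

```python
def fay_herriot_shrinkage(direct, sampling_var, synthetic, model_var):
    """Composite estimate for one area under a univariate Fay-Herriot model.

    direct       : direct survey estimate for the area
    sampling_var : sampling variance of the direct estimate (assumed known)
    synthetic    : regression-based prediction x'beta for the area
    model_var    : variance of the area-level random effect (assumed known)
    """
    # Shrinkage factor: weight on the direct estimate, always between 0 and 1.
    gamma = model_var / (model_var + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    return estimate, gamma


# Hypothetical district: a noisy direct stunting prevalence of 0.40,
# a model-based prediction of 0.30, and illustrative variances.
estimate, gamma = fay_herriot_shrinkage(
    direct=0.40, sampling_var=0.03, synthetic=0.30, model_var=0.01
)
print(round(gamma, 3), round(estimate, 3))
```

The smaller the district sample, the larger the sampling variance, and the more the estimate leans on the model-based prediction.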

Deep variational graph autoencoders for novel host-directed therapy options against COVID-19

Here we focus on predicting unknown links between drugs and human proteins that play key roles in the replication cycle of SARS-CoV-2. The work used a deep learning model based on variational graph autoencoders (VGAEs) to learn the structure of a large molecular interaction network constructed from year-long curated drug–protein/protein–protein interaction data and the most recent SARS-CoV-2 protein interaction data.

A deep integrated framework for predicting SARS-CoV2–Human protein-protein interaction

Unavailability of a proper set of interactions between SARS-CoV2 and human host proteins limits the set of possible drug-targets. This work presents a deep learning based methodology for high confidence interaction prediction between SARS-CoV2 and human host proteins. It leverages the landmark advantage of Node2Vec to produce a low dimensional embedding from a compiled interaction network that puts SARS-CoV2 proteins.

A regularized multi-task learning approach for cell type detection in single RNA sequencing data

Cell type prediction is one of the most challenging goals in single-cell RNA sequencing (scRNA-seq) data. This work presents a framework based on regularized multi-task learning (RMTL) that enables us to simultaneously learn the subpopulation associated with a particular cell type. Learning the structure of subpopulations is treated as a separate task in the multi-task learner. Regularization is used to modulate the multi-task model jointly, according to the specific prior.

A topology preserving graph convolution network for clustering of single-cell RNA seq data, PLoS Computational Biology

One of the important aspects of single-cell downstream analysis is to classify cells into subpopulations. This task faces several difficulties due to (i) the small amount of starting RNA, (ii) cell-to-cell variability, (iii) technical noise introduced by the single-cell sequencing technology, and (iv) the unavailability of discriminating selected/extracted genes (features) in the preprocessing.

LSH-GAN enables in-silico generation of cells for small sample high dimensional scRNA-seq data

A fundamental problem of downstream analysis of scRNA-seq data is the unavailability of enough cell samples compared to the feature size. This work presents an improved version of generative adversarial network (GAN) called LSH-GAN to address this issue by producing new realistic cell samples.

sc-REnF: An entropy guided robust feature selection for clustering of single-cell RNA-seq data

This work introduces sc-REnF [robust entropy-based feature (gene) selection method], aiming to leverage the advantages of Rényi and Tsallis entropies in gene selection for single cell clustering. Since single-cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis.
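
For readers unfamiliar with the two entropy measures, the sketch below computes the standard Rényi and Tsallis entropies of a discrete probability distribution. It only illustrates the textbook definitions; how sc-REnF uses them to rank genes is not shown here, and the function names and the example distribution are hypothetical.

```python
import math


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)


def tsallis_entropy(probs, q):
    """Tsallis entropy of index q (q != 1)."""
    if q == 1:
        raise ValueError("q must be different from 1")
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


# Hypothetical distribution, e.g. a histogram of discretised expression levels.
uniform = [0.25, 0.25, 0.25, 0.25]
print(renyi_entropy(uniform, alpha=2))
print(tsallis_entropy(uniform, q=2))
```

Both measures reduce to the Shannon entropy in the limit as the order parameter approaches 1, which is why that value is excluded above.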

Data Science - 2021

Circular functional analysis of OCT data for precise identification

Progressive optic neuropathies such as glaucoma are major causes of blindness globally. Multiple sources of subjectivity and analytical challenges are often encountered by clinicians in the process of early diagnosis and clinical management of these diseases.

Data fusion using factor analysis and low-rank matrix

Data fusion involves the integration of multiple related datasets. The statistical file-matching problem is a canonical data fusion problem in multivariate analysis, where the objective is to characterise the joint distribution of a set of variables when only strict subsets of marginal distributions have been observed.

Estimation of Tail Probabilities

Synthetic data, when properly used, can enhance patterns in real data and thus provide insights into different problems. Here, the estimation of tail probabilities of rare events from a moderately large number of observations is considered.

AICov: Framework for COVID-19

The COVID-19 (COrona VIrus Disease 2019) pandemic has had profound global consequences on health, economic, social, behavioral, and almost every major aspect of human life. Therefore, it is of great importance to model COVID-19.

Determination of critical community size from an HIV/AIDS model

After an epidemic outbreak, the infection persists in a community long enough to engulf the entire susceptible population. Local extinction of the disease could be possible if the susceptible population gets depleted.

Protocol for a cluster randomised trial evaluating

The Healthy Life Trajectories Initiative is an international consortium comprising four harmonised but independently powered trials to evaluate whether an integrated intervention starting preconceptionally will reduce non-communicable disease risk in their children.

Multivariate Tail Probabilities

In disease modeling, a key statistical problem is the estimation of lower and upper tail probabilities of health events from given data sets of small size and limited range. Assuming such constraints, we describe a computational framework for the systematic fusion of observations from multiple sources to compute tail probabilities.

Clustering Patterns Connecting COVID-19 Dynamics

Social distancing and stay-at-home are among the few measures that are known to be effective in checking the spread of a pandemic such as COVID-19 in a given population. The patterns of dependency between such measures and their effects on disease incidence may vary dynamically and across different populations.

Use of Linear Combination Test to Identify Gene Signatures

Data on human systems biology are being generated at a rapid pace due to technological advances in not only high-throughput, but also high-resolution, platforms. The increasing availability of single-cell omic data has motivated complex experiments intended to gain deeper insights into complex biological systems.

Differential Patterns of Social Media Use Associated with Loneliness

Loneliness has emerged as a chronic and persistent problem for a considerable fraction of the general population in the developed world. Concurrently, use of online social media by the same societies has steadily increased over the past two decades.

Mapping micro-level inequality for addressing Sustainable Development Goals

This paper describes an efficient small area estimation approach for obtaining the distribution of average monthly income and the degree of income inequality between the rural and urban sectors of the state of Bihar in India. A bivariate map is also produced to assist policymakers in identifying the regions that require further attention. Journal of Official Statistics. (In Press).

Measuring Micro Level Earning Inequality

Efficient model-based small area estimates of inequality in income distribution are obtained at the district level for the rural and urban sectors of the state of Uttar Pradesh in India by combining the latest round of Periodic Labour Force Survey 2018-19 data of NSO, Govt. of India and the Population Census of India 2011.

Disaggregating food insecurity

This article delineates a multivariate small area estimation method to obtain reliable and representative model-based estimates of food insecurity indicators (Prevalence, Gap and Severity) at the district level for the rural areas of Uttar Pradesh in India. The disaggregate-level estimates, along with spatial maps of the food insecurity indicators, are directly relevant to sustainable development goal indicator 2.1.2 - severity of food insecurity.

Disparities in food consumption and nutritional status

This work presents joint modelling and spatial mapping of food consumption and nutrition intake status at the district level for the rural areas of Uttar Pradesh in India by combining the latest round of Household Consumer Expenditure Survey 2011–2012 data of NSSO, Govt. of India and the Indian Population Census 2011.

Food and nutrition in the Indo-Gangetic Plain

This article examines food consumption patterns across selected social and economic groups in the states of the Indo-Gangetic Plain region of India, which include West Bengal, Bihar, Uttar Pradesh, Punjab and Haryana. The study also indicates that a large proportion of the population is malnourished and anaemic. There is an urgent need to build consensus on the standards for minimum necessary calorie intake, as these are important for gauging the severity of hunger and undernourishment in the country.

Improved estimation of finite population mean

An improved chain-ratio estimator for the population mean based on two-phase sampling is proposed for the case where the study variable and two auxiliary variables are subject to non-response. The estimator is a weighted combination of the ratio estimator and Chand’s chain-ratio estimator, with the weights obtained by minimizing its mean square error.
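
The idea of choosing weights by minimizing the mean square error can be sketched for any two estimators of the same quantity. For a combination w*A + (1 - w)*B, the MSE is w^2*MSE(A) + (1 - w)^2*MSE(B) + 2w(1 - w)*C, where C is the mean product of the two estimation errors, and it is minimized at w = (MSE(B) - C) / (MSE(A) + MSE(B) - 2C). The code below evaluates this generic result; the function names and numbers are hypothetical, and the paper's own expressions under non-response are not reproduced.

```python
def optimal_weight(mse_a, mse_b, cross=0.0):
    """Weight on estimator A that minimises the MSE of w*A + (1 - w)*B.

    `cross` is the mean product of the two estimation errors, E[(A - T)(B - T)].
    """
    denom = mse_a + mse_b - 2.0 * cross
    if denom == 0:
        raise ValueError("the estimators have identical errors; any weight works")
    return (mse_b - cross) / denom


def combined_mse(w, mse_a, mse_b, cross=0.0):
    """MSE of the weighted combination w*A + (1 - w)*B."""
    return w * w * mse_a + (1.0 - w) ** 2 * mse_b + 2.0 * w * (1.0 - w) * cross


# Hypothetical MSEs for two estimators with uncorrelated errors.
w_star = optimal_weight(mse_a=4.0, mse_b=1.0)
best = combined_mse(w_star, mse_a=4.0, mse_b=1.0)
print(round(w_star, 3), round(best, 3))
```

In this toy case the combined estimator has a smaller MSE than either component, which is the motivation for forming the weighted combination.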

Improved chain ratio type estimator for population total

An improved version of the chain-ratio type estimator for the population total based on double sampling is proposed for the case where auxiliary information is available for the first variable and not available for the second variable. The estimators show substantial gain in efficiency compared to other estimators in both model-based and design-based simulations.

Calibration approaches in two-phase samples

This work proposes a chain ratio type and a chain product type estimator of the population total in two-phase sampling when information on two auxiliary characters is available in different phases. Two cases are considered: (i) both the auxiliary variables are positively correlated with the study variable and (ii) both the auxiliary variables are negatively correlated with the study variable.

LSH-GAN: in-silico generation of cells for small sample high dimensional scRNA-seq data

A fundamental problem of downstream analysis of scRNA-seq data is the unavailability of enough cell samples compared to the feature size. This is mostly due to the budgetary constraints of single-cell experiments or simply because of the small number of available patient samples. Here, an improved version of the generative adversarial network (GAN), called LSH-GAN, is proposed to address this issue. Experimental results show that samples generated by LSH-GAN improve the performance of downstream analysis of single-cell RNA-seq data. Accepted in Communications Biology.

sc-REnF: An entropy guided robust feature selection for clustering of single-cell RNA-seq data

Because single-cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. We introduce sc-REnF (a robust entropy-based feature (gene) selection method), aiming to leverage the advantages of Rényi and Tsallis entropies in gene selection for single-cell clustering. sc-REnF yields good clustering performance in small sample, large feature scRNA-seq data. Accepted in Briefings in Bioinformatics.

RgCop-A regularized copula based method for gene selection in single-cell RNA-seq data

Because single-cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. Here, we propose a novel regularized copula based method for gene selection that leverages the copula correlation (Ccor) measure to capture cell-to-cell variability within the data. The proposed objective function uses an l1 regularization term to penalize the redundant coefficients of features/genes.

Pan-cancer classification by regularized multi-task learning

Classifying pan-cancer samples using gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms are considered proven tools for performing downstream analysis and capturing the deviations in gene expression patterns across diverse diseases. We have developed PC-RMTL, a pan-cancer classification model using regularized multi-task learning (RMTL) for classifying 21 cancer types and adjacent normal samples using RNASeq data obtained from TCGA. PC-RMTL is observed to outperform five state-of-the-art classification algorithms.

Identification of key immune regulatory genes in HIV-1 progression

Human immunodeficiency virus (HIV) infection causes acquired immunodeficiency syndrome (AIDS), one of the most devastating diseases affecting humankind. Here, we propose a framework to examine the differences among microarray gene expression data of uninfected samples and three different HIV-1 infection stages using module preservation statistics. We leverage gene co-expression networks (GCN) constructed for each infection stage to detect the topological and structural changes of a group of differentially expressed genes.

A copula based topology preserving graph convolution network for clustering of single-cell RNA-seq data

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Various issues in single-cell sequencing affect the homogeneous grouping (clustering) of cells, such as the small amount of starting RNA, limited per-cell sequenced reads, and cell-to-cell variability due to cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single-cell data are susceptible to technical noise, which affects the quality of genes (or features) selected/extracted prior to clustering.

sc-REnF: An entropy guided robust feature selection for single-cell RNA-seq data

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Since single-cell data are susceptible to technical noise, the quality of genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. Therefore, robust gene selection has gained considerable attention in recent years. We introduce sc-REnF [robust entropy-based feature (gene) selection method], aiming to leverage the advantages of Rényi and Tsallis entropies in gene selection for single-cell clustering.

Health Sciences - 2022

Comparing Environmental Policies to Reduce Pharmaceutical Pollution and Address Disparities.

Pharmaceutical products, including active pharmaceutical ingredients and inactive ingredients such as packaging materials, have raised significant concerns due to their persistent input and potential threats to human and environmental health.

Evaluating Students’ COVID-19 Knowledge, Attitudes, and Practices (COVKAP) during the COVID-19 pandemic.

The COVID-19 pandemic led to significant disruption in students' lives through lockdowns, restricted movement, remote instruction, and mixed information. Therefore, this study aimed to capture the knowledge, attitudes, and practices of student pharmacists during 2020-2021.

Health Sciences - 2021

Quō vādis?

Short-term medical missions (STMMs) have evolved in the past few decades to provide non-emergent care including routine and follow-up primary care for acute and chronic conditions, along with treatment of neglected tropical diseases. Many STMMs operate outside the local health care infrastructure and may have limited local partnerships.

Student Pharmacists during the Pandemic

The COVID-19 pandemic has caused innumerable changes to all aspects of human life and behavior, including academic life. This study describes the development of a COVID-19 Knowledge, Attitudes, and Practices (COVKAP) Survey among U.S. student pharmacists.

Pharmacy Emergency Preparedness and Response (PEPR)

Pharmacists have long been involved in public health and emergency preparedness and response (EP&R), including through preventive measures such as screening, vaccinations, testing and medical.

Emergency preparedness and response (EP&R)

The COVID-19 pandemic highlights the importance of Emergency Preparedness & Response (EP&R) education, training, capacity building and infrastructure development in India. During the pandemic, pharmacy professionals (PPs) in India have continued to provide medications, supplies and services.

Empowering Public Health Pharmacy Practice

This article describes the history and evolution of pharmacist-physician collaborative practice agreements (CPAs) in the United States with future directions to support pharmacists' provider status as the profession continues to evolve from product-oriented to patient-centered care and population health.
