Jahanzeb's Homepage


profile photo



Jahanzeb Maqbool Hashmi

Resume  /  CV  /  Google Scholar

I am a Senior Architect (HPC) at NVIDIA, where I work on designing next-generation GPUs, CPUs, and high-speed interconnects as part of the architecture group.

Previously, I was a Senior Research Associate at The Ohio State University (OSU), working in the Network Based Computing Laboratory (NBCL). My work centered on the high-performance MPI library MVAPICH2, with a focus on optimizing the MPI runtime for emerging HPC and cloud systems. I primarily worked on the design and development of a GPU-aware MPI library for RDMA networks with AMD/NVIDIA hardware, hierarchical MPI collectives for scalable scientific and deep-learning applications, efficient data movement for sparse data layouts, high-level abstractions for asynchronous communication offloading, and performance engineering of parallel applications on multi-petaflop supercomputers and cloud systems.

Service
  • Jun 2023 - I will be serving as a member of the Technical Program Committee at Hot Interconnects 2023. Please consider submitting your work at https://hoti.org/.
  • Jun 2022 - I will be serving as a member of the Technical Program Committee at Hot Interconnects 2022.
  • Sep 2021 - I will be serving as a member of the Technical Program Committee at IPDPS 2022 (Architecture Track).
  • Aug 2021 - I will be serving as a Technical Reviewer for the IEEE Micro journal.
  • Jun 2021 - I will be serving as a member of the Technical Program Committee at Hot Interconnects 2021.
Education

I completed my Ph.D. in Computer Science and Engineering at OSU, where I was advised by Prof. D. K. Panda. Prior to joining OSU, I earned my master's degree from Ajou University, South Korea, where I worked on energy-efficient High Performance Computing (HPC) using a cluster of low-power ARM SoCs. I earned my B.S. from the National University of Sciences and Technology (NUST), Pakistan, where I worked on the performance characterization and parallelization of numerical simulation codes, e.g., particle simulation with SPH methods on multi-core systems using MPI, OpenMP, and CUDA.

Recent Updates
  • Dec 2021 - Our paper was selected as a Best Paper Finalist at HiPC'21
  • Sep 2021 - Co-authored work on designing architecture-aware communication trees has been accepted to HiPC'21
  • Mar 2021 - Co-authored work on designing ROCm-aware MPI library has been accepted to ISC'21
  • Mar 2021 - Co-authored work "BluesMPI" has been accepted to ISC'21
  • Oct 2020 - Co-authored work "Blink" has been accepted to HiPC'20
  • Jun 2020 - Co-authored work "GEMS" has been accepted to SC'20
  • Jun 2020 - I started working full-time as a Senior Research Associate Engineer at NBCL
  • May 2020 - I have successfully defended my Ph.D. thesis (Slides)
  • May 2020 - "FALCON-X" is accepted to JPDC special issue
  • Jan 2020 - Our work on efficient MPI topologies for HPC Clouds got accepted to IPDPS'20
  • May 2019 - I have passed my Ph.D. candidacy exam (Slides)
  • Mar 2019 - "FALCON" was nominated as a Best Paper Finalist at IPDPS'19
  • Mar 2019 - Our paper on characterizing shared-address-space MPI collectives got accepted to CCGrid'19
  • Dec 2018 - Our paper "FALCON" on efficient processing of MPI derived datatypes got accepted to IPDPS'19
  • Sep 2018 - I presented a co-authored paper at Cluster'18 that won the Best Paper Award
  • Jul 2018 - Co-authored paper on cooperative rendezvous protocols was nominated as a Best Paper Finalist at SC'18
  • May 2018 - I presented our shared-address-space MPI runtime work at IPDPS'18 in Vancouver, Canada (Slides)
  • Apr 2018 - I gave a talk at IXPUG held at KAUST, Saudi Arabia (Group Photo) (Slides)
  • Jan 2018 - Our work on designing shared-address-space MPI runtime is accepted to IPDPS'18
Select Publications

For a complete list of publications, please refer to my Google Scholar page.

Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems
B. Ramesh, J. Hashmi, S. Xu, A. Shafi, M. Ghazimirsaeed, M. Bayatpour, H. Subramoni, and DK Panda.
28th IEEE International Conference on High Performance Computing, Data, Analytics and Data Science (HiPC), 2021.
Best Paper Finalist


GEMS: GPU Enabled Memory Aware Model Parallelism System for Distributed DNN Training
A. Jain, A. Awan, A. Aljuhani, J. Hashmi, Q. Anthony, H. Subramoni, D. Panda, R. Machiraju, A. Parwani
IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2020.

FALCON-X: Zero-copy MPI Derived Datatype Processing on Modern CPU and GPU Architectures
J. Hashmi, C. Chu, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda.
Journal of Parallel and Distributed Computing (JPDC), Volume 144, October 2020, Pages 1-13, doi.org/10.1016/j.jpdc.2020.05.008, 2020.

Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures
J. Hashmi, S. Xu, B. Ramesh, M. Bayatpour, H. Subramoni, and D. K. Panda.
34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020.
Slides

FALCON: Efficient Designs for Zero-copy MPI Datatype Processing on Emerging Architectures
J. Hashmi, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda.
33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.
Best Paper Finalist
Slides

Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores
J. Hashmi, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda.
32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018.
Slides

Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures
J. Hashmi, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda.
19th Annual IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2019.
Slides

Cooperative Rendezvous Protocols for Improved Performance and Overlap
S. Chakraborty, M. Bayatpour, J. Hashmi, H. Subramoni, and DK Panda.
IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2018.
Best Paper Finalist

SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives
M. Bayatpour, S. Chakraborty, J. Hashmi, H. Subramoni, and DK Panda.
IEEE International Conference on Cluster Computing (CLUSTER), 2018.
Best Paper Award

S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters
A. Awan, K. Hamidouche, J. Hashmi, and DK Panda.
22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2017.

Kernel-assisted Communication Engine for MPI on Emerging Manycore Processors
J. Hashmi, K. Hamidouche, H. Subramoni, and DK Panda.
24th IEEE International Conference on High Performance Computing, Data, Analytics and Data Science (HiPC), 2017.
Slides

Exploiting and Evaluating OpenSHMEM on KNL Architecture
J. Hashmi, M. Li, H. Subramoni, and DK Panda.
4th Workshop on OpenSHMEM and Related Technologies (OpenSHMEM), 2017.
Slides

Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models
J. Hashmi, K. Hamidouche, and DK Panda.
18th IEEE International Conference on High Performance Computing and Communications (HPCC), 2016.
Slides