Results
Found 134 publication records. Showing 133 according to the selection in the facets.
Hits |
Authors |
Title |
Venue |
Year |
Link |
Author keywords |
145 | Bo Kågström, Per Ling, Charles Van Loan |
GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark. |
ACM Trans. Math. Softw. |
1998 |
DBLP DOI BibTeX RDF |
GEMM-based level 3 BLAS, matrix-matrix kernels, parallelization, memory hierarchy, vectorization, FORTRAN 77, blocked algorithms |
101 | Yinan Li 0002, Jack J. Dongarra, Stanimire Tomov |
A Note on Auto-tuning GEMM for GPUs. |
ICCS (1) |
2009 |
DBLP DOI BibTeX RDF |
matrix multiply, GPUs, Auto-tuning, dense linear algebra |
83 | Isak Jonsson, Bo Kågström |
Parallel Triangular Sylvester-Type Matrix Equation Solvers for SMP Systems Using Recursive Blocking. |
PARA |
2000 |
DBLP DOI BibTeX RDF |
Sylvester-type matrix equations, recursion, superscalar, level 3 BLAS, GEMM-based, automatic blocking |
78 | Michel J. Daydé, Iain S. Duff, Antoine Petitet |
A parallel block implementation of Level-3 BLAS for MIMD vector processors. |
ACM Trans. Math. Softw. |
1994 |
DBLP DOI BibTeX RDF |
matrix-matrix kernels, parallelization, vectorization, Level-3 BLAS |
66 | Bo Kågström, Charles Van Loan |
Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues. |
ACM Trans. Math. Softw. |
1998 |
DBLP DOI BibTeX RDF |
GEMM-based level 3 BLAS, matrix-matrix kernels, parallelization, memory hierarchy, vectorization, FORTRAN 77, blocked algorithms |
63 | Michael J. Feeley, Norman C. Hutchinson, Suprio Ray |
Realistic Mobility for Mobile Ad Hoc Network Simulation. |
ADHOC-NOW |
2004 |
DBLP DOI BibTeX RDF |
GEMM, MANET, Mobility Model |
59 | Ahmed Sherif Zekri, Stanislav G. Sedukhin |
The general matrix multiply-add operation on 2D torus. |
IPDPS |
2006 |
DBLP DOI BibTeX RDF |
|
51 | John S. McCaskill, Thomas Maeke, Udo Gemm, Ludger Schulte, Uwe Tangen |
NGEN: A Massively Parallel Reconfigurable Computer for Biological Simulation: Towards a Self-Organizing Computer. |
ICES |
1996 |
DBLP DOI BibTeX RDF |
|
45 | Shixun Wu, Yujia Zhai, Jiajun Huang, Zizhe Jian, Zizhong Chen |
FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
45 | Shixun Wu, Yujia Zhai, Jiajun Huang, Zizhe Jian, Zizhong Chen |
FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs. |
HPDC |
2023 |
DBLP DOI BibTeX RDF |
|
44 | Robert Granat, Bo Kågström |
Evaluating Parallel Algorithms for Solving Sylvester-Type Matrix Equations: Direct Transformation-Based Versus Iterative Matrix-Sign-Function-Based Methods. |
PARA |
2004 |
DBLP DOI BibTeX RDF |
Sylvester matrix equation, Bartels–Stewart method, explicit blocking, c-stable matrices, PSLICOT, level 3 BLAS, continuous-time, GEMM-based, ScaLAPACK, Newton iteration, matrix sign function |
44 | Bo Kågström |
Management of Deep Memory Hierarchies - Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Computations. |
PARA |
2004 |
DBLP DOI BibTeX RDF |
automatic variable blocking, hybrid data structures, superscalar kernels, SMP parallelization, library software, ESSL, RECSY, periodic systems, factorizations, recursion, superscalar, LAPACK, level 3 BLAS, dense linear algebra, GEMM-based, SLICOT, matrix equations |
44 | Robert Granat, Isak Jonsson, Bo Kågström |
Combining Explicit, Recursive Blocking for Solving Triangular Sylvester-Type Matrix Equations on Distributed Memory Platforms. |
Euro-Par |
2004 |
DBLP DOI BibTeX RDF |
Sylvester matrix equation, Bartels–Stewart method, ScaLAPACK-style algorithms, RECSY, blocking, LAPACK, recursive algorithms, level 3 BLAS, continuous-time, GEMM-based, automatic blocking |
44 | Isak Jonsson, Bo Kågström |
RECSY - A High Performance Library for Sylvester-Type Matrix Equations. |
Euro-Par |
2003 |
DBLP DOI BibTeX RDF |
Sylvester-type matrix equations, RECSY, recursion, superscalar, LAPACK, level 3 BLAS, GEMM-based, SLICOT, automatic blocking |
44 | Robert Granat, Bo Kågström, Peter Poromaa |
Parallel ScaLAPACK-Style Algorithms for Solving Continuous-Time Sylvester Matrix Equations. |
Euro-Par |
2003 |
DBLP DOI BibTeX RDF |
Sylvester matrix equation, Bartels-Stewart method, ScaLAPACK-style algorithms, blocking, level 3 BLAS, continuous-time, GEMM-based, SLICOT |
44 | Isak Jonsson, Bo Kågström |
Recursive blocked algorithms for solving triangular systems - Part I: one-sided and coupled Sylvester-type matrix equations. |
ACM Trans. Math. Softw. |
2002 |
DBLP DOI BibTeX RDF |
SMP parallelization, generalized coupled Sylvester, standard Sylvester and Lyapunov, recursion, superscalar, LAPACK, level-3 BLAS, GEMM-based, SLICOT, Matrix equations, automatic blocking |
44 | Isak Jonsson, Bo Kågström |
Recursive blocked algorithms for solving triangular systems - Part II: two-sided and generalized Sylvester and Lyapunov matrix equations. |
ACM Trans. Math. Softw. |
2002 |
DBLP DOI BibTeX RDF |
SMP parallelization, generalized Sylvester and Lyapunov, standard discrete-time Sylvester and Lyapunov, recursion, superscalar, LAPACK, level-3 BLAS, GEMM-based, SLICOT, Matrix equations, automatic blocking |
39 | Vasily Volkov, James Demmel |
Benchmarking GPUs to tune dense linear algebra. |
SC |
2008 |
DBLP DOI BibTeX RDF |
|
39 | Bjarne Stig Andersen, Jerzy Wasniewski, Fred G. Gustavson |
A recursive formulation of Cholesky factorization of a matrix in packed storage. |
ACM Trans. Math. Softw. |
2001 |
DBLP DOI BibTeX RDF |
Cholesky factorization and solution, complex Hermitian matrices, novel packed matrix data structures, real symmetric matrices, BLAS, recursive algorithms, positive definite matrices |
39 | Fred G. Gustavson, Isak Jonsson |
High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage. |
PARA |
2000 |
DBLP DOI BibTeX RDF |
packed format, level 3 BLAS parallelism, recursive algorithm, Cholesky factorization, recursive data structure |
39 | Michel J. Daydé, Iain S. Duff |
The RISC BLAS: a blocked implementation of level 3 BLAS for RISC processors. |
ACM Trans. Math. Softw. |
1999 |
DBLP DOI BibTeX RDF |
matrix-matrix kernels, blocking, loop-unrolling, level 3 BLAS, RISC processors |
24 | Samuel Williams 0001, John Shalf, Leonid Oliker, Shoaib Kamil 0001, Parry Husbands, Katherine A. Yelick |
Scientific Computing Kernels on the Cell Processor. |
Int. J. Parallel Program. |
2007 |
DBLP DOI BibTeX RDF |
GEMM, SpMV, three level memory, FFT, sparse matrix, Cell processor, Stencil |
24 | Samuel Williams 0001, John Shalf, Leonid Oliker, Shoaib Kamil 0001, Parry Husbands, Katherine A. Yelick |
The potential of the cell processor for scientific computing. |
Conf. Computing Frontiers |
2006 |
DBLP DOI BibTeX RDF |
GEMM, SpMV, three level memory, FFT, sparse matrix, cell processor, stencil |
23 | Susana Ortega-Cisneros |
Design and Implementation of an NoC-Based Convolution Architecture With GEMM and Systolic Arrays. |
IEEE Embed. Syst. Lett. |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Cong Guo 0003, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen 0002, Minyi Guo |
Accelerating Sparse DNNs Based on Tiled GEMM. |
IEEE Trans. Computers |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, Tiejun Li |
SparGD: A Sparse GEMM Accelerator with Dynamic Dataflow. |
ACM Trans. Design Autom. Electr. Syst. |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Venkata Sai Praneeth Karempudi, Sairam Sri Vatsavai, Ishan G. Thakkar, Oluwaseun Adewunmi Alo, Jeffrey Todd Hastings, Justin Scott Woods |
A Low-Dissipation and Scalable GEMM Accelerator with Silicon Nitride Photonics. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Oluwaseun Adewunmi Alo, Ishan G. Thakkar |
A Comparative Analysis of Microrings Based Incoherent Photonic GEMM Accelerators. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Cong Guo 0003, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen 0002, Minyi Guo |
Accelerating Sparse DNNs Based on Tiled GEMM. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Jaeyong Jang, Yulhwa Kim, Juheun Lee, Jae-Joon Kim |
FIGNA: Integer Unit-Based Accelerator Design for FP-INT GEMM Preserving Numerical Accuracy. |
HPCA |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Seonghun Jeong, Jooyeon Lee, Jaeha Kung |
A Full SW-HW Demonstration of GEMM Accelerators with RISC-V Instruction Extensions. |
ICEIC |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Lili Xu, Binjie Chen, Chenhao Huang, Mengmeng Zhou, Shucheng You, Fangming Jiang, Weirong Chen, Jinsong Deng |
Identifying PM2.5-Related Health Burden in the Context of the Integrated Development of Urban Agglomeration Using Remote Sensing and GEMM Model. |
Remote. Sens. |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Sandeep Kumar Sharma, Amit Chaurasia, Vijay Shankar Sharma, Chiranji Lal Chowdhary, Shakila Basheer |
GEMM, a Genetic Engineering-Based Mutual Model for Resource Allocation of Grid Computing. |
IEEE Access |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Jordi Fornt, Pau Fontova-Musté, Martí Caro, Jaume Abella 0001, Francesc Moll, Josep Altet, Christoph Studer |
An Energy-Efficient GeMM-Based Convolution Accelerator With On-the-Fly im2col. |
IEEE Trans. Very Large Scale Integr. Syst. |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Iryna De Albuquerque Silva, Thomas Carle, Adrien Gauffriau, Claire Pagetti |
Extending a predictable machine learning framework with efficient gemm-based convolution routines. |
Real Time Syst. |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Hyeonjin Kim, William J. Song |
LAS: Locality-Aware Scheduling for GEMM-Accelerated Convolutions in GPUs. |
IEEE Trans. Parallel Distributed Syst. |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Louis Ledoux, Marc Casas |
Open-Source GEMM Hardware Kernels Generator: Toward Numerically-Tailored Computations. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen |
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin 0001, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna |
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Saeed Maleki |
Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan 0002, Siva Kumar Sastry Hari, Timothy Tsai 0002, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant J. Nair, Kevin J. Barker, Ang Li 0006 |
MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Ruqing G. Xu, Field G. Van Zee, Robert A. van de Geijn |
GEMMFIP: Unifying GEMM in BLIS. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré |
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Devangi N. Parikh, Robert A. van de Geijn, Greg M. Henry |
Cascading GEMM: High Precision from Low Precision. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Enrico Reggiani, Alessandro Pappalardo, Max Doblas, Miquel Moretó, Mauro Olivieri, Osman Sabri Unsal, Adrián Cristal |
Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices. |
HPCA |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Ranggi Hwang, Minhoo Kang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu |
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks. |
HPCA |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin 0001, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna |
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs. |
HPCA |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Susmita Dey Manasi, Suvadeep Banerjee, Abhijit Davare, Anton A. Sorokin, Steven M. Burns, Desmond A. Kirkpatrick, Sachin S. Sapatnekar |
Reusing GEMM Hardware for Efficient Execution of Depthwise Separable Convolution on ASIC-Based DNN Accelerators. |
ASP-DAC |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Jie Lei, Héctor Martínez, José Flich, Enrique S. Quintana-Ortí |
GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal. |
ISC Workshops |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Guosheng Yu, Zhihong Lv, Haijiang Wang, Zilong Huang, Jicheng Chen |
Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud. |
AICAS |
2023 |
DBLP DOI BibTeX RDF |
|
23 | RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn |
Towards a Unified Implementation of GEMM in BLIS. |
ICS |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen |
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs. |
ICS |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Alexey Romanov, Andrei Turkin, Oleg Myakinin, Fiodar Tsupko, Jiexing Gao |
Parameter Estimation via Time Modeling for MLIR Implementation of GEMM. |
OPTIMA |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Yongseung Yu, Donghyun Son, Younghyun Lee, Sunghyun Park 0004, Giha Ryu, Myeongjin Cho, Jiwon Seo 0002, Yongjun Park 0001 |
Tailoring CUTLASS GEMM using Supervised Learning. |
ICCD |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Harideep Nair, Prabhu Vellaisamy, Albert Chen, Joseph Finn, Anna Li, Manav Trivedi, John Paul Shen |
tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI. |
ISCAS |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Evan Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré |
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture. |
NeurIPS |
2023 |
DBLP BibTeX RDF |
|
23 | Tahsin Tariq Banna, Swakshar Deb, Sejuti Rahman, Shafin Rahman |
GEMM: A Graph Embedded Model for Memorability Prediction. |
IJCNN |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Zhiwei Yang, Lu Lu, Ruimin Wang |
A batched GEMM optimization framework for deep learning. |
J. Supercomput. |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Thomas Faingnaert, Tim Besard, Bjorn De Sutter |
Flexible Performant GEMM Kernels on GPUs. |
IEEE Trans. Parallel Distributed Syst. |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Sergio Barrachina 0001, Manuel F. Dolz, Pablo San Juan, Enrique S. Quintana-Ortí |
Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors. |
J. Parallel Distributed Comput. |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Nihat Mert Cicek, Xipeng Shen, Ozcan Ozturk 0001 |
Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse. |
ACM Trans. Design Autom. Electr. Syst. |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Yunan Zhang, Po-An Tsai, Hung-Wei Tseng 0001 |
SIMD2: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM. |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Jianyu Yao, Boqian Shi, Chunyang Xiang, Haipeng Jia, Chendi Li, Hang Cao, Yunquan Zhang |
IAAT: A Input-Aware Adaptive Tuning framework for Small GEMM. |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Minhoo Kang, Ranggi Hwang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu |
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks. |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Mark Gates, Asim YarKhan, Dalal Sukkari, Kadir Akbudak, Sébastien Cayrols, Daniel Bielich, Mohammed A. Al Farhan, Jack J. Dongarra |
Reproducability Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher. |
|
2022 |
DOI RDF |
|
23 | Mark Gates, Asim YarKhan, Dalal Sukkari, Kadir Akbudak, Sébastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Mohammed A. Al Farhan, Jack J. Dongarra |
Reproducability Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher. |
|
2022 |
DOI RDF |
|
23 | Bo Wang, Sheng Ma, Zhong Liu, Libo Huang, Yuan Yuan 0034, Yi Dai |
SADD: A Novel Systolic Array Accelerator with Dynamic Dataflow for Sparse GEMM in Deep Learning. |
NPC |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Cunyang Wei, Haipeng Jia, Yunquan Zhang, Kun Li, Luhan Wang |
LBBGEMM: A Load-balanced Batch GEMM Framework on ARM CPUs. |
HPCC/DSS/SmartCity/DependSys |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Arthur Francisco Lorenzon, Sandro Matheus V. N. Marques, Antoni C. Navarro, Vicenç Beltran 0001 |
Seamless optimization of the GEMM kernel for task-based programming models. |
ICS |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Chunhua Xiao, Chen Shi, Dandan Xu, Fangzhu Lin, Kun Ning |
SDST-Accelerating GEMM-based Convolution through Smart Data Stream Transformation. |
CCGRID |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Bo Wang, Sheng Ma, Yuan Yuan 0034, Yi Dai, Wei Jiang, Xiang Hou, Xiao Yi, Rui Xu |
SparG: A Sparse GEMM Accelerator for Deep Learning Applications. |
ICA3PP |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Dennis Agyemanh Nana Gookyi, Eunchong Lee, Kyungho Kim, Sung-Joon Jang, Sang-Seol Lee |
Exploring GEMM Operations on Different Configurations of the Gemmini Accelerator. |
ISOCC |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Bingyi Zhang, Akhilesh R. Jaiswal, Clynn Mathew, Ravi Teja Lakkireddy, Ajey P. Jacob, Sasindu Wijeratne, Viktor K. Prasanna |
Modeling the Energy Efficiency of GEMM using Optical Random Access Memory. |
HPEC |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Yunan Zhang, Po-An Tsai, Hung-Wei Tseng 0001 |
SIMD2: a generalized matrix instruction set for accelerating tensor computation beyond GEMM. |
ISCA |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Ananda Samajdar, Eric Qin 0001, Michael Pellauer, Tushar Krishna |
Self adaptive reconfigurable arrays (SARA): learning flexible GEMM accelerator configuration and mapping-space using ML. |
DAC |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Di Wu 0016, Jingjie Li, Ruokai Yin, Hsuan Hsiao, Younghyun Kim 0001, Joshua San Miguel |
uGEMM: Unary Computing for GEMM Applications. |
IEEE Micro |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Qingchang Han, Hailong Yang, Ming Dun, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian |
Towards efficient tile low-rank GEMM computation on sunway many-core processors. |
J. Supercomput. |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Mochamad Asri, Dhairya Malhotra, Jiajun Wang, George Biros, Lizy K. John, Andreas Gerstlauer |
Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods. |
IEEE Trans. Parallel Distributed Syst. |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Ananda Samajdar, Michael Pellauer, Tushar Krishna |
Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration. |
CoRR |
2021 |
DBLP BibTeX RDF |
|
23 | Ratko Pilipovic, Vladimir Risojevic, Janko Bozic, Patricio Bulic, Uros Lotric |
An Approximate GEMM Unit for Energy-Efficient Object Detection. |
Sensors |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Reza Hojabr, Ali Sedaghati, Amirali Sharifian, Ahmad Khonsari, Arrvindh Shriraman |
SPAGHETTI: Streaming Accelerators for Highly Sparse GEMM on FPGAs. |
HPCA |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Jianyu Yao, Boqian Shi, Chunyang Xiang, Haipeng Jia, Chendi Li, Hang Cao, Yunquan Zhang |
IAAT: A Input-Aware Adaptive Tuning framework for Small GEMM. |
ICPADS |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Malith Jayaweera, Kaustubh Shivdikar, Yanzhi Wang, David R. Kaeli |
JAXED: Reverse Engineering DNN Architectures Leveraging JIT GEMM Libraries. |
SEED |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Zhi Gang Liu, Paul N. Whatmough, Matthew Mattina |
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference. |
IEEE Comput. Archit. Lett. |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Uday Bondhugula |
High Performance Code Generation in MLIR: An Early Case Study with GEMM. |
CoRR |
2020 |
DBLP BibTeX RDF |
|
23 | Zhi Gang Liu, Paul N. Whatmough, Matthew Mattina |
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference. |
CoRR |
2020 |
DBLP BibTeX RDF |
|
23 | Thomas Faingnaert, Tim Besard, Bjorn De Sutter |
Flexible Performant GEMM Kernels on GPUs. |
CoRR |
2020 |
DBLP BibTeX RDF |
|
23 | Natalie Beams, Ahmad Abdelfattah, Stan Tomov, Jack J. Dongarra, Tzanio V. Kolev, Yohann Dudouit |
High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs. |
ScalA@SC |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Eric Qin 0001, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das 0002, Bharat Kaul, Tushar Krishna |
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training. |
HPCA |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Ioannis Oroutzoglou, Dimosthenis Masouros, Konstantina Koliogeorgi, Sotirios Xydis, Dimitrios Soudris |
Exploration of GPU sharing policies under GEMM workloads. |
SCOPES |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Guoning Lu, Dong Xu 0015, Ning Wang, Xiao Zhang, Degen Zhen, Hong Lei, Yunlong Bai, Dehui Kong, Hang Ruan, Zhifeng Chi, Xiankui Xiong, Ke Xu 0014 |
A Design of 16TOPS Efficient GEMM Module in Deep Learning Accelerator. |
ICTA |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Yunping Zhao, Jianzhuang Lu, Xiaowen Chen |
A Design of GEMM Parallel Computing Accelerator Based on Vector SIMD Technology. |
ICCTA |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Philip Colangelo, Shayan Sengupta, Martin Margala |
Sparse Persistent GEMM Accelerator using OpenCL for Intel FPGAs. |
ISCAS |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Andrew Anderson 0001, Aravind Vasudevan, Cormac Keane, David Gregg |
High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution. |
SBAC-PAD |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Sheng Wei Pang, Chai Quek, Dilip K. Prasad |
GEMM-eMFIS (FRI/E): A Novel General Episodic Memory Mechanism For Fuzzy Neural Networks. |
IJCNN |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Di Wu 0016, Jingjie Li, Ruokai Yin, Hsuan Hsiao, Younghyun Kim 0001, Joshua San Miguel |
UGEMM: Unary Computing Architecture for GEMM Applications. |
ISCA |
2020 |
DBLP DOI BibTeX RDF |
|
23 | S. Kala, Babita R. Jose, Jimson Mathew, Nalesh Sivanandan |
High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture. |
IEEE Trans. Very Large Scale Integr. Syst. |
2019 |
DBLP DOI BibTeX RDF |
|
23 | Roktaek Lim, Yeongha Lee, Raehyun Kim, Jaeyoung Choi, Myungho Lee |
Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors. |
J. Supercomput. |
2019 |
DBLP DOI BibTeX RDF |
|
23 | Xing Su, Xiangke Liao, Hao Jiang 0001, Canqun Yang, Jingling Xue |
SCP: Shared Cache Partitioning for High-Performance GEMM. |
ACM Trans. Archit. Code Optim. |
2019 |
DBLP DOI BibTeX RDF |
|
23 | Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, Abe Taha |
NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques. |
CoRR |
2019 |
DBLP BibTeX RDF |
|
Displaying results #1 - #100 of 133 (100 per page).