Results
Found 26 publication records. Showing 26 according to the selection in the facets.
Hits | Authors | Title | Venue | Year | Link | Author keywords
77 | John A. Gunnels, Fred G. Gustavson, Keshav Pingali, Kamen Yotov | Is Cache-Oblivious DGEMM Viable? | PARA | 2006 | DBLP DOI BibTeX RDF |
70 | David S. Wise, Jeremy D. Frens, Yuhong Gu, Gregory A. Alexander | Language support for Morton-order matrices. | PPoPP | 2001 | DBLP DOI BibTeX RDF | paging, quadtrees
63 | David Rohr, Matthias Bach, Matthias Kretz, Volker Lindenstruth | Multi-GPU DGEMM and High Performance Linpack on Highly Energy-Efficient Clusters. | IEEE Micro | 2011 | DBLP DOI BibTeX RDF | multi-GPU, HPL, High Performance Linpack, DGEMM, double-precision general matrix multiply, GPGPU, system architecture, Green IT, heterogeneous (hybrid) systems
56 | Rolf Rabenseifner, Sunil R. Tiyyagura, Matthias S. Müller | Network Bandwidth Measurements and Ratio Analysis with the HPC Challenge Benchmark Suite (HPCC). | PVM/MPI | 2005 | DBLP DOI BibTeX RDF | HPCC, HPL, DGEMM, PTRANS, FFTE, benchmarking, STREAM, latency, effective bandwidth, network bandwidth, Linpack
54 | Daniel Hackenberg, Robert Schöne, Wolfgang E. Nagel, Stefan Pflüger | Optimizing OpenMP Parallelized DGEMM Calls on SGI Altix 3700. | Euro-Par | 2006 | DBLP DOI BibTeX RDF |
47 | Stéphane Zuckerman, Marc Pérache, William Jalby | Fine Tuning Matrix Multiplications on Multicore. | HiPC | 2008 | DBLP DOI BibTeX RDF | multicore, cache coherency, BLAS
30 | Hiroyuki Ootomo, Katsuhisa Ozaki, Rio Yokota | DGEMM on Integer Matrix Multiplication Unit. | CoRR | 2023 | DBLP DOI BibTeX RDF |
30 | Pedro Valero-Lara, Ian Jorquera, Frank Liu 0001, Jeffrey S. Vetter | Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores. | SC Workshops | 2023 | DBLP DOI BibTeX RDF |
30 | Jialin Li, Huang Ye, Shaobo Tian, Xinyuan Li, Jian Zhang 0070 | A Fine-grained Prefetching Scheme for DGEMM Kernels on GPU with Auto-tuning Compatibility. | IPDPS | 2022 | DBLP DOI BibTeX RDF |
30 | Yi Wei, Lin Deng, Sizheng Sun, Sisi Li, Li Shen 0007 | DGEMM Optimization Oriented to ARM SVE Instruction Set Architecture. | ICPADS | 2022 | DBLP DOI BibTeX RDF |
30 | Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura | DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions. | ISC | 2020 | DBLP DOI BibTeX RDF |
30 | Tom Cornebize, Arnaud Legrand | DGEMM performance is data-dependent. | CoRR | 2019 | DBLP BibTeX RDF |
30 | Pedro Valero-Lara, Ivan Martínez-Pérez, Sergi Mateo, Raül Sirvent, Vicenç Beltran 0001, Xavier Martorell, Jesús Labarta | Variable Batched DGEMM. | PDP | 2018 | DBLP DOI BibTeX RDF |
30 | John D. McCalpin | HPL and DGEMM performance variability on the Xeon Platinum 8160 processor. | SC | 2018 | DBLP BibTeX RDF |
30 | Lijuan Jiang, Chao Yang 0002, Yulong Ao, Wanwang Yin, Wenjing Ma, Qiao Sun, Fangfang Liu, Rongfen Lin, Peng Zhang | Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor. | ICPP | 2017 | DBLP DOI BibTeX RDF |
30 | David Rohr, Volker Lindenstruth | A Flexible and Portable Large-Scale DGEMM Library for Linpack on Next-Generation Multi-GPU Systems. | PDP | 2015 | DBLP DOI BibTeX RDF |
30 | Hao Jiang 0001, Feng Wang, Kuan Li, Canqun Yang, Kejia Zhao, Chun Huang | Implementation of an Accurate and Efficient Compensated DGEMM for 64-bit ARMv8 Multi-Core Processors. | ICPADS | 2015 | DBLP DOI BibTeX RDF |
30 | Feng Wang, Hao Jiang 0001, Ke Zuo, Xing Su, Jingling Xue, Canqun Yang | Design and Implementation of a Highly Efficient DGEMM for 64-Bit ARMv8 Multi-core Processors. | ICPP | 2015 | DBLP DOI BibTeX RDF |
30 | Pawel Gepner, Victor Gamayunov, David L. Fraser, Eric Houdard, Ludovic Sauge, Damien Déclat, Mathieu Dubois | Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor. | J. Comput. | 2014 | DBLP BibTeX RDF |
30 | Pawel Gepner, Victor Gamayunov, David L. Fraser | Effective Implementation of DGEMM on Modern Multicore CPU. | ICCS | 2012 | DBLP DOI BibTeX RDF |
30 | Gideon Nimako, Ekow J. Otoo, Daniel Ohene-Kwofie | Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures. | SAICSIT | 2012 | DBLP DOI BibTeX RDF |
30 | Jiajia Li 0001, Xingjian Li 0002, Guangming Tan, Mingyu Chen 0001, Ninghui Sun | An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs. | ICS | 2012 | DBLP DOI BibTeX RDF |
30 | Guangming Tan, Linchuan Li, Sean Triechle, Everett H. Phillips, Yungang Bao, Ninghui Sun | Fast implementation of DGEMM on Fermi GPU. | SC | 2011 | DBLP DOI BibTeX RDF |
23 | Akira Nukada, Satoshi Matsuoka | Auto-tuning 3-D FFT library for CUDA GPUs. | SC | 2009 | DBLP DOI BibTeX RDF |
23 | Massimiliano Fatica | Accelerating linpack with CUDA on heterogenous clusters. | GPGPU | 2009 | DBLP DOI BibTeX RDF |
23 | Fred G. Gustavson, Isak Jonsson | High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage. | PARA | 2000 | DBLP DOI BibTeX RDF | packed format, level 3 BLAS parallelism, recursive algorithm, Cholesky factorization, recursive data structure