	GEMM compatible routine for EV5(21164) and EV6(21264).

					2000/02/18
					by Kazushige Goto
						<goto@statabo.rim.or.jp>

Explanation of included files.

COPYING.LIB     : LGPL licence
Makefile 
README          : This file.
common.h	: common header file.
gemm_k.S	: main sgemm/dgemm assembler routine. 
 gemm_EV5_k.S
 gemm_EV6_k.S
zgemm_k.S	: main cgemm/zgemm assembler routine. 
gemm.c		: Front-end sgemm/dgemm routine.
zgemm.c		: Front-end cgemm/zgemm routine.
gemm_beta.S	: "multiply for beta" routine.
zgemm_beta.S	: "multiply for beta" routine(complex).
PERFORMANCE.EV?	: Performance Data
bmcommon.h	: Common Header File for benchmark program.
bm.c, bmz.c	: benchmark program.  "make check"
param.c,paramz.c: sanity checking routine
gemmf.c,zgemmf.c: Check routine

** Description **

This package provides optimized GEMM-compatible routines (sgemm, dgemm,
cgemm, zgemm) for the 21164/21264.  With these routines you can reach
90% of the 21264's theoretical peak performance.

** Usage **

It is entirely compatible with dgemm.f, sgemm.f, cgemm.f and zgemm.f.
Please type "make" to build it.  The resulting library file is
"libgemm.a".  To use it, remove the original *gemm.f from your build
and link libgemm.a instead.
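
For reference, the operation that the replaced *gemm.f routines compute
can be sketched in plain C as below.  This is only an illustration of
the non-transposed double precision case with column-major storage; the
name ref_dgemm_nn is hypothetical and is not part of this package, and
the optimized routines are of course organized very differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (NOT part of this package):
 *   C := alpha * A * B + beta * C
 * No transposition, column-major (Fortran) storage.
 * A is m x k with leading dimension lda,
 * B is k x n with leading dimension ldb,
 * C is m x n with leading dimension ldc. */
void ref_dgemm_nn(int m, int n, int k, double alpha,
                  const double *a, int lda,
                  const double *b, int ldb,
                  double beta, double *c, int ldc)
{
    /* A leading dimension must be at least the number of rows stored. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += a[i + (size_t)l * lda] * b[l + (size_t)j * ldb];

            /* BLAS convention: when beta is zero, C is not read. */
            double *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? alpha * sum
                                 : alpha * sum + beta * *cij;
        }
    }
}
```

A routine like this is handy for checking results on small matrices,
since it makes the roles of the leading dimensions explicit.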


** Benchmarking **

Please type "make check" to run the benchmark program.  It measures
matrix multiplication speed (single/double precision, real/complex).
If you want to test other conditions (e.g. size, leading dimension,
SMP, EV5 CPU), see the "Makefile".
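
To relate a measured run time to a speed figure, the conventional
operation count for a real m x n x k GEMM is 2*m*n*k.  The sketch below
shows that arithmetic; the helper names are hypothetical, the peak rate
is a value you supply yourself, and the benchmark program in this
package may count and report things differently.

```c
#include <assert.h>

/* Hypothetical helper: conventional flop count of a real GEMM
 * (one multiply and one add per inner-product term). */
double gemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Hypothetical helper: achieved rate in MFLOPS for a measured time. */
double gemm_mflops(double m, double n, double k, double seconds)
{
    assert(seconds > 0.0);
    return gemm_flops(m, n, k) / seconds / 1.0e6;
}

/* Fraction of a peak rate (in MFLOPS) supplied by the caller. */
double gemm_efficiency(double mflops, double peak_mflops)
{
    assert(peak_mflops > 0.0);
    return mflops / peak_mflops;
}
```

For example, a 1000 x 1000 x 1000 dgemm that finishes in 2 seconds
corresponds to 1000 MFLOPS by this count.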


** Distribution **

Distributed under the LGPL (this has changed from earlier releases).

If you have any suggestions, comments or questions, please let me know.


Special thanks to 

Naohiko Shimizu <nshimizu@et.u-tokai.ac.jp>
               for advice on the prefetch strategy.

MAENO Toshinori <tmaeno@hpcl.titech.ac.jp>
               for advice on the internal block copy method.

                                               goto@statabo.rim.or.jp
