> ; (if we don't introduce a delay, the text will scroll by
> ; WAY TOO FAST to READ ! ! Incredible, this 1 Mhz bugger
> ; machine of ours is, eh?) - delays not set in stone, play
> ; with them :) higher, lower, etc...
>
> LDX #$05 ; just a delay loop - ldx counted down by dex
>waitloop LDY #$FF ; a delay loop within a delay loop
>wl2 DEY ; decrement y in our delay loop
> BNE wl2 ; keep counting y down until y = 0
> DEX ; decrement x in our top delay loop
> BNE waitloop ; repeat the y loop until x = 0
[even more snipped here]
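As a rough check on how long the quoted delay loop runs, its cycles can be tallied by hand. The sketch below is Python rather than assembly. It assumes the standard 6502 timings (LDX/LDY immediate, DEX and DEY at 2 cycles each, BNE at 3 cycles when taken and 2 when it falls through), no page-crossing penalties and no cycles stolen by the VIC. The function name and the 63-cycles-per-line PAL figure are my own additions, not part of the quoted post.

```python
def delay_loop_cycles(outer=0x05, inner=0xFF):
    """Cycle count for the quoted LDX/LDY/DEY/BNE/DEX/BNE loop (counts 1..255)."""
    # Inner loop: DEY (2) + BNE taken (3) per pass; the last BNE falls through (2).
    inner_cycles = inner * (2 + 3) - 1
    # One outer pass: LDY #imm (2) + inner loop + DEX (2) + BNE taken (3).
    outer_pass = 2 + inner_cycles + 2 + 3
    # LDX #imm (2) up front; the very last outer BNE falls through (2, not 3).
    return 2 + outer * outer_pass - 1


cycles = delay_loop_cycles()
pal_frame = 312 * 63  # assumption: 63 cycles per rasterline on a PAL machine
print(cycles, pal_frame, round(cycles / pal_frame, 2))
```

Under these assumptions the delay alone is about a third of a PAL frame, and nothing ties it to the position of the electron-beam.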
In the following, I'll describe *very* basic features like the
concepts of rasters and frames. This is probably only interesting
for newcomers in the assembly-domain.
If you are looking for a smoother scroll, you would use what is commonly
known as "rasters".
Your tv-set displays the image on the screen using an electron-beam
which moves from the top, left corner down towards the lower, right
corner in horizontal lines. If my memory serves me right, PAL (mostly
European) screens have 312 lines, while NTSC (in the USA) screens have
262 or 263 lines, depending on the VIC revision. We count these lines
starting from 0, and call them "raster-lines", or just "rasters".
The Commodore 64 has a few registers in the VIC (the chip that generates
the video-image) which reflect where the electron-beam is. We have one
register at address $D012 which contains the low 8 bits of the raster,
and another at address $D011, where bit 7 holds the 9th bit of the
rastervalue.
You can safely regard the rasterline as a Y-coordinate on the screen
that starts from above and works its way down.
In this way, you can use this formula to obtain the rastervalue:
raster= peek(53266)+ (peek(53265) and 128) * 2;
Note, that this isn't of much use in a basic-program because
basic simply is too slow, and when the calculation is done,
the rasterline has changed so much that the value of "raster"
is more random than useful.
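The same formula can be written as a small Python function, so it can be tried away from the machine. This is only a sketch: the register values passed in are made-up sample readings, not something read from real hardware.

```python
def raster_line(d012, d011):
    """Combine the two register values into a 9-bit rasterline number."""
    # Bit 7 of $D011 is worth 128 as read; doubling it gives the 256 it stands for.
    return d012 + (d011 & 0x80) * 2


print(raster_line(0xFF, 0x1B))  # bit 7 of $D011 clear
print(raster_line(0x37, 0x9B))  # bit 7 of $D011 set
```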
Well, the value in $D012 goes from 0 to 255 while $D011 bit 7 is 0,
reflecting that the rasterline goes from 0-255. Then bit 7 in $D011
is set to 1, and $D012 goes from 0 to 55 on a PAL-system, reflecting
that the rasterline is 256-311. This is what is known as a "frame",
and the cycle starts over with $D012 being 0, and bit 7 in $D011 also
being 0.
You can exploit this register to get a smooth scroll in several ways.
I'll describe the most primitive of these techniques in the following.
A smooth scroll is achieved, if the text is moved every frame, not
more, not less. In other words, the delay loop in the scroll should
be adjusted so that the entire scroll-loop is performed exactly once
per frame.
Okay, that's fine, but how do I achieve this?
You simply monitor the rasterline. A simple way of doing this is to
use a bit of code like this, which waits until the rasterline is
at position 0 exactly.
wait lda $d012 ;Wait until the lower bits of the rastervalue
cmp #0 ;is 0.
bne wait
lda $d011 ;We also need to check if the high bit of
and #$80 ;the rastervalue is 0, in order to distinguish
bne wait ;raster 0 from raster 256.
Another, slightly optimized version, could be:
lda #$ff
wait cmp $d012
bne wait
which waits until the rasterline is $ff. This one exploits the fact
that the rasterline never exceeds 311, so $d011 bit 7 will always
be 0, when $d012 is >55.
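To see which values of $d012 need the extra check on $d011, one can list the rasterlines of a PAL frame that share the same low byte. A minimal Python sketch, assuming the 312-line PAL frame described above; the function name is only for illustration.

```python
PAL_LINES = 312  # rasterlines 0..311


def lines_matching(low_byte, total_lines=PAL_LINES):
    """All rasterlines in one frame whose low 8 bits equal low_byte."""
    return [line for line in range(total_lines) if (line & 0xFF) == low_byte]


print(lines_matching(0x00))  # seen twice per frame, so $d011 must be checked
print(lines_matching(0xFF))  # seen once per frame, so $d012 alone is enough
```

Every value from 0 to 55 turns up twice per frame, every value from 56 to 255 only once.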
You should try to play around with such wait-loops, and with a little
effort you should be able to do a smooth scroll.
You can also try to use $d020 to set the border-color at different
rasterlines, like:
wait1 lda $d012 ;Wait for rasterline $30
cmp #$30
bne wait1
lda $d011
bmi wait1
lda #1 ;Set border-color to white
sta $d020
wait2 lda $d012 ;Wait for rasterline $108
cmp #$08
bne wait2
lda #0 ;Set border-color to black
sta $d020
jmp wait1 ;And keep looping
Notice that the second wait-loop exploits that the next
time $d012 will be $08 after rasterline $30, will be
at rasterline $108, which makes a check on $d011 unnecessary.
This little program will produce a white stripe in the
border of the screen, but the area where the color
changes will probably "flicker" a bit, i.e. move randomly
around in a small area.
This is because we only check when the rasterLINE is right. The
wait-loop takes several cycles per pass, so the store to $d020 lands
at a slightly different point on the line each time.
Ideally, we would also want to check for the right "raster-column",
so that the cut could be at our specific (x,y) point, but
unfortunately the c64 doesn't directly provide such a rastercolumn
register, so removing the flicker can be tricky business.
Of course there's much more to it than this. For instance, the
VIC provides a facility to automatically announce to the program
when a certain rasterline arrives, so that we needn't check the
rastervalue ourselves in a loop. This technique is known as
raster-interrupts, but I'll leave that subject to another time.
Until then, welcome to the world of rasters, and happy hacking!
Asger Alstrup
--
For questions and comments, 'XmikeX' may be reached through the Editor-in-Chief
of disC=overy.
/S03::$d000:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
The Raster StarterKit,
-+-+-+-+-+-+ -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+ -+-+-+-+-+-+-
A classic 'slice' of VIC-II code
by XmikeX and Dokken/Electron
My early experiences with VIC-II programming on the C64 led me to the
conclusion that being able to manipulate the VIC-II on a rasterline level
is essential to success as a "coder". Over the years, 64 coders have moved
far beyond this simple dogma, but it is recognized that all have to start
somewhere. The following is an example of one of my early attempts. I
still use this bit of code to form the core of many of my programs. I
would like at this time to thank Dokken/Electron for his part in providing
the base code that follows. Please note that for the purposes of proper
instruction to those who might be more familiar with BASIC than ML, the
source below has its values in decimal (same as with BASIC peek/poke),
apart from the Kernal entry points $ea31 and $ea81.
However, please note that raster programming requires quick action and is
best left to ML code.
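For readers switching between this decimal listing and texts that use hexadecimal, a few lines of Python (not C64 code) will translate the addresses. The list is simply the set of addresses used by the routine, plus 788, which is the low byte of the IRQ vector that pairs with 789.

```python
# Decimal addresses used by the raster routine in this article.
ADDRESSES = [49152, 788, 789, 53265, 53266, 53273, 53274, 53280, 53281, 56333]


def as_hex(address):
    """Format a decimal address the way assembler listings usually write it."""
    return "$%04X" % address


for address in ADDRESSES:
    print(address, "=", as_hex(address))
```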
*=49152 ; origin at 49152 or whatever free memory location desired
sei
lda #127
sta 56333
lda #18 ; the first 3 STA's just tell the 64 to
sta 53265 ; do IRQ's. Well, the 53265 hi-byte value
lda #1 ; says to do a IRQ at rasters 0-255, but
sta 53274 ; whatever
lda #100
sta 53266 ; the IRQ'll happen at raster line 100
lda #<main
sta 788 ; low byte of IRQ routine
lda #>main
sta 789 ; high byte of IRQ routine
cli
jkjk
jmp jkjk ; this just happily jumps back to itself
; whenever we're not in an IRQ
main
rol 53273 ; or: lda #1:sta 53273 tell the 64 an IRQ
; happened or something goofy like that
ldx #1
jsr rast
ldx #0
jsr rast
lda #126 ; this sets up a raster IRQ at scan line
sta 53266 ; 126 for the 'rain' routine
lda #<rain
sta 788
lda #>rain
sta 789
jmp $ea81 ; ok. you can use either $ea81 or $ea31
; $ea31 with an RTS where the 'jmp jkjk'
; is allows the 64 to process everything
; normally. ie. you can run a basic
; proggy and have an ML interrupt doing
; something on top of it like maybe playing
; a happy tune or something.
; $ea81 gives you total control with no
; overhead. ($ea31 scans the keyboard
; takes a decent chunk of time). So with
; $ea81, if you want to read the keyboard,
; you get to do it explicitly.
rain
rol 53273
ldx #6 ; rasterline is now set at color : blue!
jsr rast
ldx #0
jsr rast
lda #100 ; this sets up a raster IRQ at scan line
sta 53266 ; 100 for the 'main' routine
lda #<main
sta 788
lda #>main
sta 789
jmp $ea81
rast ; this routine just makes a raster bar out of
lda 53266 ; the value in the x register. Note that this
rast2 ; only works for 7 of 8 scan lines cuz the
cmp 53266 ; 64 needs to chunk away at graphix
beq rast2
stx 53280
stx 53281
rts
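The IRQ vector at 788/789 holds a 16-bit address as two separate bytes, low byte first, which is why the address of a routine has to be split before it is stored. A small Python sketch of that split; the address $C02D is a made-up placeholder, not the real address of 'main' or 'rain'.

```python
def split_address(address):
    """Return (low byte, high byte) of a 16-bit address."""
    return address & 0xFF, address >> 8


low, high = split_address(0xC02D)  # hypothetical address of an IRQ routine
print(low, high)  # the values that would go into 788 and 789
```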
Try and expand this routine to include many many rasters or color bars,
or anything you dream up. It would be prudent to point out at this time
that a text such as "Mapping the 64" is extremely useful for delving into
VIC-II chip registers. "Mapping the 64" in particular, gives a wonderful
description of VIC-II register function.
Goodbye.
--
For more information or general commentary on this article, XmikeX may be
reached through the Editor-in-Chief of disC=overy.
/S04::$d000:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Heaven in the net, an unedited excerpt of IRC on #c-64
by Mike Gordillo
As a self-proclaimed "demo freak", I consider the following transcript (largely
unedited) one of the most interesting discussions concerning the C-64 that I have
ever witnessed on IRC (Internet Relay Chat). I present this to the reader
in the hopes of encouraging further participation and patronage of IRC channel
#c-64. We begin with a discussion on raster programming by "Fungus", "_dW",
"hld", "Sorex of WOW", and "XmikeX".
--
<[Fungus]> I'd like to know just how to do that. Stable Raster that is. I've
tried double interrupts and stuff and every now and then I'd get like
this 1-2 cycle jitter I couldn't get rid of.
<_dW> My raster is completely stable.. or so I think.
<Sorex> fungus: use nops or something to time it out
<[Fungus]> Sorex : I did and still couldn't get it out. The next routine's
timing kept changing the original timing.
<_dW> Fungus: it works like this: set up a raster int on line 'n',
then in the irq code change the $0314-$0315 vector to point to a
different routine and set a raster to strike on line n+1.. execute
NOPs 'till it strikes. When the second raster hits, you'll be
at most 1 cycle off, which can then be corrected by code like
LDA # n+1 : CMP $d012 : BEQ sync (sync: rest of the code)
<[Fungus]> I 've tried that too, and banking out The Kernal etc... Using fffe
& ffff. I still got goofy jitter. It would be totally stable for like
5 second then screw 1-2 cycles to the left exactly 3 times. Then would
be stable for another 5 secs. CIA's are Disabled as well as NMI's: NTSC.
<_dW> Fungus: you must have left out something... what method did you use?
<[Fungus]> Tried it both ways.
<_dW> double ints and?
<[Fungus]> Didn't matter, double Interupts VIA 0314 0315 Double Int VIA
FFFE FFFF too, and yes, all in NTSC R-8 VIC chip so 65 cycles.
<Sorex> fun:i have a source here from Graham/Oxyron for that 4*4 display
routine, i uses multiple IRQ's to get the damn thing steady timed
and nops
<_dW> Fungus: send me mail at agonzalez@nlaredo.globalpc.net I'll send you
source code for a double int 0314/15 raster sync. Sorex: 4x4 display?
<Sorex> dw:you know... that mode everyone uses to get fullscreen effect
like rotzoomers/plasmas/doomstuff/tunnels/...
<_dW> Sorex: the one with the plasma in dawnfall/oxyron?
<Sorex> dw:yeah, and the fire in 'The Masque' ....
<[Fungus]> You guys wanna idea? Have any of you tried playing with $d012 like
decrementing it 1 cycle after it changes?
<_dW> Fungus: that could cause an interrupt, if they're enabled and $d019 lsb
is clear.
<XmikeX> If I could interject a simple question -> Does the VIC-20 have a
$d012 equivalent?
XmikeX - the vic20 has no $d012 equivalent - so you cant tell
what raster you are on.
<[Fungus]> Hehehe On the contrary. I code stuff on the Vic-20 too. You can just
poll the raster register and SEI and never clear it. Makes it preety
easy. It's just that the Vic-20 treats 2 raster lines as one. I made
a cool looking 16 color ROL scroller on it once. (Basically, its $d012
equivalent does not generate IRQ's like $d012 does on the C-64).
I prefer to just poll $9004 ($d012) Easy to do. Then you can do
all sorts of twisted stuff. The VIC is a different Animal. Really
neat registers. Remove the borders with out an INT even!
I'd like to see people get interested in the Vic-20. It's capable
of some cool stuff. You can make the screen scroll in any direction
really Fast. The Entire screnn too -No borders. Full Charsets and
big color Memory. Full Screen High res too, no borders.=]
[...]
<_dW> hld: you did real raytracing in your 4k entry?
<hld> dw - yup.. My 4k entry distorted a checker board. The code was about $4f0
bytes (and 20KB of tables ;)
<_dW> Are you using lots of rom routines? Any division?
<hld> dw - It uses ROM routines to create the 20KB tables, but it is pure
integer assembly when running - with very small error rate.
I have no division at all in this tracer.. only +,*,sqrt
Remember, pre-calcing all the sqrt's takes quite a while and it
is all pre-calced in floating point, but i do some nifty weird
things so that it only needs to accumulate integers to do the trace.
Basically, it does one frame in about 4 seconds where a frame is
defined as 256x64 monochrome.
[...]
Wave, how is your plotter routine coming along?
Right now, its "doing" 16*16 character rotation. Going to optimize
it a bit and see how far out I can go and still have the routine update
reasonably fast, but my line vector is under 4k.
Any drags so far?
The main thing that slows down the vector routine is the clearing
of the next buffer. Man, that EATS time... but a character or bitmap
rotation doesn't necessarily need to be cleared first, and even if it
does, the amount of zeros to stuff into memory is significantly less.
<_dW> Wave: are you doing a complete clear? or a redraw-clear?
Dw: complete zap of memory...but I toyed with re-draw clear idea.
Hmm, nothing really faster than inlined STA :(
Mr. X : I got a loop like this:
lda #$00
ldy #$00
loop sta $2000,y
sta $2080,y
sta $2100,y
... ...
iny
bpl loop
Each STA line will zap 128 bytes in that loop... I may unroll it
further, but I don't know.
How about :
LDX #0
LDA #times/4
Sorry, LDY #0
STY ....,X
BNE
DEX
If you INX, you need a CMP or you waste time going over your area.
ah... you are DEXing from times/4
1 = 256, 2 = 512, 3=768, 4=1024; none of which is the 800 bytes
you mentioned.
2k of zeros; 2k=$800
You should still kill it by pages!
It does kill by pages!
?? It wasn't clear from the code. How many STA's inside loop?
several
Define several? It should be like 8 STA to kill 2K : base+0,
base+256, etc up to base+2K-255
Here is my 'clear' code, Mr. X :
lda #$00
ldy #$00
loop sta $2000,y
sta $2080,y
sta $2100,y
sta $2180,y
sta $2200,y
sta $2280,y
sta $2300,y
sta $2380,y
sta $2400,y
sta $2480,y
sta $2500,y
sta $2580,y
sta $2600,y
sta $2680,y
sta $2700,y
sta $2780,y
iny
bpl loop
rts
I cover the whole area in one loop, except I kill 128 bytes with
each STA.
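[Ed. Note : The loop above can be modelled in a few lines of Python to confirm that sixteen STAs, 128 bytes apart, with Y running from 0 to 127, cover the whole 2K buffer exactly once. This is a sketch added for illustration, not code from the discussion; the function and parameter names are made up.]

```python
def clear_buffer(memory, base=0x2000, stas=16, stride=0x80):
    """Model of the loop: each STA writes base + n*stride + Y, Y runs 0..127."""
    y = 0
    while True:
        for n in range(stas):
            memory[base + n * stride + y] = 0   # STA base+n*stride,Y
        y = (y + 1) & 0xFF                      # INY
        if y & 0x80:                            # BPL falls through once bit 7 is set
            break


memory = bytearray([0xFF]) * 0x10000  # 64K filled with $FF
clear_buffer(memory)
print(memory.count(0))  # number of bytes that were zeroed
```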
Wave: But that's way past the point of diminishing returns....
And why pick 128 bytes?
Because it runs faster than 256 bytes!
So you got, what, 16 STA in your actual code?
if each STA wipes 256 bytes, then the loop runs 256 times. Which
means that all the loop overhead (agreed - not too much, but there)
is multiplied by 256. If you loop 128 times, I would think it should
reduce the overhead by half.
Have you actually done the math?
I should time it tonight to be sure.
effective cycles = loop + overhead / total cycles <-- will give
an average number for overhead cost.
Well ok, but heck if I'm going to 1:1 unroll it though! 2000 STAs,
no way...even if I write a program to generate them.
4K for your clear routine! hehe :) But 1:1 may lead you past the
point of diminishing returns...
Well, I used 128 because there is only a INY (or DEY depending on
what you think when you code) and one branch. (BPL or BMI respectively).
BTW, 256 has the same functionality... Both will set a processor flag.
Hmmm, actually all numbers will with a DEY.
In general, I DEX BNE, just considering it a good habit, always trying
by pages.....
But doesn't it stand to reason that a 1:1 is the fastest since it
has no overhead? So if thats true, if you only looped twice, you'd
cut out half the STA's, but increase the overhead by a factor or 2.
The overhead is small though (couple cycles for dey, couple for the
branch.)
Yeah, the overhead is small. Your first STA is the "best", each
additional one after that gives less and less improvement in speed and
just takes more space....
And if you need to zap like 2k of memory, there ain't a Bxx
instruction in 6510 thats going to get you back to the beginning of
that STA loop. =) After all, you can only have 40 or 41 STA's to use
a Bxx instruction.
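[Ed. Note : The "40 or 41" figure follows from the reach of a relative branch. The Python sketch below assumes three bytes per STA absolute,Y, one byte for the INY or DEY and two bytes for the branch itself; the function name is invented for illustration.]

```python
def max_stas_in_branch_range(sta_bytes=3, counter_bytes=1, branch_bytes=2):
    """Largest number of STAs a backward Bxx can still jump over."""
    # A relative branch reaches at most 128 bytes back, measured from the
    # byte after the branch instruction, so the whole loop body must fit.
    reach = 128
    return (reach - counter_bytes - branch_bytes) // sta_bytes


print(max_stas_in_branch_range())
```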
Time to unroll your loop, Waveform :).
Then I'd get 6144 bytes of code to clear 2048 bytes of buffer. :(
I can't believe I am thinking of making a 6144 byte clear routine and
actually, it would have to be 12288 bytes, since I double buffer.
Three bytes per STA... 2048 bytes to clear... (2048 * 3 = 6144) * two
buffers doubles that to 12288.
Hmmm... Loop overhead... Let us see here...
Each time you gotta INY/DEY and CMP and/or Bxx (branch) or JMP
at the end of the loop, with the number of times through the loop
multiplying the overhead. The less times through the loop, the less
overhead.
Yes, and DEX BNE = 2 + 3 = 5 (assuming your smart enought to line
it up in one page! otherwise 6 cycles)
Yeah, if I completely unroll it, I'll have NO loop overhead
to deal with at all... I will certainly have to give this idea some
further thought.
[...]
So, did you take MrX's advice?
Yes, I unrolled that clear loop, but I'm not sure how much savings
I was able to obtain. I mean I know how much in cycles, but I don't
see much of a real difference it made to the overall performance of
the code.
Well, there should be a difference.
Yeah, but I'm not seeing THAT much of a diff... hmmm... but it was a
while inbetween running each version. I should run them one right after
another and see. So far, to tell the truth, the effect seems neglegible.
What are your cycle times now?
10752 for old code - 8192 for new code = 2560 cycles.
Hmmm, if I am doing my math right, then it is actually 2.5 ms.
Let's see, NTSC C64 draws a full screen every 1/60, and if one
raster line is 1/263 of 1/60 sec in NTSC, it takes approx. 0.0000636
seconds to draw one raster line, and there are 65 cycles per line.
This means one cycle is more or less .0000009 seconds which
is approx. NTSC 1Mhz rate. My calculator lacks the required
precision here, but enough to give credence to the clock rate. :)
So if we forget about 'bad lines' (VIC-II DMA), I am saving
around 2.5 ms per iteration. My code rotates an object around
its center completely in 8-bit increments, hence the routine is
called 256 times.
So 256 * 2.5 ms = 640 ms per 1 complete rotation (.640 sec! :)
So I lose 12k of RAM for .640 sec..... NOT! I will rip that code
out and instead work on speeding up my line drawing code.
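[Ed. Note : The arithmetic in this exchange can be replayed in Python. The sketch uses only the round figures quoted in the transcript (263 lines, 65 cycles per line, 60 frames per second) and ignores bad lines, as the speaker does; the variable names are my own.]

```python
NTSC_LINES = 263         # rasterlines per frame, as quoted
CYCLES_PER_LINE = 65     # cycles per rasterline, as quoted
FRAMES_PER_SECOND = 60   # the transcript's round figure

clock_hz = NTSC_LINES * CYCLES_PER_LINE * FRAMES_PER_SECOND


def cycles_to_ms(cycles):
    """Convert a cycle count to milliseconds at the clock rate derived above."""
    return cycles * 1000.0 / clock_hz


saved_cycles = 10752 - 8192              # old loop minus unrolled code
per_call_ms = cycles_to_ms(saved_cycles)
per_rotation_ms = per_call_ms * 256      # the routine is called 256 times
print(clock_hz, round(per_call_ms, 2), round(per_rotation_ms))
```

With these inputs the saving comes out at roughly 2.5 ms per call and a little under 640 ms per full rotation, in line with the estimate above.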
Sigh...plotting...truth tables... blah :)
Hehehe, 0 OR 1 = 1
0 AND 1 = 0
0 EOR 0 = 0
0 EOR 1 = 1
1 EOR 1 = 0
Etc., Etc., Etc...and remember, anytime you got that N in front
(NAND, NOR, NOT, etc) you just invert the truth table.
Hey gang, wassup?
Wix, right now I am looking at a balance of neatness in programming
vs speed of execution (plotter)
the age old conflict... speed vs smoothness, how many cycles?
Mine is 70 as it stands.
wave - that's your theta,r plotter right?
Yeah, and the plotter doesn't care where the coords come from, Wix.
yeah, I know, a general point plotter... but where it plots points would
change the code a bit, no? the plotter takes r,theta and turns the right
pixel on in the chargrid right?
I coded 3 general purpose routines:
1> point rotater (basically, change all the thetas)
2> Resolve r,theta into x,y
3> plot x,y on charmap
ahhh... ok... I thought you had said that there was no step 2
(conversion of r/theta into x/y) ??? the speed of the steps 1-2-3 are
combined at 76 cycles?
no no no, I don't have inital x,y... no need to convert x,y into
polar just to rotate it and bring back.
well I udnerstand you start from polar, but I was imagining you had a
direct r/theta to bitmap plotter... that's all
yeah... well I have to resolve into X,Y somewhere, Wix. =)
ok, so your step 1-2-3 combined is 76 cycles?
But the point of the routine is that I don't start in X,Y... Which
makes the Rotation part very fast... just inc/dec the thetas.
And, no... just the plot into the char map is 70 cycles (as it
stands, I think I can get it down to below 60 though)
but, yes, you do convert back to x/y to plot. That's your step 2 right?
Wix, yeah.
Wave well, you could do better on the x/y plotter. But if what you
have works & looks good... who cares? At the most I was able to
achieve 35 cycles onto a full screen bitmap coming from X & Y,
with an X high bit. Think for a moment... :)
Oh wait! I've just had a brainstorm. Be right back, getting a pen.
DUDE!!!!! Woah!!!!! Just a sec... let me recount this.
I -think- I have a 26 cycle plotter now.
26 should be possible in a right-made chargrid, but not on a bitmap
I think...
Perhaps, but I will definately explore this! In fact, once I'm
done, I think I'll put this all down in writing. :)
[...end of transcript...]
/S05::$d000:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A demo of 'disC=overy',
-+*+-
The exploration of rotation :)
by John Kaiser (Waveform/MLM)
Shoooom! Greetings from the Wave! =)
In this article I am going to talk about a relatively simple two-
dimensional vector routine. Before I dive in, however, I'd like to
give some credit to Steven Judd, who through one of his C= Hacking
articles showed me the way to the core of a fast line drawing routine
as well as an amazing method of quick multiplication. I'd also like to
give thanks to those 'hard-core coding animals' on IRC channel #c-64
for their most-nifty insight. Before I forget, to all those who I have
been taunting with my (I think, I hope..) 'super fast pixel plotter', you
are finally able to see the source for it. It's so extremely simple, you'll
probably kick yourself for not coming up with it yourself.
[Ed. Note : According to S. Judd, the 'quick multiplication' method
which John is referring to was originally formulated by George Taylor.]
For all of those expecting a full blown demo in this article, I am afraid
I have to disappoint you. The "real world" has been taking a heavy toll on
me during the last month so the amount of time I was able to spend working
on this article was extremely limited. What you will see, however, is a
fully functional rotating vector routine, in a relatively basic form.
There are tons of uses for the routines involved, and in fact, my next demo
uses quite a few variations on the code you are about to see.
It is interesting to note that I did not set out to code a rotating vector
routine. =) I have another project I'm working on that involves some of
the same routines. While working I thought, "Hey, I could probably rotate
a two dimensional shape pretty easy with this."
And Voila! You get something like the source code that follows.
Preface To The Code
-------------------
The trick to this code is referencing the endpoints of a line by a radius
and theta, as opposed to regular X and Y.
The advantage to referencing the endpoints - or even just individual
pixels - like this is that to rotate a point, you merely add or subtract
from the theta value. This process ends up being extremely fast,
especially when compared to taking an X,Y coordinate and sending it
through a whole slew of mathematics just to move it around. Observe a
quick comparison of the steps needed to rotate a number of endpoints about
their geometric center:
Old Method:
-----------
1. Get an X,Y endpoint
2. Convert to radius and theta (or some other convenient rotation friendly
reference)
3. Apply the rotation
4. Convert back to X,Y
5. Store result
6. Loop back until all endpoints are finished
Nifty Method:
-------------
1. Get a radius,theta endpoint
2. Apply the rotation
3. Store result
4. Loop back until all endpoints are finished
What is missing are the slow, and sometimes bulky, conversion routines. Even
if you have a fast conversion routine, this method still is faster because
you have fewer steps from beginning to end. You have less code to execute
per endpoint.
One other trick to this particular code is that my circles have 256 units
to them, while most people's circles have 360 degrees. Rather than have a
boring name like "units" I decided to use the name "byts". So, by way of
comparison, a normal circle has 360 degrees, while my circles have 256
byts.
Why use a different unit of measurement? The thinking man will quickly
realize that in our nifty 8bit machines, the largest value a single byte -
register, or memory location - can hold is 255. By using the byt unit of
measurement, we can make an entire circle with just one byte, and not have
to worry about fiddling with high bits. This way of thinking saves many
many processor cycles, and many many hairs on a coder's head. =)
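To make the idea concrete away from the assembler, here is a rough Python sketch of rotating radius,theta endpoints measured in byts. It is a model of the technique only, not the demo's code: the shape, the function names and the use of floating-point sin/cos in place of lookup tables are all my own assumptions.

```python
import math


def degrees_to_byts(degrees):
    """Convert degrees to byts (256 per circle), wrapped into one byte."""
    return round(degrees * 256 / 360) & 0xFF


def rotate(points, step):
    """Rotate (radius, theta) endpoints: only theta changes, wrapping at 256."""
    return [(radius, (theta + step) & 0xFF) for radius, theta in points]


def to_xy(radius, theta):
    """Resolve one endpoint to x,y; needed only when it is time to plot."""
    angle = theta * 2 * math.pi / 256
    return round(radius * math.cos(angle)), round(radius * math.sin(angle))


square = [(40, 32), (40, 96), (40, 160), (40, 224)]  # made-up shape
turned = rotate(square, degrees_to_byts(90))
print(turned)
print([to_xy(radius, theta) for radius, theta in turned])
```

The wrap-around of a full circle comes for free from the single-byte mask, with no high bits to carry.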
Finally, before I slam you with the source code (which, by the way, is in
TurboAssembler format) there is one more trick to this particular code.
Sometimes the only way to get real speed out of a Commodore is to reduce
the code necessary to perform calculations. How do the great coders do
that? The keyword(s) are LOOKUP TABLES.
This code uses tables for the plot routine and for the multiplication
routine. I wrote basic programs to create the tables and will do my best
to explain how they work, near the end of this article. As to the tables
themselves, I'll explain how the code uses them when I get to the relevant
section of the source code.
Wow, what a preface! Now, on to bigger and better things: The documented
and commented Source Code!
The Source Code - In TurboAssembler Format
-----------------------------------------
Okay, here it is. I'll be commenting on the code throughout the remainder
of the article in hopes to further clarify what the code is doing.
First, no code is complete without a little self glorification before the
origin and equates are setup...
;---------------------------------------
; A Demo of disC=overy - By: Waveform
;
; (c) 1996 for disC=overy Magazine
;"The Journal of the Commodore Enthusiast"
;---------------------------------------
The aforementioned origin and equates...
*= $0820
points = $033c ;number of points
angleinc = $033d ;angle to increment
curshape = $033e ;current shape
theta = $0340 ;thetas
radiu = $0350 ;radius
itheta = $0360 ;tab of init thetas
iradiu = $0370 ; init radius
scrloc = 1196 ;location of top
; left corner of
; char matrix on
; screen
msin = $3000 ;table of sines
mcos = $3100 ; cosines
msqr = $3200 ; squares
lobyte = $3300 ;lo bytes
hibyte0 = $3380 ;hi bytes ($2000+)
hibyte1 = $3400 ; ($2800+)
bitmask = $3480 ;pixel bit mask
Okay: "points" is the number of endpoints for the current displayed shape.
"angleinc" is the amount to rotate the shape, in byts.
"curshape" is the number of the current shape. This demonstration
has three predefined shapes. Feel free to experiment
with the ones I included or add your own!
"theta" is a table of thetas. This code is written such that a
shape can have up to 16 endpoints defined. To change this
"limitation" simply give yourself more bytes in-between
these tables.
"radiu" is a table of radii. Both "theta" and "radiu" change as the
program executes. At any point in time, these two tables
hold the current values of all the endpoints for the
current shape.
"itheta" is a table of the initial values of the shape's theta
coordinate component.
"iradiu" is a table of the initial values of the shape's radius
coordinate component. Both "itheta" and "iradiu" tables
are static and are not changed by the program. It is
convenient to have these two tables for when you need to
alter the coordinates of an endpoint relative to its
initial position.
"scrloc" is used by the routine that places the character matrix on
the screen. 1196 places it in the center of the screen.
"msin" is the base address of the table of sines. Each value
represents the sine of the address, relative to the base. So
that msin+0 = sine of 0 byts, msin+64 = sine of 64 byts, etc.
"mcos" is the base address of the table of cosines. Works the same
way as the sine table.
"msqr" is the base address for the table of squares. Works similar
to the sine and cosine table, except that this table is a
table of squares of both positive and negative numbers. More
on this later.
"lobyte" is the base address of a table of low bytes used by the
plot routine as the low byte of the address where the
pixel to be plotted lives.
"hibyte0" is the base address of a table of high bytes used by the
plot routine as the high byte in buffer #0 where the
pixel to be plotted lives.
"hibyte1" high byte of the address in buffer #1 where the pixel to
be plotted lives.
"bitmask" is the base address of a table of bit masks applied to
the byte where the pixel to be plotted lives.
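A table of squares that covers both positive and negative numbers suggests the quarter-square identity, a*b = f(a+b) - f(a-b) with f(x) = x*x/4. The Python sketch below shows that identity at work. It is an assumption on my part that this is the multiplication method meant here, and the real layout and scaling of "msqr" are not reproduced; the function names are invented.

```python
def build_quarter_squares(limit):
    """f(x) = x*x // 4 for x in -limit..limit, keyed by x so negatives work too."""
    return {x: (x * x) // 4 for x in range(-limit, limit + 1)}


def multiply(a, b, table):
    """Multiply with two table lookups and one subtraction."""
    # a+b and a-b are both even or both odd, so the fractions dropped by
    # the integer division cancel and the result is exact.
    return table[a + b] - table[a - b]


table = build_quarter_squares(128)
print(multiply(63, -64, table))
```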
;---------------------------------------
jsr demoinit ;init for demo
;---------------------------------------
That's just a JSR to the demo initialization routine...
main lda #%11111110 ;scan matrix
sta $dc00 ;
lda $dc01 ;
cmp #%11101111 ;f1
beq keyf1
cmp #%11011111 ;f3
beq keyf3
cmp #%10111111 ;f5
beq keyf5
cmp #%11110111 ;f7
beq keyf7
cmp #%11111101 ;return
beq keyreturn
lda #%01111111 ;scan matrix
sta $dc00 ;
lda $dc01 ;
cmp #%11101111 ;space
beq keyspace
cmp #%01111111 ;run/stop
beq keystop
What the above does is scan the keyboard matrix to see if the user has
pressed one of the keys our program is looking for. Later on you'll see
that we disabled the CIA interrupts that cause the computer to scan the
keyboard, so we needed a way to tell if one of our "do-something" keys was
pressed. We "scan" twice since the various keys we are looking for are on
different rows of the keyboard matrix.
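The bit patterns in the scan can be derived from each key's position in the keyboard matrix. The Python sketch below rebuilds them; the positions are the standard C64 matrix positions for these seven keys, while the dictionary and function names are made up for illustration. Like the assembly, it only describes the case where a single key on the selected line is held down.

```python
# (line selected through $dc00, bit read back from $dc01) for the keys used here.
KEYS = {
    "f1": (0, 4), "f3": (0, 5), "f5": (0, 6), "f7": (0, 3), "return": (0, 1),
    "space": (7, 4), "run/stop": (7, 7),
}


def scan_values(key):
    """Byte written to $dc00 and byte expected back from $dc01 for one key."""
    row, column = KEYS[key]
    select = 0xFF ^ (1 << row)      # pull exactly one line low
    expect = 0xFF ^ (1 << column)   # a pressed key reads back as a 0 bit
    return select, expect


for key in KEYS:
    select, expect = scan_values(key)
    print(key, format(select, "08b"), format(expect, "08b"))
```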
reentry jsr rotate ;rotate shape
jsr drawshape ;draw shape
jmp main ;loop
;---------------------------------------
The little code snippet above is where the JSRs to the real workhorses
are. As you can see this demo is pretty simple. =) We rotate the current
shape, and then we draw the shape. Then we go back to the beginning and
see if the user wants to do something.
keyf1 jmp rotup
keyf3 jmp rotdn
keyf5 jmp expup
keyf7 jmp expdn
keyreturn jmp newshape
keyspace jmp stoprot
keystop jmp stopdemo
;---------------------------------------
Above is a little jump table to the various little routines that do what
needs to be done when a key is pressed...
rotup inc angleinc ;increase
jmp reentry
;---------------------------------------
rotdn dec angleinc ;decrease
jmp reentry
;---------------------------------------
The above two "routines" change the value by which the shape rotates per
run-through of the main loop. As you can see, referencing the endpoints as
radius and theta takes a lot of the work out of rotating a shape about its
center! Each time the user presses the F1 key, the angle that the shape
rotates per run-through increases. Holding down the F1 key will cause the
shape to spin up quite rapidly. Indeed, if you continue to hold down the
F1 key, the speed of the rotation will appear to increase to the point
where it begins slowing down again.
Obviously, F1 increases the angle of rotation, and F3 will decrease it.
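Why does the spin seem to slow down again? My reading is that "angleinc" and every theta are single bytes, so the addition wraps around at 256. The Python sketch below models that; the function names are hypothetical and only the byte arithmetic comes from the listing.

```python
def rotate(thetas, angleinc):
    """Add angleinc to every theta, keeping each result in one byte."""
    return [(t + angleinc) & 0xFF for t in thetas]


def signed_step(angleinc):
    """The rotation the eye perceives for a given 8-bit increment."""
    return angleinc if angleinc < 128 else angleinc - 256
```

An increment of 255 moves every endpoint by the same amount as an increment of -1, so once "angleinc" climbs past 128 the shape looks like it is turning the other way, more and more slowly.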
expup ldy #$00
expup1 lda radiu,y
clc
adc #$01 ;add one
cmp #63 ;at maximum?
bcc expup2
lda #63
expup2 sta radiu,y ;store
iny
cpy points ;do all points
bne expup1
jmp reentry ;return
;---------------------------------------
expdn ldy #$00
expdn1 lda radiu,y
sec
sbc #$01 ;subtract one
cmp #2 ;at minimum?
bcs expdn2
lda #2
expdn2 sta radiu,y ;store
iny
cpy points ;do all points
bne expdn1
jmp reentry ;return
;---------------------------------------
What the above two routines do is expand or shrink the shape. These two
routines work well because of the way our shapes are defined. The endpoints
of the shapes that come with this demo are all equidistant from the object's
center. So, it is easy to change a shape's size by altering the radius
component of every endpoint. Obviously, more detailed shapes that have
endpoints at varying radii from the center will need improved routines if
the shape is to maintain its proportions through the expansion and
shrinking.
Notice the error checking, also. Bad things happen when our radii get too
large or too small for our line drawing routine to handle properly.
newshape inc curshape
lda curshape
cmp #3 ;last shape?
bne newshape1 ;nope
lda #$00 ;select first
sta curshape ; shape
newshape1 cmp #2 ;8 point star?
bne newshape2
jsr initobj20 ;init shape
jmp reentry
newshape2 cmp #1 ;2 triangles?
bne newshape3
jsr initobj10 ;init shape
jmp reentry
newshape3 jsr initobj00 ;1 triangle
jmp reentry
;---------------------------------------
The above routine handles the user's request to change the displayed shape.
There are better ways to handle this, but since this demo only has three
shapes built into it, the quick and dirty way suffices without being
extremely bulky.
stoprot lda #$00 ;stop all
sta angleinc ; rotation
jmp reentry ;
;---------------------------------------
The above little bit of code simply stops the rotation of the current
shape. It merely stores a zero into "angleinc" which controls the amount
of rotation applied to the shape.
stopdemo lda #$81 ;reset cia
sta $dc0d ; interrupts
jmp $fe66 ;exit via
; kernal warm
; start
;---------------------------------------
The above code resets the CIA to its default setting so that when our demo
exits back out to BASIC, the user is able to type something. =) It then
exits via the Kernal warm start routine at $fe66 to reset other important
things like the VIC chip. =)
rotate ldy #$00
rotate1 lda theta,y ;get theta
clc
adc angleinc ;add amount
sta theta,y ;store theta
iny
cpy points ;do all points
bne rotate1
rts
;---------------------------------------
Okay, there it is. The code that actually rotates the shape. It takes the
current value of the theta coordinate for each endpoint and adds the value
of "angleinc" to it. Extremely simple and quite fast too. =) You can speed
it up a bit by unrolling this loop, but for this demo, there aren't enough
endpoints to work through to get that much of a savings.
drawshape ldy #$00 ;first point
lda theta,y ;pass theta &
sta gxytheta ; radius to
lda radiu,y ; getxy for
sta gxyradius ; conversion
jsr getxy ;
lda xpos ;convert polar
clc ; to
adc #64 ; Cartesian
sta x1 ;
lda ypos ;
clc ;
adc #64 ;
sta y1 ;
The above code starts off the shape drawing routine. It gets the first
endpoint prior to entering the loop below. The loop below joins endpoint
to endpoint with a line.
You will notice that we do actually convert from the radius,theta system
to the X,Y system, finally. This is necessary because I haven't yet
written a routine that will draw lines only from radius and theta. =) Also
notice that it adds 64 to both the X and Y coordinate. This is to center
the shape in our character matrix. The routine that converts from
radius,theta to X,Y returns numbers ranging from -63 to +63, so adding 64
also normalizes our coordinates to make them easier to plot in our
character matrix. (It makes the range of possible values equal to 0 to
+127)
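As a cross-check on the conversion just described, here is a small floating-point sketch in Python. It is not how the demo does it (the demo uses table lookups), and the function name is my own; it only shows the arithmetic, assuming theta runs from 0 to 255 for a full circle.

```python
import math


def polar_to_screen(radius, theta):
    """theta is 0..255 for a full circle; returns (x, y) in the 0..127 grid."""
    angle = theta * 2 * math.pi / 256
    x = round(radius * math.cos(angle)) + 64   # -63..+63 becomes 1..127
    y = round(radius * math.sin(angle)) + 64
    return x, y
```

With a radius of 40, theta 0 lands at (104, 64), directly to the right of the center at (64, 64).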
ldy #$01 ;second point
ds2 lda theta,y
sta gxytheta
lda radiu,y
sta gxyradius
jsr getxy
lda xpos
clc
adc #64
sta x2
lda ypos
clc
adc #64
sta y2
sty dsy ;preserve .y
jsr drawline ;draw a line
ldy dsy ;restore .y
lda x2 ;make endpoint
sta x1 ; of this line
lda y2 ; start point
sta y1 ; of next line
iny ;do all points
cpy points ;
bne ds2 ;
That's the routine that puts the shape on the screen. You may have noticed
that each shape has one extra endpoint. For example, the triangle has four
endpoints. This is to allow the shape drawing routine a place to end the
last line. We, of course, want to make the last endpoint equal to the first
endpoint. This causes the routine to complete the shape.
lda $d018 ;show our work
eor #%00000010 ;
sta $d018 ;
Once we have finished drawing the shape in the buffer, we tell the VIC to
display the buffer. This technique is known as double buffering. We
display one buffer, while we do all our work in the other. When we are
done with the work, we display that buffer, and do all our work in the
first.
A nifty trick in situations like these is to use the VIC to our advantage.
Since we are using a character matrix to do our drawing in, we make both
of our buffers in sequential character matrix slots in memory. Then to
swap the displayed buffers, we merely toggle a bit in the VIC register
that tells the VIC which matrix to display. =)
A little confused? Well hopefully this will clear things up, at least a
little. We start our program with the VIC pointing at buffer#0 which is at
$2000. Bits 3, 2, and 1 of $d018 control the address at which the VIC finds
the character data. Here is a nifty table showing the addresses which
correlate to those bits in $d018:
$d018 points to
----- ---------
%xxxx000x $0000
%xxxx001x $0800
%xxxx010x $1000
%xxxx011x $1800
%xxxx100x $2000 -> This is our Buffer#0
%xxxx101x $2800 -> This is our Buffer#1
%xxxx110x $3000
%xxxx111x $3800
You can see how we can make the VIC do all the work of displaying the
right buffer. All we have to do is toggle bit 1 of $d018 to make the VIC
switch between the character matrix at $2000 and the one at $2800.
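The table and the buffer swap can be written as two one-liners. This is a Python sketch of the bit arithmetic only; the function names are hypothetical, and the addresses are relative to whichever 16K bank the VIC is currently looking at.

```python
def char_base(d018):
    """Address selected by bits 3-1 of $d018 (within the current VIC bank)."""
    return ((d018 >> 1) & 0b111) * 0x0800


def swap_buffers(d018):
    """Mirror of 'eor #%00000010': toggle bit 1 and nothing else."""
    return d018 ^ 0b00000010
```

Starting from %xxxx100x, one swap shows $2800 and a second swap shows $2000 again, which is all the double buffering needs.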
lda $d018 ;clear the
and #%00000010 ; next buffer
beq ds4 ; for drawing
jsr blank0 ; in
jmp ds5 ;
ds4 jsr blank1 ;
This code checks which buffer is currently being displayed and then JSRs to
the routine that clears the OTHER one, in preparation for drawing in it.
ds5 rts
dsy .byte $00
;---------------------------------------
Finish up... "dsy" is a temporary storage location.
getxy sty gxyy ;preserve .y
sta gxya ; .a
ldy gxytheta ;
lda mcos,y ;a
sec ;
sbc gxyradius ;-b
tay ;
lda msqr,y ;f(a-b)
sta gxytemp ;store result
ldy gxytheta ;
lda mcos,y ;a
clc ;
adc gxyradius ;+b
tay ;
lda msqr,y ;f(a+b)
sec ;
sbc gxytemp ;-f(a-b)
sta xpos ;=x coordinate
Ah, now you get to see the fast multiplication in action. This routine is
based heavily on the routine outlined by Steven Judd in C=Hacking. He has
already documented fairly well how the table of squares works, so I'll not
re-invent the wheel by explaining precisely how it works. What I will do
is walk you through what this code does.
Okay, know that you can arrive at A*B with a function like:
f(A+B) - f(A-B) when f(x) = (x^2)/4.
I created a table of squares such that the offset from the beginning of
the table was the (x) in the f(x) above. For example, location $3200 + 0
holds 0, since obviously (0^2)/4 is still 0. By the same token, $3200 + 9
holds 20, since (9^2)/4 = 20.25 and the fraction is dropped.
In our case, the radius component is the (A) and the (cos (theta))
component is the (B). Those who didn't sleep through trigonometry class
will recall that when converting from radius,theta into X,Y, your X
coordinate is R * cos(theta). Hence, our A*B routine.
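If the formula looks like magic, a few lines of Python show that it holds even though the fractions are thrown away. This sketch uses the plain f(x) = (x^2)/4 from the text, without the scaling by 64 used in the real tables; the function names are my own.

```python
def f(x):
    """Quarter-square function with the fraction dropped, as a byte table would."""
    return (x * x) // 4


def multiply(a, b):
    """A*B from two lookups and one subtraction."""
    return f(a + b) - f(a - b)
```

The dropped fractions never hurt here: A+B and A-B are either both even or both odd, so the two lookups lose the same amount (0 or .25) and the difference is exact.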
Those that have read ahead will note that I made some adjustments to the
tables to the effect that the sin and cos tables are multiplied by 64, and
that our table of squares is really the result of a function where f(x) =
(x^2)/(4*64). The reason for this is to maintain some level of accuracy in
our tables. If you don't understand why this was done, I'll explain it
further when I detail the BASIC programs that build our tables.
ldy gxytheta ;
lda msin,y ;a
sec ;
sbc gxyradius ;-b
tay ;
lda msqr,y ;f(a-b)
sta gxytemp ;store result
ldy gxytheta ;
lda msin,y ;a
clc ;
adc gxyradius ;+b
tay ;
lda msqr,y ;f(a+b)
sec ;
sbc gxytemp ;-f(a-b)
sta ypos ;=y coordinate
ldy gxyy ;restore .y
lda gxya ;restore .a
rts
;----------------------------
gxyradius .byte $00
gxytheta .byte $00
xpos .byte $00
ypos .byte $00
gxyy .byte $00
gxya .byte $00
gxytemp .byte $00
;---------------------------------------
The code above does exactly the same thing the last piece of code did,
except this time we were calculating the Y coordinate, and of course, to
be trigonometrically correct, our multiplication performs this equation:
Y = R * sin(theta).
The labels at the end are for temporary storage of values during the
multiplication routine.
plot pha ;preserve .a
lda $d018 ;plot in
and #%00000010 ; correct
bne plot2 ; buffer
plot1 lda lobyte,x ;lo byte
sta $02 ;
lda hibyte1,x ;hi byte
sta $03 ;
lda ($02),y ;
ora bitmask,x ;turn pixel on
sta ($02),y ;
pla ;restore .a
rts
;---------------------------------------
plot2 lda lobyte,x ;lo byte
sta $02 ;
lda hibyte0,x ;hi byte
sta $03 ;
lda ($02),y ;
ora bitmask,x ;turn pixel on
sta ($02),y ;
pla ;restore .a
rts
;---------------------------------------
Shoooom! There it is! If you blinked you may have missed it. The plot
routine is quite small, and it's even smaller if your program doesn't need
to decide which buffer it needs to plot in.
In detail, first it checks which buffer we are working in and branches to
the appropriate plot routine.
How it works: the X register holds our X coordinate, and the Y register
holds our Y coordinate. Both our X and Y registers hold a value between 0
and 127. Knowing that, we can quickly and easily use some cleverly thought
out planning ahead to make plotting a pixel super fast and very clean.
Things we planned for ahead of time: We drew our character grid on the
screen such that the addresses that define the character data are laid out
to our advantage. Then we made a couple tables with data that makes it
extremely simple to look up the address of the byte where we want to plot
a pixel.
Sound a little vague? I'll explain in greater detail when we get to the
BASIC programs that set up the tables and also the little routine that
lays out our character grid.
First, it looks up an address based on our X coordinate. Each byte has 8
pixels so, our table looks like this:
$3300 $00 $00 $00 $00 $00 $00 $00 $00
$3308 $80 $80 $80 $80 $80 $80 $80 $80
$3310 $00 $00 $00 $00 $00 $00 $00 $00
$3318 $80 $80 $80 $80 $80 $80 $80 $80
...
Notice that the values change every eight locations? Each byte has eight
pixels. For example, imagine that the X coordinate is 0. We get the first
(0th) byte from our low byte table and see it's a zero. It's easy to see
that we would always be plotting in the first (0th) byte until X is
greater than 7 (i.e., 8) in which case the value in our low byte table is
$80. $80 is 128 bytes over from our first column of bytes, and holds the
byte that has pixels 8-15 in it. If you are still confused, I will explain
this further when we get to the routine that builds our character grid on
the screen.
Now that we have the low byte, we need a high byte. Again we look into a
cleverly planned table using X as our index. The high byte table looks
a lot like the low byte table except that we change values every sixteen
bytes. Take a look:
$3380 $20 $20 $20 $20 $20 $20 $20 $20
$3388 $20 $20 $20 $20 $20 $20 $20 $20
$3390 $21 $21 $21 $21 $21 $21 $21 $21
$3398 $21 $21 $21 $21 $21 $21 $21 $21
...
This is because each column is only 128 bytes long, so two columns fit in
one 256-byte page. If the X coordinate is 0-7, then the byte that holds our
pixel is at $2000. When the X coordinate is 8-15, the byte is at $2080. Only
when X is 16-23 does our high byte change from $20 to, in this example, $21,
since the byte we are looking for would be at $2100.
Now, we got an address based on the value of X. But what about Y? We
aren't always going to be plotting in the top row of pixels! Since
we were clever in how we laid out our character grid, Y becomes an offset
from our address. We figured out how far to go across in bytes from a
table, but we don't need a table to figure out how far down to go. We
simply address the byte using indexed addressing. =) So, for example if
our Y coordinate was 5, the instruction LDA ($02),Y would get the 5th byte
down from the byte we arrived at earlier.
Nifty eh?
After we get the byte, we need to turn on a pixel. This is where the
bitmask table comes in. Our bitmask table looks like this:
$3480 $80 $40 $20 $10 $08 $04 $02 $01
$3488 $80 $40 $20 $10 $08 $04 $02 $01
$3490 $80 $40 $20 $10 $08 $04 $02 $01
$3498 $80 $40 $20 $10 $08 $04 $02 $01
...
It should be obvious how this table works. This table allows us to arrive
at the right pixel to turn on (via ORA) when we plot. Notice that it
repeats every eight bytes, and notice that there are eight pixels in a
byte. =)
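Putting the three lookups together, the whole plot routine boils down to the arithmetic below. This is a Python sketch that computes what the tables hold instead of looking it up; the names "plot_address" and "plot" and the dictionary standing in for RAM are assumptions of mine.

```python
BUFFER0 = 0x2000  # character data for buffer #0
BUFFER1 = 0x2800  # character data for buffer #1


def plot_address(x, y, buffer_base=BUFFER0):
    """Return (address, mask) for pixel (x, y), both in 0..127."""
    column = x // 8                           # which 8-pixel-wide column
    address = buffer_base + column * 128 + y  # 128 bytes per column, Y as offset
    mask = 0x80 >> (x & 7)                    # leftmost pixel is the high bit
    return address, mask


def plot(memory, x, y, buffer_base=BUFFER0):
    """Turn one pixel on in a dict standing in for RAM (the ORA step)."""
    address, mask = plot_address(x, y, buffer_base)
    memory[address] = memory.get(address, 0) | mask
```

For example, X=8 and Y=5 gives address $2085 with mask $80, which matches the walk-through above: one column over ($80) and five bytes down.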
drawline sec ;get dx
lda x1 ;
sbc x2 ;
sta dx ;
sec ;get dy
lda y1 ;
sbc y2 ;
sta dy ;
clc ;
lda dx ;
bpl drawline2 ;handle -dx
eor #%11111111 ;make dx
adc #%00000001 ; positive
sta dx ;
lda dy ;
bpl drawline1 ;handle -dy
eor #%11111111 ;make dy
adc #%00000001 ; positive
sta dy ;
lda dx
cmp dy
bcs dl00
jmp dl10
drawline1 lda dx
cmp dy
bcs dl20
jmp dl30
drawline2 lda dy
bpl drawline3
eor #%11111111
adc #%00000001
sta dy
lda dx
cmp dy
bcs drawline4 ;dl40
jmp dl50
drawline3 lda dx
cmp dy
bcs drawline5 ;dl60
jmp dl70
el rts
drawline4 jmp dl40
drawline5 jmp dl60
;---------------------------------------
The above code determines the slope of the line to be drawn. This is the
part of my code that I *know* in my gut I can improve on, but as of yet
have been unable to. There are eight separate little routines below that
draw a line. Each one is slightly different; they are written to handle
each of the eight possible types of line slopes that can be encountered
when drawing a line between two points.
The values DX and DY are calculated such that DX=X1-X2, and DY=Y1-Y2; a
negative value is then made positive, so each routine works with magnitudes.
dl00 lda #$00 ;0 - dx
sbc dx ;
ldx x1 ;plot first
ldy y1 ; pixel
jsr plot ;
dl00a clc ;step in x
inx ; positive
adc dy ; until time
bcc dl00b ; to take a
iny ; positive
sbc dx ; step in y
dl00b jsr plot ;
cpx x2 ;
bne dl00a ;
rts
;----------------------------
All eight of these routines are basically the same with only two real
differences:
1) The value which is being stepped through until it's time to step the
other value.
2) The direction of the stepping
Here is a step-by-step description of what the above does:
1. Subtract DX (change in X) from zero
2. Plot the first pixel
3. Step upwards in values of X, until it is time to take a step in Y
4. Plot the pixel
5. Loop back until we've reached the second endpoint
Each of the following routines works exactly the same way.
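The five steps can be sketched in Python for the first case (stepping +X, occasionally +Y). This is a model of the idea, not a cycle-exact or carry-exact copy of dl00: the accumulator here is a plain signed integer, where the 6502 version watches the carry flag instead, and the names are my own.

```python
def line_shallow_down_right(x1, y1, x2, y2):
    """Points from (x1, y1) to (x2, y2) when x2 >= x1, y2 >= y1 and the
    line is at least as wide as it is tall (the dl00 case)."""
    adx = x2 - x1              # magnitude of the change in X
    ady = y2 - y1              # magnitude of the change in Y
    if not (0 <= ady <= adx):
        raise ValueError("this sketch only covers the dl00 slope")
    x, y = x1, y1
    acc = -adx                 # step 1: subtract DX from zero
    points = [(x, y)]          # step 2: plot the first pixel
    while x != x2:             # step 5: loop until the second endpoint
        x += 1                 # step 3: step in X...
        acc += ady
        if acc >= 0:           # ...until it is time to take a step in Y
            y += 1
            acc -= adx
        points.append((x, y))  # step 4: plot the pixel
    return points
```

The other seven routines differ only in which coordinate is stepped every time and in the direction (inx/dex, iny/dey) of each step.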
dl10 lda #$00
sbc dy
ldx x1
ldy y1
jsr plot
dl10a clc ;step +y
iny ; until need
adc dx ; to +x
bcc dl10b ;
inx ;
sbc dy ;
dl10b jsr plot ;
cpy y2 ;
bne dl10a ;
rts
;----------------------------
dl20 lda #$00
sbc dx
ldx x1
ldy y1
jsr plot
dl20a clc ;step +x
inx ; until need
adc dy ; to -y
bcc dl20b ;
dey ;
sbc dx ;
dl20b jsr plot ;
cpx x2 ;
bne dl20a ;
rts
;----------------------------
dl30 lda #$00
sbc dy
ldx x1
ldy y1
jsr plot
dl30a clc ;step -y
dey ; until need
adc dx ; to +x
bcc dl30b ;
inx ;
sbc dy ;
dl30b jsr plot ;
cpy y2 ;
bne dl30a ;
rts
;----------------------------
dl40 lda #$00
sbc dx
ldx x1
ldy y1
jsr plot
dl40a clc ;step -x
dex ; until need
adc dy ; to +y
bcc dl40b ;
iny ;
sbc dx ;
dl40b jsr plot ;
cpx x2 ;
bne dl40a ;
rts
;----------------------------
dl50 lda #$00
sbc dy
ldx x1
ldy y1
jsr plot
dl50a clc ;step +y
iny ; until need
adc dx ; to -x
bcc dl50b ;
dex ;
sbc dy ;
dl50b jsr plot ;
cpy y2 ;
bne dl50a ;
rts
;----------------------------
dl60 lda #$00
sbc dx
ldx x1
ldy y1
jsr plot
dl60a clc ;step -x
dex ; until need
adc dy ; to -y
bcc dl60b ;
dey ;
sbc dx ;
dl60b jsr plot ;
cpx x2 ;
bne dl60a ;
rts
;----------------------------
dl70 lda #$00
sbc dy
ldx x1
ldy y1
jsr plot
dl70a clc ;step -y
dey ; until need
adc dx ; to -x
bcc dl70b ;
dex ;
sbc dy ;
dl70b jsr plot ;
cpy y2 ;
bne dl70a ;
rts
;---------------------------------------
x1 .byte $00
y1 .byte $00
x2 .byte $00
y2 .byte $00
dx .byte $00
dy .byte $00
;---------------------------------------
The values above are temporary holding places for the values used and
generated by the drawline routines.
demoinit lda #$06 ;dark blue
sta $d020 ; background
sta $d021 ; and border
lda #147 ;clear screen
jsr $ffd2 ;
lda #$7f ;disable cia
sta $dc0d ; time irqs
jsr blank0 ;clear both
jsr blank1 ; buffers
lda #$00
sta char
ldy #$00 ;dirty way to
demoinit4 lda #<scrloc ; draw the
sta $fa ; character
lda #>scrloc ; matrix on
sta $fb ; the screen
ldx #$00 ;
demoinit5 lda char ; the top
sta ($fa),y ; left corner
inc char ; is the
lda $fa ; value in
clc ; scrloc
adc #40 ;
sta $fa ;
bcc demoinit6 ;
inc $fb ;
demoinit6 inx ;
cpx #16 ;
bne demoinit5 ;
iny ;
cpy #16 ;
bne demoinit4 ;
ldy #$00 ;fill color
lda #$01 ; memory with
demoinit10 sta $d800,y ; white
sta $d900,y ;
sta $da00,y ;
sta $db00,y ;
iny ;
bne demoinit10 ;
lda $d018 ;point VIC to
and #%11110000 ; first
ora #%00001000 ; buffer
sta $d018 ;
jsr initobj00 ;1st shape
lda #$00 ;init
sta angleinc ; variables
sta curshape ;
rts
char .byte $00
;---------------------------------------
The above initialization routine sets everything up for the demo. In order
of appearance, here is an explanation of what each part of this
initialization does.
1. Turn the border and background dark blue.
2. Clear the screen
3. Disable the CIA timer IRQs
4. Clear both of the drawing buffers
5. Draw our character grid on the screen
6. Fill Color RAM with WHITE
7. Point the VIC to the first buffer (Buffer#0)
8. Initialize the first shape
9. Initialize the "angleinc" and "curshape" variables.
Drawing the character grid on the screen could possibly have been done a
little better. As it says in the comments, it's quick and dirty, but it
does the job. =)
Using our brains a bit, we draw the character grid on the screen in such a
way that starting with zero (the @ sign) we place sequential characters on
the screen in columns, 16 down, by 16 across. This enables us to use a
very nifty and fast plotter as described above. Laying out our character
grid in this manner, gives us 128 sequential bytes in the first column,
128 sequential bytes in the second, and so on.
You can make character grids of any size (up to 16*16) in this manner very
easily. A smaller grid frees up characters to be used for other things,
like say a niftycool border or other graphics to be used alongside the
character grid where your vectors are being drawn. You just have to adjust
your low byte and high byte tables accordingly.
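Here is the grid layout as a Python sketch, in case the column-first ordering is hard to picture. The function names are hypothetical, and the value passed for "scrloc" is whatever screen address you want for the top left corner; 1024 below is just an example.

```python
def grid_cells(scrloc, size=16, screen_width=40):
    """Map screen address -> character code for a size x size grid,
    filled one column at a time, top to bottom."""
    cells = {}
    char = 0
    for column in range(size):
        for row in range(size):
            cells[scrloc + row * screen_width + column] = char
            char += 1
    return cells


def char_data_offset(char, pixel_row):
    """Each character owns 8 consecutive bytes of character data."""
    return char * 8 + pixel_row
```

Character 16 sits at the top of the second column and its data starts at offset 128, which is why each column is 128 sequential bytes. For a smaller grid, pass a smaller "size" and rebuild the low and high byte tables to match.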
blank0 ldy #$00
lda #$00
blank0a sta $2000,y
sta $2080,y
sta $2100,y
sta $2180,y
sta $2200,y
sta $2280,y
sta $2300,y
sta $2380,y
sta $2400,y
sta $2480,y
sta $2500,y
sta $2580,y
sta $2600,y
sta $2680,y
sta $2700,y
sta $2780,y
iny
bpl blank0a
rts
;---------------------------------------
Clears the first buffer (buffer#0). Unfortunately, these clear routines
are real cycle hogs. There just aren't many fast ways to zero out 2K of
memory.
blank1 ldy #$00
lda #$00
blank1a sta $2800,y
sta $2880,y
sta $2900,y
sta $2980,y
sta $2a00,y
sta $2a80,y
sta $2b00,y
sta $2b80,y
sta $2c00,y
sta $2c80,y
sta $2d00,y
sta $2d80,y
sta $2e00,y
sta $2e80,y
sta $2f00,y
sta $2f80,y
iny
bpl blank1a
rts
;---------------------------------------
The above clears the second buffer (buffer#1)...
initobj00 lda #$04
sta points
ldy #$00
initobj01 lda theta00,y
sta itheta,y
sta theta,y
lda radiu00,y
sta iradiu,y
sta radiu,y
iny
cpy points
bne initobj01
rts
;---------------------------------------
initobj10 lda #$05
sta points
ldy #$00
initobj11 lda theta10,y
sta itheta,y
sta theta,y
lda radiu10,y
sta iradiu,y
sta radiu,y
iny
cpy points
bne initobj11
rts
;---------------------------------------
initobj20 lda #$09
sta points
ldy #$00
initobj21 lda theta20,y
sta itheta,y
sta theta,y
lda radiu20,y
sta iradiu,y
sta radiu,y
iny
cpy points
bne initobj21
rts
;---------------------------------------
theta00 .byte 0,86,172,0
radiu00 .byte 40,40,40,40
theta10 .byte 0,64,192,128,0
radiu10 .byte 40,40,40,40,40
theta20 .byte 0,96,192,32,128
.byte 224,64,160,0
radiu20 .byte 40,40,40,40,40
.byte 40,40,40,40
The above short routines are the three that define the three shapes I
included in this demo. The initialization for a shape is very simple. You
store the number of endpoints in the "points" variable, and then copy the
information from the appropriate table into the "theta" and "radiu"
tables. I also copy the values into the "itheta" and "iradiu" tables in
case there is a routine that needs to reference the initial values rather
than the current values.
Also note, that as mentioned before, the last end point is the same as the
first endpoint. This closes off the shape by drawing a line back to the
first endpoint.
Well, there you have it! Documented source code for a demonstration of a
different way to play with two-dimensional vectors. What follows are the
BASIC programs used to create the data tables used by the above program,
and of course, the uuencoded programs themselves.
The BASIC Programs
------------------
The first program is the program which builds the sine, cosine and squares
tables.
1 rem -+- make sin/cos/sqr 2.0 -+-
2 rem -+- -+-
3 rem -+- by: waveform -+-
4 rem -+- for: disC=overy magazine -+-
5 rem -+- on: 09-14-96 -+-
10 rem ::: make sin and cos tables :::
12 ba=12288:bh=int(ba/256):bl=ba-bh*256
"ba" is the starting address for the data created by this program.
14 forby=0to255
16 de=by*1.407:ra=de*(pi/180)
18 s=int((sin(ra)*64)+.5)
20 ifs<0thens=255-abs(s)+1
22 c=int((cos(ra)*64)+.5)
24 ifc<0thenc=255-abs(c)+1
26 poke ba + by,s:poke ba + 256 + by,c
28 next
The above loop makes both the sine table and the cosine table. Also please
note that in place of the word "pi" in line 16 above, you should instead
use Commodore Basic's pi symbol.
Step-by-step, the loop does this:
1. Convert from byts to degrees.
2. Convert from degrees to radians. This is necessary since Commodore
Basic's trigonometric functions work with radians.
3. Calculate the sine of the angle.
4. If the sine is less than 0 (i.e., negative) adjust the number.
5. Calculate the cosine of the angle
6. If the cosine is negative, adjust the number.
7. Store both the sine and cosine in their tables.
8. Loop back until all 256 byts have been calculated and stored in a
table.
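The same eight steps, written as a Python sketch for anyone without a Commodore handy. It follows lines 14 to 28 of the BASIC program; the function names are mine, and I use math.floor because BASIC's INT rounds down rather than toward zero.

```python
import math


def to_byte(value):
    """Mirror of lines 20 and 24: fold a negative value into 0..255."""
    return 255 - abs(value) + 1 if value < 0 else value


def make_tables():
    """Return (sin_table, cos_table), 256 byte-sized entries each."""
    sin_table, cos_table = [], []
    for by in range(256):
        de = by * 1.407                          # step 1: byts to degrees
        ra = de * (math.pi / 180)                # step 2: degrees to radians
        s = math.floor(math.sin(ra) * 64 + 0.5)  # step 3: scaled, rounded sine
        c = math.floor(math.cos(ra) * 64 + 0.5)  # step 5: scaled, rounded cosine
        sin_table.append(to_byte(s))             # steps 4 and 7
        cos_table.append(to_byte(c))             # steps 6 and 7
    return sin_table, cos_table
```

The results are lists rather than POKEs into memory, but entry for entry they should hold what the BASIC loop stores at ba and ba+256.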
When we calculate the sine or cosine of an angle, it becomes quickly
evident that except when the result is 0, 1 or -1, the result is always a
decimal number. Our Commodores don't have a quick and easy way to store a
decimal number, certainly not in one byte.
Rather than mess with program code to deal with decimal numbers, we can
resolve this issue rather quickly by some planning ahead here when we
create our tables.
If we take a decimal number, for instance .707107 (the sine of 45 degrees,
or 32 byts) and multiply that number by a constant, in our case 64, we can
arrive at a number which is much easier for our computers to store: 45.25.
The fractional part of the number is chopped off when we store the value
to memory, but what remains is a value that we can work with.
Incidentally, we add the .5 to the value to reduce the rounding errors
caused when the fractional portion of the number is chopped off.
What is this about "adjusting" the number, anyway? Well, we know how to
store a decimal number now, but what about a number that's negative?
One of the nifty things about the math instructions on a Commodore is that
they work the same for both unsigned and signed arithmetic. But to store a
negative number you have to use two's complement. To arrive at the two's
complement (negative) number, you flip all of the bits, and add one.
Commodore BASIC doesn't have an EOR instruction, so we do the next best
thing: subtract the number from 255. Subtracting a number from 255 has the
same effect as EORing a number by %11111111, which would, of course, flip
all of the bits. We then add one, and POOF, the two's complement of a
number.
The niftiness about two's complement and signed values in assembly is that
it gives us a way to represent negative numbers. When you ADC or SBC with
these numbers, the instructions work just as they did before, but they
perform the task you'd expect by using a negative number.
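A quick way to convince yourself that "subtract from 255 and add one" really is the same as "flip the bits and add one" is to try all 256 values. The Python sketch below does that; the function names are my own.

```python
def twos_complement(n):
    """Flip all eight bits and add one, keeping the result in a byte."""
    return ((n ^ 0xFF) + 1) & 0xFF


def basic_adjust(n):
    """The BASIC workaround: subtract from 255, then add one."""
    return (255 - n + 1) & 0xFF


def add_bytes(a, b):
    """What an 8-bit add leaves in the accumulator (carry ignored)."""
    return (a + b) & 0xFF
```

So -40 is stored as 216, and adding 216 to 20 gives 236, which is how one byte spells -20.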
Steven Judd talks a bit about this in the same articles for C=Hacking that
he wrote detailing how the table of squares works for fast multiplication.
Also, you can read more about this in virtually any book or reference work
that talks about the 65xx line of microprocessors.
Having mentioned that, how do we arrive at the right answer later on if
one of our values has been multiplied by a factor of 64? Read on...
50 rem ::: make sqr table :::
52 forby=0to127
54 sq=(by*by)/(4*64)
56 poke ba + 512 +by,sq:next
60 forby=0to127
62 sq=(by*by)/(4*64)
64 poke ba + 512 +255-by,sq:next
The above two loops create a table of squares. I did a little planning ahead
as well, and realized that the table need only be 128 bytes long. Recall
our niftycool formula for fast multiplication: f(a+b) - f(a-b) = a*b.
When we do the first part of the formula (f(a+b)) you can see from how our
program works that the radius is never larger than 64, and at the most,
the value from our sine table will be 64. Remember from trig class that
sines and cosines range from -1 to 1. =) Since we multiplied our sine and
cosine values by 64, the largest value we could ever get is 64.
Well, you say, but since the values you pull from your sine and cosine
table can be negative, what happens if the result of A+B or A-B is
negative? The result of a squaring will always be positive, so we don't
have to store any numbers using two's complement, but we will end up with
situations where the sum or the difference of A and B will be negative. We
resolve this by building a second table of squares above the first, and we
build it downward in memory.
Hopefully, this example will clear up any confusion:
If A (our radius) is 20, and the sine of our angle is -40, the sum of A
and B is -20. The value in the register is 236, which means -20 in two's
complement. Since we built another table above the first 128 byte table of
squares, we are free and clear. Our foresight put the correct value for
f(-20) in that location.
Now, there is one more issue to deal with. We multiplied all of our sines
and cosines by a factor of 64. Somewhere we have to divide by 64 to keep
the equation equal (and to not freak out my 7th grade algebra instructor)
so we take a look at the function we have for f(x):
f(x)=(x^2)/4
We realize we can do that *divide-by-64* thing here, to arrive at:
f(x)=(x^2)/4/64
Or, written another way:
f(x)=(x^2)/(4*64) which is also f(x)=(x^2)/256
I left it as 4*64 in the program for clarity's sake.
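To see the scaled version at work, here is a Python sketch of the lookup-and-subtract with f(x) = (x^2)/(4*64). One assumption to flag loudly: in this sketch, entry i of the table holds f() of the signed value that byte i represents, which is the idea described above but may sit one entry away from what the BASIC loop at lines 60 to 64 actually writes. The names are hypothetical.

```python
def signed(byte):
    """Read a byte as a two's complement number (-128..127)."""
    return byte - 256 if byte >= 128 else byte


# Assumption: entry i holds f(signed(i)), fraction dropped.
SQR = [(signed(i) ** 2) // (4 * 64) for i in range(256)]


def scaled_multiply(trig_byte, radius):
    """Approximate radius * (trig / 64) as a byte, using two lookups."""
    high = SQR[(trig_byte + radius) & 0xFF]   # f(a+b)
    low = SQR[(trig_byte - radius) & 0xFF]    # f(a-b)
    return (high - low) & 0xFF
```

With a radius of 40 and a table value of 64 (that is, 1.0) the answer is exactly 40. With radius 20 and a table value of -40 the true answer is -12.5 and the sketch returns -13, which shows the small error the dropped fractions introduce once the divide-by-64 is folded in.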
100 rem ::: save to disk :::
102 open1,8,15,"s0:sin/cos/sqr":close1
104 open2,8,2,"sin/cos/sqr,p,w"
106 print#2,chr$(bl)chr$(bh);
108 fort=0to767
110 print#2,chr$(peek(ba+t));
112 next
114 close2
The above simply saves the table we created to disk, with a loading
address of $3000. =) Just what the demo needs! =)
Now, on to the second program...
The second program is the program to build the low byte, high byte and
bitmask tables.
1 rem -+- make base/mask table 2.0 -+-
2 rem -+- -+-
3 rem -+- by: waveform -+-
4 rem -+- for: disC=overy magazine -+-
5 rem -+- on: 09-14-96 -+-
10 ba = 13056 : cb=8192
"ba" is the starting address for the data created by this program.
"cb" is the base address for the character matrix used by our demo.
12 bh=int(ba/256):bl=ba-bh*256
100 c=0:fort=0 to 15
102 forq=0to7
104 b1=cb+(t*128)
106 hb=int(b1/256):lb=b1-hb*256
108 poke ba + c,lb:poke ba + 128 + c,hb
110 c=c+1:nextq,t
The above loop creates the low byte table as well as the high byte table
for the first buffer (buffer#0)
120 c=0:fort=0 to 15
122 forq=0to7
124 b1=cb+2048+(t*128)
126 hb=int(b1/256):lb=b1-hb*256
128 poke ba +256 +c,hb
130 c=c+1:nextq,t
The above loop creates the high byte table for the second buffer
(buffer#1)
150 fort=0 to 15
152 forq=0to7
154 poke ba + 384+(t*8)+q,2^(7-q)
156 nextq,t
The above loop creates the bitmask table.
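For comparison, the three loops above collapse into one in Python. This sketch returns four lists instead of POKEing them at ba, ba+128, ba+256 and ba+384; the function name is my own, and "cb" defaults to 8192 as in line 10.

```python
def make_plot_tables(cb=8192):
    """Return (lobyte, hibyte0, hibyte1, bitmask), 128 entries each."""
    lobyte, hibyte0, hibyte1, bitmask = [], [], [], []
    for t in range(16):                 # sixteen character columns
        for q in range(8):              # eight pixels per byte
            b0 = cb + t * 128           # column start in buffer #0
            b1 = cb + 2048 + t * 128    # column start in buffer #1
            lobyte.append(b0 % 256)     # same low byte for both buffers
            hibyte0.append(b0 // 256)
            hibyte1.append(b1 // 256)
            bitmask.append(2 ** (7 - q))
    return lobyte, hibyte0, hibyte1, bitmask
```

Only one low byte table is needed because the two buffers are exactly 2048 bytes apart, so their addresses differ in the high byte alone.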
200 open1,8,15,"s0:base/mask":close1
202 open2,8,2,"base/mask,p,w"
204 print#2,chr$(bl)chr$(bh);
206 fort=0to511
208 print#2,chr$(peek(ba+t));
210 next
212 close2
Here we have a small routine to save the tables created by this program to
disk, with a loading address of $3300.
UUEncoded Files
---------------
Below you will find the uuencoded files detailed in this article. I
included Turbo Assembler source, object code (sys 2080 to run), both
tables needed by the program to run, as well as the BASIC programs used to
create them.
I will make all of these available in a zip package on my web site:
http://marie.az.com/~waveform
How To Get The Demo To Run
--------------------------
There wasn't enough time to write a loader for this demo before the
deadline, and even though my gracious host allowed me the opportunity to
add one, it is really more trouble than it's worth.
So for those of you explorers who want to know how things are really
done, here's a quick and simple method of launching this demo:
Step One: load "sin_cos_sqr",8,1
Step Two: load "base_mask",8,1
Step Three: load "disc-demo.obj",8,1
Step Four: sys2080
[Ed. Note : To prevent possible compatibility problems with unix file-handling,
the UUencoded files in this article will extract with filenames as seen
in the loading sequence above. If you generate new tables with the basic
table generators, then the new tables will be named to their original
designations (i.e., sin/cos/sqr instead of sin_cos_sqr). Please remember
to change filenames in your loading procedure to match.]
Demo-Controls
-------------
F1 : increments angle (apparent speed)
F3 : decrements angle
F5 : expands shape
F7 : shrinks shape
RETURN : changes shape
SPACE : resets to default size and angle
RUN/STOP : quits the demo
--
For more information or general commentary on this article, Mr. John Kaiser
(Waveform/MLM) may be reached at waveform@az.com
Addendum by S. Judd, Technical Editor
------+------^-------^------+-------
John shrewdly omitted a few "tricks" in order for the code to flow more
clearly as a learning experience. Once the process is well-understood, a
bit of sneaky optimization can be undertaken, as follows :
[...]
> ;---------------------------------------
> expdn ldy #$00
> expdn1 lda radiu,y
> sec
> sbc #$01 ;subtract one
> cmp #2 ;at minimum?
> bcs expdn2
> lda #2
> expdn2 sta radiu,y ;store
> iny
> cpy points ;do all points
> bne expdn1
This can be written more efficiently, as follows :
expdn LDX POINTS ;Start at top
:L1 LDA RADIU,X
CMP #3
BCC :SKIP
DEC RADIU,X
:SKIP DEX
BPL :L1 ;Only allows 128 points though
It is always better to start the index register large and count downwards
when you can: the DEX (or DEY) sets the flags itself, so the compare
disappears. His loop takes 4+2+2+2+3/(2+2)+5+2+4+3 = 27/28 cycles per
iteration, while the rewrite takes 4+2+3/(2+7)+2+3 = 14/20 cycles per
iteration, and a few bytes fewer too. Here it is not such a huge deal -- 7
cycles saved per iteration on the path usually taken -- but this kind of
trick can sometimes lead to immense savings.
[...]
> rotate ldy #$00
In 'sneaky mode' we could start with Y at points-1 and count downwards here as well.
> rotate1 lda theta,y ;get theta
> clc
> adc angleinc ;add amount
> sta theta,y ;store theta
> iny
> cpy points ;do all points
> bne rotate1
> rts
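A minimal sketch of that count-down version, reusing John's labels unchanged.
It assumes points holds a value between 1 and 128, so that the index stays
positive until the loop is done:

```asm
rotate   ldy points      ;number of points (assumed 1..128)
         dey             ;index of the last point
rotate1  lda theta,y     ;get theta
         clc
         adc angleinc    ;add amount
         sta theta,y     ;store theta
         dey             ;next point down; N is set once .y wraps to $ff
         bpl rotate1     ;do all points
         rts
```

The dey sets the flags on its own, so the cpy points (4 cycles per pass) is
no longer needed.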
[...]
> ;---------------------------------------
> plot2 lda lobyte,x ;lo byte
> sta $02 ;
> lda hibyte0,x ;hi byte
> sta $03 ;
>
> lda ($02),y ;
> ora bitmask,x ;turn pixel on
> sta ($02),y ;
>
> pla ;restore .a
> rts
> ;---------------------------------------
Please note :
1- Using a JSR plot each time adds 12 cycles (JSR + RTS)
2- Most of the time you plot within the same byte.
That is, reloading $02/$03 each time is redundant by a
factor of 8 at the very least. (The x-coordinate is
not going to change by more than 1 at each iteration!)
Every cycle saved in a line drawing routine pays huge dividends. If
you have an object with just three lines in it, and each line has 100
points in it, you suddenly start saving thousands of cycles (e.g., the
JSR/RTS pair alone adds 3600 cycles). Those 14 cycles spent loading the
coordinate each time translate to several thousand extra cycles too.
In raster time, we're talking well over half the screen here!
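As a rough illustration of point 2, here is a sketch that plots a run of
pixels along one row and reloads $02/$03 only when x crosses into a new
8-pixel column. It is not taken from the demo: the hrun labels and the xend
variable are made up for the example, and it writes to buffer 0 only. It
assumes that the lobyte/hibyte0 entries are identical for all 8 pixels of a
column, that .y holds the row, and that .x starts below xend.

```asm
;hypothetical helper: plot pixels .x to xend-1 on row .y
hrun     lda lobyte,x    ;set up the pointer for this column
         sta $02
         lda hibyte0,x
         sta $03
hrun1    lda ($02),y
         ora bitmask,x   ;turn pixel on
         sta ($02),y
         inx             ;next pixel to the right
         cpx xend        ;done?
         beq hrun2
         txa
         and #$07        ;entered a new 8-pixel column?
         bne hrun1       ;no, the pointer is still good
         beq hrun        ;yes, reload it (always taken)
hrun2    rts
xend     .byte $00       ;first x NOT to plot (assumed > starting .x)
```

Because the plot is inlined, there is no JSR/RTS per pixel either. A real
line routine would fold the same test into its own x-stepping code rather
than call a helper like this.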
In general, if it's in a loop, you can't optimize it enough :).
Again, though, doing things John's way makes the code much clearer (which
makes life easier on the programmer too).
--
begin 644 disc-demo.sou
M.RTM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+0T[("#!
M(,1%34\@3T8@Q$E30SU/5D5262`M(,)9.B#7059%1D]230T[#3L@("A#*2`Q
M.3DV($9/4B#$25-#/4]615)9(,U!1T%:24Y%#3LBU$A%(,I/55).04P@1D]2
M(,-/34U/1$]212#%3E1(55-)05-44R(-.RTM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+0T@("`@("`@("`@("H]("0P.#(P#0U03TE.
M5%,@("`@(#T@)#`S,T,@("`[3E5-0D52($]&(%!/24Y44PU!3D=,14E.0R`@
M(#T@)#`S,T0@("`[04Y'3$4@5$\@24Y#4D5-14Y4#4-54E-(05!%("`@/2`D
M,#,S12`@(#M#55)214Y4(%-(05!%#51(151!("`@("`@/2`D,#,T,"`@(#M4
M2$5405,-4D%$254@("`@("`]("0P,S4P("`@.U)!1$E54PU)5$A%5$$@("`@
M(#T@)#`S-C`@("`[5$%"($]&($E.250@5$A%5$%3#4E2041)52`@("`@/2`D
M,#,W,"`@(#L@("`@("`@24Y)5"!2041)55,-#5-#4DQ/0R`@("`@/2`Q,3DV
M("`@(#M,3T-!5$E/3B!/1B!43U`-("`@("`@("`@("`@("`@("`@("`@.R!,
M1494($-/4DY%4B!/1@T@("`@("`@("`@("`@("`@("`@("`[($-(05(@34%4
M4DE8($].#2`@("`@("`@("`@("`@("`@("`@(#L@4T-2145.#0U-4TE.("`@
M("`@(#T@)#,P,#`@("`[5$%"3$4@3T8@4TE.15,-34-/4R`@("`@("`]("0S
M,3`P("`@.R`@("`@("`@($-/4TE.15,-35-14B`@("`@("`]("0S,C`P("`@
M.R`@("`@("`@(%-154%215,-3$]"651%("`@("`]("0S,S`P("`@.TQ/($)9
M5$53#4A)0EE413`@("`@/2`D,S,X,"`@(#M(22!"651%4R`H)#(P,#`K*0U(
M24)95$4Q("`@(#T@)#,T,#`@("`[("`@("`@("`@*"0R.#`P*RD-0DE434%3
M2R`@("`]("0S-#@P("`@.U!)6$5,($))5"!-05-+#3LM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2T-("`@("`@("`@("!*4U(@1$5-
M3TE.250@("`[24Y)5"!&3U(@1$5-3PT@("`@("`@("`@(#LM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM#4U!24X@("`@("`@3$1!(",E,3$Q,3$Q,3`@
M.U-#04X@34%44DE8#2`@("`@("`@("`@4U1!("1$0S`P("`@("`@.PT@("`@
M("`@("`@($Q$02`D1$,P,2`@("`@(#L-#2`@("`@("`@("`@0TU0(",E,3$Q
M,#$Q,3$@.T8Q#2`@("`@("`@("`@0D51($M%648Q#2`@("`@("`@("`@0TU0
M(",E,3$P,3$Q,3$@.T8S#2`@("`@("`@("`@0D51($M%648S#2`@("`@("`@
M("`@0TU0(",E,3`Q,3$Q,3$@.T8U#2`@("`@("`@("`@0D51($M%648U#2`@
M("`@("`@("`@0TU0(",E,3$Q,3`Q,3$@.T8W#2`@("`@("`@("`@0D51($M%
M648W#2`@("`@("`@("`@0TU0(",E,3$Q,3$Q,#$@.U)%5%523@T@("`@("`@
M("`@($)%42!+15E215154DX-#2`@("`@("`@("`@3$1!(",E,#$Q,3$Q,3$@
M.U-#04X@34%44DE8#2`@("`@("`@("`@4U1!("1$0S`P("`@("`@.PT@("`@
M("`@("`@($Q$02`D1$,P,2`@("`@(#L-#2`@("`@("`@("`@0TU0(",E,3$Q
M,#$Q,3$@.U-004-%#2`@("`@("`@("`@0D51($M%65-004-%#2`@("`@("`@
M("`@0TU0(",E,#$Q,3$Q,3$@.U)%5%523@T@("`@("`@("`@($)%42!+15E3
M5$]0#0U2145.5%)9("`@($I34B!23U1!5$4@("`@(#M23U1!5$4@4TA!4$4-
M("`@("`@("`@("!*4U(@1%)!5U-(05!%("`[1%)!5R!32$%010T-("`@("`@
M("`@("!*35`@34%)3B`@("`@("`[3$]/4`T[+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM#4M%648Q("`@("`@2DU0(%)/5%50#4M%
M648S("`@("`@2DU0(%)/5$1.#4M%648U("`@("`@2DU0($584%50#4M%648W
M("`@("`@2DU0($584$1.#4M%65)%5%523B`@2DU0($Y%5U-(05!%#4M%65-0
M04-%("`@2DU0(%-43U!23U0-2T594U1/4"`@("!*35`@4U1/4$1%34\-.RTM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+0U23U154"`@
M("`@($E.0R!!3D=,14E.0R`@(#M)3D-214%310T@("`@("`@("`@($I-4"!2
M145.5%)9#3LM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM
M+2T-4D]41$X@("`@("!$14,@04Y'3$5)3D,@("`[1$5#4D5!4T4-("`@("`@
M("`@("!*35`@4D5%3E1260T[+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM#4584%50("`@("`@3$19(",D,#`-15A055`Q("`@("!,
M1$$@4D%$254L60T@("`@("`@("`@($-,0PT@("`@("`@("`@($%$0R`C)#`Q
M("`@("`@(#M!1$0@3TY%#2`@("`@("`@("`@0TU0(",V,R`@("`@("`@.T%4
M($U!6$E-54T_#2`@("`@("`@("`@0D-#($584%50,@T@("`@("`@("`@($Q$
M02`C-C,-15A055`R("`@("!35$$@4D%$254L62`@("`[4U1/4D4-("`@("`@
M("`@("!)3ED-("`@("`@("`@("!#4%D@4$])3E13("`@("`[1$\@04Q,(%!/
M24Y44PT@("`@("`@("`@($).12!%6%!54#$-#2`@("`@("`@("`@2DU0(%)%
M14Y44ED@("`@.U)%5%523@T[+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM#4584$1.("`@("`@3$19(",D,#`-15A01$XQ("`@("!,
M1$$@4D%$254L60T@("`@("`@("`@(%-%0PT@("`@("`@("`@(%-"0R`C)#`Q
M("`@("`@(#M354)44D%#5"!/3D4-("`@("`@("`@("!#35`@(S(@("`@("`@
M("`[050@34E.24U533\-("`@("`@("`@("!"0U,@15A01$XR#2`@("`@("`@
M("`@3$1!(",R#4584$1.,B`@("`@4U1!(%)!1$E5+%D@("`@.U-43U)%#2`@
M("`@("`@("`@24Y9#2`@("`@("`@("`@0U!9(%!/24Y44R`@("`@.T1/($%,
M3"!03TE.5%,-("`@("`@("`@("!"3D4@15A01$XQ#0T@("`@("`@("`@($I-
M4"!2145.5%)9("`@(#M215154DX-.RTM+2TM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+0U.15=32$%012`@($E.0R!#55)32$%010T@("`@
M("`@("`@($Q$02!#55)32$%010T@("`@("`@("`@($--4"`C,R`@("`@("`@
M(#M,05-4(%-(05!%/PT@("`@("`@("`@($).12!.15=32$%013$@(#M.3U!%
M#2`@("`@("`@("`@3$1!(",D,#`@("`@("`@.U-%3$5#5"!&25)35`T@("`@
M("`@("`@(%-402!#55)32$%012`@(#L@4TA!4$4-#4Y%5U-(05!%,2`@0TU0
M(",R("`@("`@("`@.S@@4$])3E0@4U1!4C\-("`@("`@("`@("!"3D4@3D57
M4TA!4$4R#2`@("`@("`@("`@2E-2($E.251/0DHR,"`@.TE.250@4TA!4$4-
M("`@("`@("`@("!*35`@4D5%3E1260U.15=32$%013(@($--4"`C,2`@("`@
M("`@(#LR(%1224%.1TQ%4S\-("`@("`@("`@("!"3D4@3D574TA!4$4S#2`@
M("`@("`@("`@2E-2($E.251/0DHQ,"`@.TE.250@4TA!4$4-("`@("`@("`@
M("!*35`@4D5%3E1260U.15=32$%013,@($I34B!)3DE43T)*,#`@(#LQ(%12
M24%.1TQ%#0T@("`@("`@("`@($I-4"!2145.5%)9#3LM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2T-4U1/4%)/5"`@("!,1$$@(R0P
M,"`@("`@("`[4U1/4"!!3$P-("`@("`@("`@("!35$$@04Y'3$5)3D,@("`[
M(%)/5$%424].#2`@("`@("`@("`@2DU0(%)%14Y44ED@("`@.PT[+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM#5-43U!$14U/("`@
M3$1!(",D.#$@("`@("`@.U)%4T54($-)00T@("`@("`@("`@(%-402`D1$,P
M1"`@("`@(#L@24Y415)255!44PT-("`@("`@("`@("!*35`@)$9%-C8@("`@
M("`[15A)5"!624$-("`@("`@("`@("`@("`@("`@("`@("`@("`[($M%4DY!
M3"!705)-#2`@("`@("`@("`@("`@("`@("`@("`@("`@.R!35$%25`T[+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM#5)/5$%412`@
M("`@3$19(",D,#`-4D]4051%,2`@("!,1$$@5$A%5$$L62`@("`[1T54(%1(
M151!#2`@("`@("`@("`@0TQ##2`@("`@("`@("`@041#($%.1TQ%24Y#("`@
M.T%$1"!!34]53E0-("`@("`@("`@("!35$$@5$A%5$$L62`@("`[4U1/4D4@
M5$A%5$$-("`@("`@("`@("!)3ED-("`@("`@("`@("!#4%D@4$])3E13("`@
M("`[1$\@04Q,(%!/24Y44PT@("`@("`@("`@($).12!23U1!5$4Q#0T@("`@
M("`@("`@(%)44PT[+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM#41205=32$%012`@3$19(",D,#`@("`@("`@.T9)4E-4(%!/24Y4
M#0T@("`@("`@("`@($Q$02!42$5402Q9("`@(#M005-3(%1(151!("8-("`@
M("`@("`@("!35$$@1UA95$A%5$$@("`[(%)!1$E54R!43PT@("`@("`@("`@
M($Q$02!2041)52Q9("`@(#L@1T546%D@1D]2#2`@("`@("`@("`@4U1!($=8
M65)!1$E54R`@.R!#3TY615)324].#2`@("`@("`@("`@2E-2($=%5%A9("`@
M("`@.PT-("`@("`@("`@("!,1$$@6%!/4R`@("`@("`[0T].5D525"!03TQ!
M4@T@("`@("`@("`@($-,0R`@("`@("`@("`@(#L@5$\-("`@("`@("`@("!!
M1$,@(S8T("`@("`@("`[($-!4E1%4TE!3@T@("`@("`@("`@(%-402!8,2`@
M("`@("`@(#L-("`@("`@("`@("!,1$$@65!/4R`@("`@("`[#2`@("`@("`@
M("`@0TQ#("`@("`@("`@("`@.PT@("`@("`@("`@($%$0R`C-C0@("`@("`@
M(#L-("`@("`@("`@("!35$$@63$@("`@("`@("`[#0T@("`@("`@("`@($Q$
M62`C)#`Q("`@("`@(#M314-/3D0@4$])3E0-1%,R("`@("`@("!,1$$@5$A%
M5$$L60T@("`@("`@("`@(%-402!'6%E42$5400T@("`@("`@("`@($Q$02!2
M041)52Q9#2`@("`@("`@("`@4U1!($=865)!1$E54PT@("`@("`@("`@($I3
M4B!'151860T@("`@("`@("`@($Q$02!84$]3#2`@("`@("`@("`@0TQ##2`@
M("`@("`@("`@041#(",V-`T@("`@("`@("`@(%-402!8,@T@("`@("`@("`@
M($Q$02!94$]3#2`@("`@("`@("`@0TQ##2`@("`@("`@("`@041#(",V-`T@
M("`@("`@("`@(%-402!9,@T@("`@("`@("`@(%-462!$4UD@("`@("`@(#M0
M4D5315)612`N60T@("`@("`@("`@($I34B!$4D%73$E.12`@(#M$4D%7($$@
M3$E.10T@("`@("`@("`@($Q$62!$4UD@("`@("`@(#M215-43U)%("Y9#0T@
M("`@("`@("`@($Q$02!8,B`@("`@("`@(#M-04M%($5.1%!/24Y4#2`@("`@
M("`@("`@4U1!(%@Q("`@("`@("`@.R!/1B!42$E3($Q)3D4-("`@("`@("`@
M("!,1$$@63(@("`@("`@("`[(%-405)4(%!/24Y4#2`@("`@("`@("`@4U1!
M(%DQ("`@("`@("`@.R!/1B!.15A4($Q)3D4-#2`@("`@("`@("`@24Y9("`@
M("`@("`@("`@.T1/($%,3"!03TE.5%,-("`@("`@("`@("!#4%D@4$])3E13
M("`@("`[#2`@("`@("`@("`@0DY%($13,B`@("`@("`@.PT-("`@("`@("`@
M("!,1$$@)$0P,3@@("`@("`[4TA/5R!/55(@5T]22PT@("`@("`@("`@($5/
M4B`C)3`P,#`P,#$P(#L-("`@("`@("`@("!35$$@)$0P,3@@("`@("`[#0T@
M("`@("`@("`@($Q$02`D1#`Q."`@("`@(#M#3$5!4B!42$4-("`@("`@("`@
M("!!3D0@(R4P,#`P,#`Q,"`[($Y%6%0@0E5&1D52#2`@("`@("`@("`@0D51
M($13-"`@("`@("`@.R!&3U(@1%)!5TE.1PT@("`@("`@("`@($I34B!"3$%.
M2S`@("`@(#L@24X-("`@("`@("`@("!*35`@1%,U("`@("`@("`[#413-"`@
M("`@("`@2E-2($),04Y+,2`@("`@.PT-1%,U("`@("`@("!25%,-#41362`@
M("`@("`@+D)95$4@)#`P#3LM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2T-1T546%D@("`@("!35%D@1UA962`@("`@("`[4%)%4T52
M5D4@+ED-("`@("`@("`@("!35$$@1UA902`@("`@("`[("`@("`@("`@+D$-
M#2`@("`@("`@("`@3$19($=8651(151!("`@.PT@("`@("`@("`@($Q$02!-
M0T]3+%D@("`@(#M!#2`@("`@("`@("`@4T5#("`@("`@("`@("`@.PT@("`@
M("`@("`@(%-"0R!'6%E2041)55,@(#LM0@T@("`@("`@("`@(%1!62`@("`@
M("`@("`@(#L-("`@("`@("`@("!,1$$@35-14BQ9("`@("`[1BA!+4(I#2`@
M("`@("`@("`@4U1!($=8651%35`@("`@.U-43U)%(%)%4U5,5`T-("`@("`@
M("`@("!,1%D@1UA95$A%5$$@("`[#2`@("`@("`@("`@3$1!($U#3U,L62`@
M("`@.T$-("`@("`@("`@("!#3$,@("`@("`@("`@("`[#2`@("`@("`@("`@
M041#($=865)!1$E54R`@.RM"#2`@("`@("`@("`@5$%9("`@("`@("`@("`@
M.PT@("`@("`@("`@($Q$02!-4U%2+%D@("`@(#M&*$$K0BD-("`@("`@("`@
M("!314,@("`@("`@("`@("`[#2`@("`@("`@("`@4T)#($=8651%35`@("`@
M.RU&*$$M0BD-("`@("`@("`@("!35$$@6%!/4R`@("`@("`[/5@@0T]/4D1)
M3D%410T-("`@("`@("`@("!,1%D@1UA95$A%5$$@("`[#2`@("`@("`@("`@
M3$1!($U324XL62`@("`@.T$-("`@("`@("`@("!314,@("`@("`@("`@("`[
M#2`@("`@("`@("`@4T)#($=865)!1$E54R`@.RU"#2`@("`@("`@("`@5$%9
M("`@("`@("`@("`@.PT@("`@("`@("`@($Q$02!-4U%2+%D@("`@(#M&*$$M
M0BD-("`@("`@("`@("!35$$@1UA95$5-4"`@("`[4U1/4D4@4D5354Q4#0T@
M("`@("`@("`@($Q$62!'6%E42$5402`@(#L-("`@("`@("`@("!,1$$@35-)
M3BQ9("`@("`[00T@("`@("`@("`@($-,0R`@("`@("`@("`@(#L-("`@("`@
M("`@("!!1$,@1UA94D%$2553("`[*T(-("`@("`@("`@("!405D@("`@("`@
M("`@("`[#2`@("`@("`@("`@3$1!($U345(L62`@("`@.T8H02M"*0T@("`@
M("`@("`@(%-%0R`@("`@("`@("`@(#L-("`@("`@("`@("!30D,@1UA95$5-
M4"`@("`[+48H02U"*0T@("`@("`@("`@(%-402!94$]3("`@("`@(#L]62!#
M3T]21$E.051%#0T@("`@("`@("`@($Q$62!'6%E9("`@("`@(#M215-43U)%
M("Y9#2`@("`@("`@("`@3$1!($=864$@("`@("`@.U)%4U1/4D4@+D$-#2`@
M("`@("`@("`@4E13#2`@("`@("`@("`@.RTM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2T-1UA94D%$2553("`N0EE412`D,#`-1UA95$A%5$$@("`N0EE4
M12`D,#`-#5A03U,@("`@("`@+D)95$4@)#`P#5E03U,@("`@("`@+D)95$4@
M)#`P#0U'6%E9("`@("`@("Y"651%("0P,`U'6%E!("`@("`@("Y"651%("0P
M,`U'6%E414U0("`@("Y"651%("0P,`T[+2TM+2TM+2TM+2TM+2TM+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM#5!,3U0@("`@("`@4$A!("`@("`@("`@("`@
M.U!215-%4E9%("Y!#0T@("`@("`@("`@($Q$02`D1#`Q."`@("`@(#M03$]4
M($E.#2`@("`@("`@("`@04Y$(",E,#`P,#`P,3`@.R!#3U)214-4#2`@("`@
M("`@("`@0DY%(%!,3U0R("`@("`@.R!"549&15(-#5!,3U0Q("`@("`@3$1!
M($Q/0EE412Q8("`@.TQ/($)95$4-("`@("`@("`@("!35$$@)#`R("`@("`@
M("`[#2`@("`@("`@("`@3$1!($A)0EE413$L6"`@.TA)($)95$4-("`@("`@
M("`@("!35$$@)#`S("`@("`@("`[#2`@("`@("`@("`@3$1!("@D,#(I+%D@
M("`@.PT@("`@("`@("`@($]202!"251-05-++%@@(#M455).(%!)6$5,($].
M#2`@("`@("`@("`@4U1!("@D,#(I+%D@("`@.PT-("`@("`@("`@("!03$$@
M("`@("`@("`@("`[4D535$]212`N00T@("`@("`@("`@(%)44PT[+2TM+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM#5!,3U0R("`@("`@
M3$1!($Q/0EE412Q8("`@.TQ/($)95$4-("`@("`@("`@("!35$$@)#`R("`@
M("`@("`[#2`@("`@("`@("`@3$1!($A)0EE413`L6"`@.TA)($)95$4-("`@
M("`@("`@("!35$$@)#`S("`@("`@("`[#2`@("`@("`@("`@3$1!("@D,#(I
M+%D@("`@.PT@("`@("`@("`@($]202!"251-05-++%@@(#M455).(%!)6$5,
M($].#2`@("`@("`@("`@4U1!("@D,#(I+%D@("`@.PT-("`@("`@("`@("!0
M3$$@("`@("`@("`@("`[4D535$]212`N00T@("`@("`@("`@(%)44PT[+2TM
M+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM+2TM#41205=,24Y%
M("`@4T5#("`@("`@("`@("`@.T=%5"!$6`T@("`@("`@("`@($Q$02!8,2`@
M("`@("`@(#L-("`@("`@("`@("!30D,@6#(@("`@("`@("`[#2`@("`@("`@
M("`@4U1!($18("`@("`@("`@.PT-("`@("`@("`@("!314,@("`@("`@("`@
M("`[1T54($19#2`@("`@("`@("`@3$1!(%DQ("`@("`@("`@.PT@("`@("`@
M("`@(%-"0R!9,B`@("`@("`@(#L-("`@("`@("`@("!35$$@1%D@("`@("`@
M("`[#0T@("`@("`@("`@($-,0R`@("`@("`@("`@(#L-("`@("`@("`@("!,
M1$$@1%@@("`@("`@("`[#2`@("`@("`@("`@0E!,($1205=,24Y%,B`@.TA!
M3D1,12`M1%@-("`@("`@("`@("!%3U(@(R4Q,3$Q,3$Q,2`[34%+12!$6`T@
M("`@("`@("`@($%$0R`C)3`P,#`P,#`Q(#L@4$]3251)5D4-("`@("`@("`@
M("!35$$@1%@@("`@("`@("`[#0T@("`@("`@("`@($Q$02!$62`@("`@("`@
M(#L-("`@("`@("`@("!"4$P@1%)!5TQ)3D4Q("`[2$%.1$Q%("U$60T@("`@
M("`@("`@($5/4B`C)3$Q,3$Q,3$Q(#M-04M%($19#2`@("`@("`@("`@041#
M(",E,#`P,#`P,#$@.R!03U-)5$E610T@("`@("`@("`@(%-402!$62`@("`@
M("`@(#L-#2`@("`@("`@("`@3$1!($18#2`@("`@("`@("`@0TU0($19#2`@
M("`@("`@("`@0D-3($1,,#`-("`@("`@("`@("!*35`@1$PQ,`T-1%)!5TQ)
M3D4Q("!,1$$@1%@-("`@("`@("`@("!#35`@1%D-("`@("`@("`@("!"0U,@
M1$PR,`T@("`@("`@("`@($I-4"!$3#,P#0U$4D%73$E.13(@($Q$02!$60T@
M("`@("`@("`@($)03"!$4D%73$E.13,-("`@("`@("`@("!%3U(@(R4Q,3$Q
M,3$Q,0T@("`@("`@("`@($%$0R`C)3`P,#`P,#`Q#2`@("`@("`@("`@4U1!
M($19#0T@("`@("`@("`@($Q$02!$6`T@("`@("`@("`@($--4"!$60T@("`@
M("`@("`@($)#4R!$4D%73$E.130@("`[1$PT,`T@("`@("`@("`@($I-4"!$
M3#4P#0U$4D%73$E.13,@($Q$02!$6`T@("`@("`@("`@($--4"!$60T@("`@
M("`@("`@($)#4R!$4D%73$E.134@("`[1$PU,`T@("`@("`@("`@($I-4"!$
M3#".X]`TQ/",X]`TQ/
M"*``N5`#&&D!R3^0`JD_F5`#R,P\`]#K3$\(H`"Y4`,XZ0')`K`"J0*94`/(
MS#P#T.M,3PCN/@.M/@/)`]`%J0"-/@/)`M`&(*0,3$\(R0'0!B"$#$Q/""!D
M#$Q/"*D`C3T#3$\(J8&-#=Q,9OZ@`+E``QAM/0.90`/(S#P#T/!@H`"Y0`.-
MS@FY4`.-S0D@=`FMSPD8:4"-?@NMT`D8:4"-?PN@`;E``XW.";E0`XW-"2!T
M":W/"1AI0(V`"ZW0"1AI0(V!"XQS"2`""JQS":V`"XU^"ZV!"XU_"\C,/`/0
MQ*T8T$D"C1C0K1C0*0+P!B#T"TQR"2`L#&``C-$)C=()K,X)N0`Q..W-":BY
M`#*-TPFLS@FY`#$8;R`"]#M8*D`[8,+KGX+K'\+(-0)&(AM@@N0!.CM@PL@
MU`G,@0O0[6"I`.V""ZY^"ZQ_"R#4"1C*;8,+D`3([8(+(-0)[(`+T.U@J0#M
M@PNN?@NL?PL@U`D8R&V""Y`$RNV#"R#4"9@"?(
M$,U@H`"I`)D`*)F`*)D`*9F`*9D`*IF`*ID`*YF`*YD`+)F`+)D`+9F`+9D`
M+IF`+ID`+YF`+\@0S6"I!(T\`Z``N<0,F6`#F4`#N<@,F7`#F5`#R,P\`]#H
M8*D%C3P#H`"YS`R98`.90`.YT0R9<`.94`/(S#P#T.A@J0F-/`.@`+G6#)E@
M`YE``[G?#)EP`YE0`\C,/`/0Z&``5JP`*"@H*`!`P(``*"@H*"@`8,`@@.!`
MH``H*"@H*"@H*"@:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:
4&AH:&AH:&AH:&AH:&AH:&AH:&AH:
`
end
begin 644 m.sin_cos_sqr
M`0@H"`$`CR`M*RT@("!-04M%(%-)3B]#3U,O4U%2(#(N,"`@("TK+0!/"`(`
MCR`M*RT@("`@("`@("`@("`@("`@("`@("`@("`@("TK+0!V"`,`CR`M*RT@
M($)9.B!7059%1D]232`@("`@("`@("`@("TK+0"="`0`CR`M*RT@1D]2.B!$
M25-#/4]615)9($U!1T%:24Y%("TK+0#$"`4`CR`M*RT@($]..B`P.2TQ-"TY
M-B`@("`@("`@("`@("TK+0#J"`H`CR`Z.CH@34%+12!324X@04Y$($-/4R!4
M04),15,@.CHZ`!$)#`!"0;(Q,C(X.#I"2+*U*$)!K3(U-BDZ0DRR0D&K0DBL
M,C4V`!\)#@"!0EFR,*0R-34`/0D0`$1%LD)9K#$N-#`W.E)!LD1%K"C_K3$X
M,"D`5`D2`%.RM2@HORA202FL-C0IJBXU*0!J"10`BU.S,*=3LC(U-:NV*%,I
MJC$`@0D6`$.RM2@HOBA202FL-C0IJBXU*0"7"1@`BT.S,*=#LC(U-:NV*$,I
MJC$`N0D:`)<@0D$@JB!"62Q3.I<@0D$@JB`R-38@JB!"62Q#`+\)'`""`-P)
M,@"/(#HZ.B!-04M%(%-14B!404),12`Z.CH`Z@DT`(%"6;(PI#$R-P``"C8`
M4U&R*$)9K$)9*:TH-*PV-"D`&`HX`)<@0D$@JB`U,3(@JD)9+%-1.H(`)@H\
M`(%"6;(PI#$R-P`\"CX`4U&R*$)9K$)9*:TH-*PV-"D`6`I``)<@0D$@JB`U
M,3(@JC(U-:M"62Q343J"`',*9`"/(#HZ.B!3059%(%1/($1)4TL@.CHZ`),*
M9@"?,2PX+#$U+")3,#I324XO0T]3+U-14B(ZH#$`L`IH`)\R+#@L,BPB4TE.
M+T-/4R]345(L4"Q7(@##"FH`F#(LQRA"3"G'*$)(*3L`T`IL`(%4LC"D-S8W
M`.,*;@"8,BS'*,(H0D&J5"DI.P#I"G``@@#P"G(`H#(````:&AH:&AH:&AH:
#&AH:
`
end
begin 644 m.base_mask
M`0@H"`$`CR`M*RT@34%+12!"05-%+TU!4TL@5$%"3$4@,BXP("TK+0!/"`(`
MCR`M*RT@("`@("`@("`@("`@("`@("`@("`@("`@("TK+0!V"`,`CR`M*RT@
M($)9.B!7059%1D]232`@("`@("`@("`@("TK+0"="`0`CR`M*RT@1D]2.B!$
M25-#/4]615)9($U!1T%:24Y%("TK+0#$"`4`CR`M*RT@($]..B`P.2TQ-"TY
M-B`@("`@("`@("`@("TK+0#="`H`0D$@LB`Q,S`U-B`Z($-"LC@Q.3(`^P@,
M`$)(LK4H0D&M,C4V*3I"3+)"0:M"2*PR-38`#0ED`$.R,#J!5+(P(*0@,34`
M&`EF`(%1LC"D-P`J"6@`0C&R0T*J*%2L,3(X*0!("6H`2$*RM2A",:TR-38I
M.DQ"LD(QJTA"K#(U-@!J"6P`ER!"02"J($,L3$(ZER!"02"J(#$R.""J($,L
M2$(`>0EN`$.R0ZHQ.H)1+%0`BPEX`$.R,#J!5+(P(*0@,34`E@EZ`(%1LC"D
M-P"M"7P`0C&R0T*J,C`T.*HH5*PQ,C@I`,L)?@!(0K*U*$(QK3(U-BDZ3$*R
M0C&K2$*L,C4V`-\)@`"7($)!(*HR-38@JD,L2$(`[@F"`$.R0ZHQ.H)1+%0`
M_`F6`(%4LC`@I"`Q-0`'"I@`@5&R,*0W`"8*F@"7($)!(*H@,S@TJBA4K#@I
MJE$L,JXH-ZM1*0`O"IP`@E$L5`!-"L@`GS$L."PQ-2PB4S`Z0D%312]-05-+
M(CJ@,0!H"LH`GS(L."PR+")"05-%+TU!4TLL4"Q7(@!["LP`F#(LQRA"3"G'
M*$)(*3L`B`K.`(%4LC"D-3$Q`)L*T`"8,BS'*,(H0D&J5"DI.P"A"M(`@@"H
M"M0`H#(````:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:
M&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:
#&AH:
`
end
begin 644 sin_cos_sqr
M`#```@,%!@@)"PP.$!$3%!87&1H;'1X@(2(D)28G*2HK+"TN+S`Q,C,T-38W
M.#@Y.CL[/#P]/3X^/C\_/T!`0$!`0$!`0$!`/S\_/CX^/3T\/#LZ.CDX.#'1L:&!<5%!(1#PX,"PD(!@4#`0#^_?OZ
M^/;U\_+P[^WLZNGGYN7CXN#?WMS;VMG7UM74T]+0S\[-S+CY>;HZ>OL[N_Q\O3U]_CZ_/W_0$!`0$!`/S\_/CX^
M/3T\/#L[.CDX.#'1L:&!<6%!,1$`X,
M"PD(!@4#`@#^_?OZ^/?U\_+P[^WLZNGGYN7CXN#?WMS;VMG7UM74T]+1S\[-
MS+CY>;HZ>OL[N_Q\O3U]_CZ^_W_
M``(#!08("@L-#A`1$Q06%QD:'!T>("$B)"4F*"DJ*RPM+C`Q,C,T-#4V-S@Y
M.3H[.SP\/3T^/CX_/S]`0$!`0``````````````````````!`0$!`0$!`@("
M`@(#`P,#!`0$!`4%!04&!@8'!P<("`D)"0H*"@L+#`P-#0X.#P\0$!$1$A(3
M$Q04%146%Q<8&1D:&AL<'!T>'A\@(2$B(R0D)28G)R@I*BLK+"TN+S`Q,3(S
M-#4V-S@Y.CL\/3X_/SX]/#LZ.3@W-C4T,S(Q,3`O+BTL*RLJ*2@G)R8E)"0C
M(B$A(!\>'AT<'!L:&AD9&!<7%A45%!03$Q(2$1$0$`\/#@X-#0P,"PL*"@H)
M"0D("`<'!P8&!@4%!04$!`0$`P,#`P("`@("`0$!`0$!`0``````````````
M```````:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:
M&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:
I&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:&AH:
`
end
begin 644 base_mask
M`#,``````````("`@("`@("```````````"`@("`@("`@```````````@("`
M@("`@(```````````("`@("`@("```````````"`@("`@("`@```````````
M@("`@("`@(```````````("`@("`@("```````````"`@("`@("`@"`@("`@
M("`@("`@("`@("`A(2$A(2$A(2$A(2$A(2$A(B(B(B(B(B(B(B(B(B(B(B,C
M(R,C(R,C(R,C(R,C(R,D)"0D)"0D)"0D)"0D)"0D)24E)24E)24E)24E)24E
M)28F)B8F)B8F)B8F)B8F)B8G)R