<p>Rendering logic (NeoGeo Development Wiki), revision of 2016-08-09T09:45:28Z by Frenchshark: /* Sprite parsing */</p>
<hr />
<div>On NeoGeo hardware, the GPU (Graphics Processing Unit), a.k.a. VDP, refers to the chip or group of chips used to generate the video signal. Depending on the system, these are:<br />
<br />
* [[LSPC-A0]], [[PRO-B0]] (early)<br />
* [[LSPC2-A2]], [[NEO-B1]] (most common)<br />
* [[NEO-GRC]], [[NEO-OFC]] (CD systems)<br />
* [[NEO-GRZ]] (CDZ, MV-1C ?)<br />
<br />
See [[graphics pipeline]] for an overview of the interconnections between chips and cartridges.<br />
<br />
==Temporary notes==<br />
<br />
*Fix, then sprites (PCK1 then PCK2)<br />
*Fix and sprite pixels are rendered at the same speed because sprite pixels are also written in pairs (reason for the odd/even buffers)<br />
*Tile pixel lines are rendered in halves:<br />
<br />
*For the fix (32mclk = 8 pixels corresponds to 6MHz pixel clock):<br />
**Full address is ...1**** (PCK2 pulse)<br />
**2H1 is 0 for 2 pixels (columns 0 & 1), then 1 for 2 pixels (columns 2 & 3)<br />
**Full address is ...0**** (PCK2 pulse)<br />
**2H1 is 0 for 2 pixels (columns 4 & 5), then 1 for 2 pixels (columns 6 & 7)<br />
<br />
*For sprites (32mclk = 16 pixels):<br />
**Full address is ...1***** (PCK1 pulse)<br />
**CA4 is 0 for 4 pixels (columns 0~3), then 1 for 4 pixels (columns 4~7)<br />
**Full address is ...0***** (PCK1 pulse)<br />
**CA4 is 0 for 4 pixels (columns 8~11), then 1 for 4 pixels (columns 12~15)<br />
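<br />
The two fetch patterns above can be summarised in a small Python sketch. The function name and the returned tuple are made up for illustration; only the column ranges and the bit values are taken from the notes above.<br />

```python
def half_tile_fetch(column, layer):
    """Return (address_bit, select_bit) for one pixel column of a tile line.

    address_bit is the 1/0 bit shown in the "full address" patterns above.
    select_bit is 2H1 for the fix layer and CA4 for sprites.
    """
    if layer == "fix":
        width, group = 8, 2      # 8 pixels per line, 2H1 changes every 2 pixels
    elif layer == "sprite":
        width, group = 16, 4     # 16 pixels per line, CA4 changes every 4 pixels
    else:
        raise ValueError("layer must be 'fix' or 'sprite'")
    if not 0 <= column < width:
        raise ValueError("column out of range")
    half = column // (width // 2)        # 0 = first half line, 1 = second half line
    address_bit = 1 - half               # the first half is fetched with the bit set
    select_bit = (column // group) % 2   # alternates inside each half
    return address_bit, select_bit
```

For example, <code>half_tile_fetch(5, "fix")</code> gives address bit 0 and 2H1 = 0, matching "columns 4 & 5" in the fix list.<br />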
<br />
*As fix is rendered in realtime, the fix tile address is set before sprites (on a new line PCK1 pulses before PCK2)<br />
*X position to B1, just before each PCK2 pulse (SP during 1mclk), for 20 sprites next to each other (X+16px each time):<br />
** Start of line: 0000,0808,1010,1838,2000,2808,3010,3838,40C0,48E8,50F0,58F8,60C0,68E8,70F0,78F8,8000,8808,9010,9838,0,0,0...<br />
<br />
=Video generation=<br />
<br />
See [[Display timing]] for the sync signal's timing.<br />
<br />
[[NEO-B1]] is used for double-buffering scanlines. While one buffer is output to the screen, the other one is filled up. They are swapped on each new scanline. Each of the two line buffers is actually made of 2 buffers, one for even pixels and one for odd pixels. The pairs will be named (1 & 2) and (3 & 4).<br />
<br />
*The TMS0 signal from LSPC tells B1 how the pairs of buffers are used:<br />
**0: Buffers 1&2 are output to the TV. Buffers 3&4 are written to.<br />
**1: Buffers 1&2 are written to. Buffers 3&4 are output to the TV.<br />
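<br />
As a minimal sketch, the TMS0 table above can be written as a lookup. The function name and the dictionary keys are illustrative only; which TMS0 level corresponds to which scanline parity is not stated here, so the sketch takes the level as input.<br />

```python
def buffer_roles(tms0):
    """Map the TMS0 level to the roles of the two line buffer pairs in B1."""
    if tms0 == 0:
        return {"output": (1, 2), "write": (3, 4)}
    if tms0 == 1:
        return {"output": (3, 4), "write": (1, 2)}
    raise ValueError("TMS0 is a single bit")
```

Toggling the input on every scanline reproduces the double-buffering: the pair that was just filled becomes the pair sent to the TV.<br />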
<br />
*CSK1~4 signals are used to step to the next pixel (falling edge ?), periodic for video output, VRAM-dependent when filling up. Inactive during H-blank.<br />
*WSE1~4 signals are used to indicate if the pixel color from GAD/GBD needs to be written to the buffer, matches CSK for video output (OE signal ?), depends on DOTA/DOTB (opaque pixel signal) when filling up.<br />
*SS1~2 signals ?<br />
*The rising edge of PCK1 and PCK2 stores fix or sprite pixels.<br />
*The X position of the sprite (and something else in a byte ?) is latched by CSK falling edges when LD* is low.<br />
*1H1 is probably used to switch pixels of FIXD between left and right.<br />
<br />
It seems that fix data is read 8 pixels in advance (confirms what Charles wrote in mvstech.txt). How is this handled in B1 ?<br />
<br />
=Sprite parsing=<br />
<br />
<span style="color:#FF0000">This is a draft. The following information shouldn't be considered as exact.</span><br />
<br />
To do: Edit waveforms, FP and SP windows of the P BUS start 0.5mclk earlier (1.5,5,1.5,1.5,5,1.5 = 16).<br />
<br />
*LSPC runs at 24MHz, but generates signals on rising and falling edges ("48MHz")<br />
*Fast VRAM is 35ns (<1mclk), slow VRAM is 100ns (<2.5mclk, 3 ?)<br />
*The fast VRAM reads always occur 1mclk (41.7ns) after the address is set. The smallest access window is 1.5mclk.<br />
[[file:timing_gpu1.png]]<br />
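<br />
The mclk figures quoted above follow from the 24MHz clock period. A short Python check, with constant and function names chosen here for illustration:<br />

```python
MCLK_HZ = 24_000_000
MCLK_NS = 1e9 / MCLK_HZ          # one mclk period, about 41.67 ns

def ns_to_mclk(ns):
    """Convert a duration in nanoseconds to a (fractional) number of mclk periods."""
    return ns / MCLK_NS

fast_vram_mclk = ns_to_mclk(35)    # about 0.84, fits in 1 mclk
slow_vram_mclk = ns_to_mclk(100)   # 2.4, fits in 2.5 mclk
```

This only converts the rated access times. It says nothing about the margins that the real chips need, which is why the "3 ?" above stays open.<br />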
<br />
*FIXT: P23~16 is 0, P15~0 is S ROM address (+ external 2H1)<br />
*SPRT: P23~0 is C ROM address (+ external CA4)<br />
*LO: P23~16 is [[LO]] ROM data, P15~0 is LO address<br />
*FP: P19~16 is the fix tile palette, rest is 0<br />
*SP: P23~16 is the sprite tile palette, P15~8 is X position, P7~0 is ?<br />
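<br />
The P bus layouts above can be expressed as bit-field extractions. This is a sketch under the field positions listed above. The function names and dictionary keys are hypothetical, and the meaning of P7~0 in an SP word is left as unknown, as in the notes.<br />

```python
def decode_sp(p):
    """Split a 24-bit SP word: sprite palette, X position, unknown low byte."""
    return {
        "palette": (p >> 16) & 0xFF,   # P23~16
        "x": (p >> 8) & 0xFF,          # P15~8
        "unknown": p & 0xFF,           # P7~0, purpose not identified
    }

def decode_fp(p):
    """Extract the fix tile palette (P19~16) from a 24-bit FP word."""
    return {"palette": (p >> 16) & 0x0F}

def decode_lo(p):
    """Split a 24-bit LO word: LO ROM data and LO ROM address."""
    return {"data": (p >> 16) & 0xFF, "address": p & 0xFFFF}
```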
<br />
*LSPC always starts filling up active sprite list A ($8600) each new frame <br />
<br />
Read sequence:<br />
<br />
Timing diagram when no sprites fall in the next scanline (no writes to sprite list):<br />
<pre><br />
Parse ################################ ################################<br />
Render ########################## ##########################<br />
24M |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_<br />
Addr | 600 | 200 | 201 | 202 | 203 | 204 | 681 | 00E | 20E | 40E | 600 | 205 | 206 | 207 | 208 | 209 | 682 | 00F | 20F | 40F<br />
PCK1 ______|'''|___________________________________________________________|'''|_____________________________________________________<br />
PCK1B '''''''|____|''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''<br />
LOAD |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________<br />
12M __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_<br />
2Pixel | | | | | | | | | | | | | | | |<br />
Read ? ! ! ! ! ! ! ! ! ! ? ! ! ! ! ! ! ! ! !<br />
What 1 2 2 2 2 2 3 4 5 6 1 2 2 2 2 2 3 4 5 6...<br />
</pre><br />
<br />
*1: Probably CPU access slot with last address latched ($600)<br />
*2: Read sprite Y position from SCB3 ($200+) to see if it's in next scanline<br />
*3: Read sprite list ($600+) to get sprite #<br />
*4: Read SCB2 zoom values ($000+)<br />
*5: Read SCB3 Y/size/chain ($200+)<br />
*6: Read SCB4 X ($400+)<br />
<br />
10 states in 16 cycles (or 5 in 8 cycles: 4-3-3-3-3).<br />
<br />
One scanline contains 1536mclk cycles or 96 sequences of 16mclk cycles.<br />
<br />
Half of the mclk cycles are reserved for sprite parsing, the other half is for sprite rendering and CPU access.<br />
<br />
Each half has 96 x 5 = 480 states.<br />
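<br />
The per-scanline budget above is plain arithmetic. The Python below reproduces it, using the 384 pixels per line and 4 mclk per pixel figures given elsewhere on this page. The constant names are illustrative.<br />

```python
PIXELS_PER_LINE = 384        # total pixel slots per scanline, blanking included
MCLK_PER_PIXEL = 4           # 24MHz mclk / 6MHz pixel clock
MCLK_PER_SEQUENCE = 16       # one read sequence in the timing diagram
STATES_PER_SEQUENCE = 10     # "10 states in 16 cycles"

mclk_per_line = PIXELS_PER_LINE * MCLK_PER_PIXEL           # 1536
sequences_per_line = mclk_per_line // MCLK_PER_SEQUENCE    # 96
# Half of the states go to parsing, half to rendering and CPU access.
states_per_half = sequences_per_line * STATES_PER_SEQUENCE // 2   # 480
```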
<br />
==Parsing==<br />
SCB3 is read from $200 to $380. Each time there is a sprite match, a write state to the sprite list is inserted (apparently 2 states after the corresponding SCB3 read).<br />
<br />
Once SCB3 address $380 is reached, only $0000 write states to the sprite list are possible (in order to fill the rest of the sprite list with zeros).<br />
<br />
No matter how many sprites are matched in the scanline, we will always have 384 SCB3 read states and 96 sprite list write states.<br />
<br />
That explains why sprite #0 cannot be used: 0 is the value used to terminate the sprite list. It would have been smarter for SNK to use the value 511 instead...<br />
<br />
According to Charles MacDonald's document, the GPU always renders 96 sprites, so the "filler" sprite #0 can be rendered many times per scanline.<br />
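<br />
A behavioural sketch of the parsing pass described above, in Python. It models the result only, not the state timing. The <code>is_on_next_line</code> predicate is hypothetical and stands in for the Y position/size test done on each SCB3 word. Dropping matches beyond the 96th is an assumption drawn from the fixed count of 96 write states.<br />

```python
SCB3_READS = 384     # SCB3 addresses $200 to $380
LIST_SLOTS = 96      # sprite list write states per scanline

def build_sprite_list(is_on_next_line):
    """Return the 96-entry sprite list for the next scanline.

    is_on_next_line(n) is a caller-supplied (hypothetical) predicate telling
    whether sprite n, whose SCB3 word is at $200+n, covers the next scanline.
    """
    sprite_list = []
    for sprite in range(SCB3_READS):
        # Assumption: once the 96 slots are used, further matches are dropped.
        if len(sprite_list) < LIST_SLOTS and is_on_next_line(sprite):
            sprite_list.append(sprite)
    # The remaining write states store $0000, so sprite #0 acts as a filler
    # and a real match on sprite #0 cannot be told apart from it.
    sprite_list += [0] * (LIST_SLOTS - len(sprite_list))
    return sprite_list
```

With two matching sprites, for instance 5 and 9, the list starts with 5, 9 and is followed by 94 zero entries.<br />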
<br />
==Rendering==<br />
The state order is:<br />
*1: Read sprite list ($600+) to get sprite #<br />
*2: Read SCB2 zoom values ($000+)<br />
*3: Read SCB3 Y pos/size/sticky ($200+)<br />
*4: Read SCB4 X pos ($400+)<br />
*5: Read/write from CPU<br />
One remark: it would be more logical to read SCB3 before SCB2, because the sticky bit is needed to decide whether or not to keep the previous vertical shrink value.<br />
<br />
Is the sticky bit written to the sprite list along with the sprite number, or did SNK waste an additional 8-bit temporary register in their design ?<br />
<br />
==CPU access to High VRAM==<br />
SNK specifies a minimum of 12 68kclk between writes (so 24mclk). 1 write every 24mclk = 64 writes per scanline.<br />
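<br />
The figure of 64 writes per scanline comes from the following arithmetic, shown here in Python with illustrative names. The same function gives the count for a gap of 8 68kclk, for comparison.<br />

```python
MCLK_PER_LINE = 1536
MCLK_PER_68KCLK = 2      # 12 68kclk correspond to 24 mclk

def writes_per_line(gap_68kclk):
    """Number of VRAM writes that fit in one scanline for a given write spacing."""
    return MCLK_PER_LINE // (gap_68kclk * MCLK_PER_68KCLK)

snk_figure = writes_per_line(12)    # 64
every_16_mclk = writes_per_line(8)  # 96, one per 16mclk sequence
```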
<br />
Why 12 and not 8 ?<br />
<br />
68000 DTACK# logic is apparently only tied to GPU registers access. So CPU access during state #5 occurs asynchronously with the 68000 bus.<br />
<br />
My guess on the HW implementation is that during state #5, the GPU always reads the memory content pointed to by REG_VRAMADDR and updates the read latch of REG_VRAMRW.<br />
<br />
If the write latch of REG_VRAMRW is written by the 68000, the next state #5 becomes a write access and REG_VRAMADDR is incremented by the REG_VRAMMOD value.<br />
<br />
Even if a theoretical limit of 16mclk or 8 68kclk is possible, some additional cycles are needed for the address and data to propagate through the chip.<br />
<br />
Or maybe SNK has given the worst case scenario between slow Low VRAM and fast High VRAM ?<br />
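<br />
The guess above about state #5 can be sketched as a small Python model. This illustrates the hypothesis only and is not confirmed hardware behaviour. The class, method and attribute names are made up, the 64K-word memory size is an arbitrary choice for the model, and the 16-bit wrap of the address is an assumption.<br />

```python
class VramPort:
    """Toy model of the guessed CPU access slot (state #5)."""

    def __init__(self, size=0x10000):
        self.vram = [0] * size     # model memory, size is an assumption
        self.addr = 0              # REG_VRAMADDR
        self.mod = 0               # REG_VRAMMOD
        self.read_latch = 0        # value returned by a REG_VRAMRW read
        self.write_latch = None    # pending REG_VRAMRW write, if any

    def cpu_write(self, value):
        """68000 writes REG_VRAMRW: the data waits for the next state #5."""
        self.write_latch = value & 0xFFFF

    def state5(self):
        """One CPU slot: a pending write is performed, otherwise a read refresh."""
        if self.write_latch is not None:
            self.vram[self.addr] = self.write_latch
            self.write_latch = None
            self.addr = (self.addr + self.mod) & 0xFFFF   # assumed 16-bit wrap
        else:
            self.read_latch = self.vram[self.addr]
```

In this model, a second <code>cpu_write</code> issued before the next <code>state5</code> simply overwrites the pending data, which is one way to picture why a minimum spacing between writes is required.<br />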
<br />
Timing diagram when the sprite list is being filled:<br />
<pre><br />
+5/8:<br />
<br />
0 5 2 7 4 1 6 3<br />
| | | | |<br />
5 2 7 4 1 6 3 0<br />
| | | | |<br />
0 1 2 3 4 5 6 7 8 9 A B C D E F<br />
| | | | | <br />
<br />
LLLLLLLLHHHHHHHHLLLLHHHHHHHHLLLL<br />
HHHHHHLLLLLLLLHHHHLLLLLLLLHHHLLL<br />
<br />
0 1 2 3 4 5 6 7 8 9 A B C D E F<br />
| | | | |<br />
<br />
Parse ################################ ################################<br />
Render ########################## ##########################<br />
24M |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_<br />
Addr | 600 | 20F | 210 | 211 | 600 | 601 | 684 | 005 | 205 | 405 | 600 | 212 | 213 | 602 | 603 | 214 | 685 | 006 | 206 | 406<br />
PCK1 ______|'''|___________________________________________________________|'''|_____________________________________________________<br />
PCK1B '''''''|___|'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''<br />
LOAD |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________<br />
12M __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_<br />
2Pixel | | | | | | | | | | | | | | | |<br />
/WE ''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''<br />
Read ? ! ! ! ! ! ! ! ? ! ! ! ! ! ! !<br />
</pre><br />
<br />
*R/W sequences: (2 write buffers ?)<br />
*600 RRRWW... 600 RRWWR...<br />
*600 WWRRW... 600 WRRWW... 600 RRWWR ... 600 RWWRR<br />
*Even lines: Write to list A, Read from list B (Start of display)<br />
*Odd lines: Write to list B, Read from list A<br />
*In 16clk, the SCB3 entries of 2 sprites max. are checked to fill up the sprite list, and 1 sprite's attributes are read for output<br />
*384px * 4clk/px = 1536clk/line<br />
*1536clk / 16clk = 96 sprites max/line<br />
<br />
*Available CPU R/W slots depending on parsing progress, safest is ? cycles<br />
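<br />
The even/odd line alternation between the two sprite lists can be sketched as follows. List A at $8600 is stated above. The address used for list B is an assumption read off the $681+ reads in the timing diagrams, and the function name is illustrative.<br />

```python
LIST_A = 0x8600
LIST_B = 0x8680   # assumed from the $681.. sprite list reads in the diagrams

def sprite_lists(line):
    """Return which list is written (parsing) and read (rendering) on a scanline."""
    if line % 2 == 0:
        return {"write": LIST_A, "read": LIST_B}   # even lines
    return {"write": LIST_B, "read": LIST_A}       # odd lines
```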
<br />
==Slow (lower) VRAM==<br />
<br />
*Slow VRAM is 100ns (10MHz) and is read at ?<br />
*4 slots per render cycle, 1 slot for CPU R/W (1 each 16 68k cycles)<br />
<br />
[[Category:Video system]]</div>