Revision as of 04:02, 31 March 2016

On the NeoGeo hardware, the GPU (Graphics Processing Unit) a.k.a. VDP, may refer to a chip or a group of different chips used to generate the video signal.

LSPC-A0, PRO-B0 (early)
LSPC2-A2, NEO-B1 (most common)
NEO-GRC, NEO-OFC (CD systems)
NEO-GRZ (CDZ, MV-1C ?)

See graphics pipeline for an overview of the interconnections between chips and cartridges.

Temporary notes

Fix, then sprites (PCK1 then PCK2)
Fix and sprite pixels are rendered at the same speed because sprite pixels are also written by pairs (reason for the odd/even buffers)
Tile pixel lines are rendered in halves:

For the fix (32mclk = 8 pixels corresponds to 6MHz pixel clock):
- Full address is ...1**** (PCK2 pulse)
- 2H1 is 0 for 2 pixels (columns 0 & 1), then 1 for 2 pixels (columns 2 & 3)
- Full address is ...0**** (PCK2 pulse)
- 2H1 is 0 for 2 pixels (columns 4 & 5), then 1 for 2 pixels (columns 6 & 7)

For sprites (32mclk = 16 pixels):
- Full address is ...1***** (PCK1 pulse)
- CA4 is 0 for 4 pixels (columns 0~3), then 1 for 4 pixels (columns 4~7)
- Full address is ...0***** (PCK1 pulse)
- CA4 is 0 for 4 pixels (columns 8~11), then 1 for 4 pixels (columns 12~15)

As fix is rendered in realtime, the fix tile address is set before sprites (on a new line PCK1 pulses before PCK2)
X position to B1, just before each PCK2 pulse (SP during 1mclk), for 20 sprites next to each other (X+16px each time):
- Start of line: 0000,0808,1010,1838,2000,2808,3010,3838,40C0,48E8,50F0,58F8,60C0,68E8,70F0,78F8,8000,8808,9010,9838,0,0,0...

Video generation

See Display timing for the sync signal's timing.

NEO-B1 is used for double-buffering scanlines. While a buffer is output to the screen, the other one is filled up. They're swapped each new scanline. Each of the two line buffers are actually 2 buffers of even/odd pixels. They will be named (1 & 2), and (3 & 4).

The TMS0 signal from LSPC tells B1 how the pair of buffers are used:
- 0: Buffers 1&2 are output to the TV. Buffers 3&4 are written to.
- 1: Buffers 1&2 are written to. Buffers 3&4 are output to the TV.

CSK1~4 signals are used to step to the next pixel (falling edge ?), periodic for video output, VRAM-dependent when filling up. Inactive during H-blank.
WSE1~4 signals are used to indicate if the pixel color from GAD/GBD needs to be written to the buffer, matches CSK for video output (OE signal ?), depends on DOTA/DOTB (opaque pixel signal) when filling up.
SS1~2 signals ?
The rising edge of PCK1 and PCK2 stores fix or sprite pixels.
The X position of the sprite (and something else in a byte ?) is latched by CSK falling edges when LD* is low.
1H1 is probably used to switch pixels of FIXD between left and right.

It seems that fix data is read 8 pixels in advance (confirms what Charles wrote in mvstech.txt). How is this handled in B1 ?

Sprite parsing

This is a draft. The following information shouldn't be considered as exact.

To do: Edit waveforms, FP and SP windows of the P BUS start 0.5mclk earlier (1.5,5,1.5,1.5,5,1.5 = 16).

LSPC runs at 24MHz, but generates signals on rising and falling edges ("48MHz")
Fast VRAM is 35ns (<1mclk), slow VRAM is 100ns (<2.5mclk, 3 ?)
The fast VRAM reads always occur 1mclk (41.6ns) after address is set. Smallest access window is 1.5mclk.

FIXT: P23~16 is 0, P15~0 is S ROM address (+ external 2H1)
SPRT: P23~0 is C ROM address (+ external CA4)
LO: P23~16 is LO ROM data, P15~0 is LO address
FP: P19~16 is the fix tile palette, rest is 0
SP: P23~16 is the sprite tile palette, P15~8 is X position, P7~0 is ?

LSPC always starts filling up active sprite list A ($8600) each new frame

Read sequence:

Timing diagram when no sprites fall in the next scanline (no writes to sprite list):

Parse        ################################                                ################################
Render                                       ##########################                                      ##########################
24M    |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr   | 600 |  200  | 201 | 202 | 203 | 204 |  681  | 00E | 20E | 40E | 600 |  205  | 206 | 207 | 208 | 209 |  682  | 00F | 20F | 40F
PCK1   ______|'''|___________________________________________________________|'''|_____________________________________________________
PCK1B  '''''''|____|''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''
LOAD   |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________
12M    __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_
2Pixel       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
Read       ?       !     !     !     !     !       !     !     !     !     ?       !     !     !     !     !       !     !     !     !
What      1      2      2     2     2     2      3      4     5     6     1      2      2     2     2     2      3      4     5     6...

1: Probably CPU acces slot with last address latched ($600)
2: Read sprite Y position from SCB3 ($200+) to see if it's in next scanline
3: Read sprite list ($600+) to get sprite #
4: Read SCB2 zoom values ($000+)
5: Read SCB3 Y/size/chain ($200+)
6: Read SCB4 X ($400+)

10 states in 16 cycles (or 5 in 8 cycles: 4-3-3-3-3).

Matched sprite = 6 or 7 (3 or 3.5mclk) = 2 slots Unmatched sprite = 3 or 4 (1.5 or 2mclk) = 1 slot 8mclk = 5 slots available for parsing out of 16mclk: 768mclk per scanline: 480 slots (unmatched). (240 matched)

SNK says max. 380 sprites can be parsed, and max 96 per line:

If we have 380-96 = 284 empty/unmatched sprites, then 96 matched sprites, we have 480-284 = 196 slots free, which allow 196/2 = 98 active sprites. Maybe 2 less because of scanline start delay or something ?

So the 380 sprites max. limit can be explained.

Can sprite parsing go over 380 ? For example:

Is an active sprite displayed after 400 unmatched sprites ?
Is an active sprite displayed after 474 unmatched sprites ? Does that hit the limit ? (((480-474)/2)-2) ?
Are 2 active sprites displayed after 474 unmatched sprites ? Only the first one ?
Is an active sprite displayed after 476 unmatched sprites ? Never?

SNK says min. 12 68kclk between writes (so 24mclk). 1 write every 24mclk = 64 per scanline.

Why 12 and not 8 ?

All parsing cycles are taken into consideration by SNK to give the "380 sprites max" limit, so CPU I/O must be done in the "1" states (address 600 by default when no I/O request ?).

Timing diagram when the sprite list is being filled:

+5/8:

0   5   2   7   4   1   6   3
|       |       |   |       |
  5   2   7   4   1   6   3  0
      |       |   |       |  |
0 1 2 3 4 5 6 7 8 9 A B C D E F
|       |     |     |     |    

LLLLLLLLHHHHHHHHLLLLHHHHHHHHLLLL
HHHHHHLLLLLLLLHHHHLLLLLLLLHHHLLL

             0 1 2 3 4 5 6 7 8 9 A B C D E F
             |       |     |     |     |

Parse        ################################                                ################################
Render                                       ##########################                                      ##########################
24M    |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr   | 600 |  20F  | 210 | 211 | 600 | 601 |  684  | 005 | 205 | 405 | 600 |  212  | 213 | 602 | 603 | 214 |  685  | 006 | 206 | 406
PCK1   ______|'''|___________________________________________________________|'''|_____________________________________________________
PCK1B  '''''''|___|'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''
LOAD   |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________
12M    __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_
2Pixel       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
/WE    ''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''
Read       ?       !     !     !                   !     !     !     !     ?       !     !                 !       !     !     !     !

R/W sequences: (2 write buffers ?)
600 RRRWW... 600 RRWWR...
600 WWRRW... 600 WRRWW... 600 RRWWR ... 600 RWWRR
Even lines: Write to list A, Read from list B (Start of display)
Odd lines: Write to list B, Read from list A
In 16clk, 2 sprites SCB3 max. are checked to fill up sprite list , and 1 sprite's attributes are read for output
384px * 4clk/px = 1536clk/line
1536clk / 16clk = 96 sprites max/line

Available CPU R/W slots depending on parsing progress, safest is ? cycles

Slow (lower) VRAM

Slow VRAM is 100ns (10MHz) and is read at ?
4 slots per render cycle, 1 slot for CPU R/W (1 each 16 68k cycles)

@@ Line 10: / Line 10: @@
 ==Temporary notes==
-*PCK2 rises with BNKB and CHBL
+*Fix, then sprites (PCK1 then PCK2)
-*The first valid rendering cycle is 32mclk after CHBL low ?
+*Fix and sprite pixels are rendered at the same speed because sprite pixels are also written by pairs (reason for the odd/even buffers)
-*Fix and sprite pixels are rendered at the same speed because sprite pixels are written by pairs
 *Tile pixel lines are rendered in halves:
@@ Line 27: / Line 26: @@
 **CA4 is 0 for 4 pixels (columns 8~11), then 1 for 4 pixels (columns 12~15)
-*As fix is rendered in realtime, the fix tile address is set before sprites (on a new line PCK2 pulses before PCK1)
+*As fix is rendered in realtime, the fix tile address is set before sprites (on a new line PCK1 pulses before PCK2)
 *X position to B1, just before each PCK2 pulse (SP during 1mclk), for 20 sprites next to each other (X+16px each time):
 ** Start of line: 0000,0808,1010,1838,2000,2808,3010,3838,40C0,48E8,50F0,58F8,60C0,68E8,70F0,78F8,8000,8808,9010,9838,0,0,0...

Rendering logic: Difference between revisions

Revision as of 04:02, 31 March 2016

Contents

Temporary notes

Video generation

Sprite parsing

Slow (lower) VRAM

Navigation menu