Rendering logic

From NeoGeo Development Wiki
Jump to navigation Jump to search

On the NeoGeo hardware, the GPU (Graphics Processing Unit) a.k.a. VDP, may refer to a chip or a group of different chips used to generate the video signal.

See graphics pipeline for an overview of the interconnections between chips and cartridges.

Temporary notes

  • Fix, then sprites (PCK1 then PCK2)
  • Fix and sprite pixels are rendered at the same speed because sprite pixels are also written by pairs (reason for the odd/even buffers)
  • Tile pixel lines are rendered in halves:
  • For the fix (32mclk = 8 pixels corresponds to 6MHz pixel clock):
    • Full address is ...1**** (PCK2 pulse)
    • 2H1 is 0 for 2 pixels (columns 0 & 1), then 1 for 2 pixels (columns 2 & 3)
    • Full address is ...0**** (PCK2 pulse)
    • 2H1 is 0 for 2 pixels (columns 4 & 5), then 1 for 2 pixels (columns 6 & 7)
  • For sprites (32mclk = 16 pixels):
    • Full address is ...1***** (PCK1 pulse)
    • CA4 is 0 for 4 pixels (columns 0~3), then 1 for 4 pixels (columns 4~7)
    • Full address is ...0***** (PCK1 pulse)
    • CA4 is 0 for 4 pixels (columns 8~11), then 1 for 4 pixels (columns 12~15)
  • As fix is rendered in realtime, the fix tile address is set before sprites (on a new line PCK1 pulses before PCK2)
  • X position to B1, just before each PCK2 pulse (SP during 1mclk), for 20 sprites next to each other (X+16px each time):
    • Start of line: 0000,0808,1010,1838,2000,2808,3010,3838,40C0,48E8,50F0,58F8,60C0,68E8,70F0,78F8,8000,8808,9010,9838,0,0,0...

Video generation

See Display timing for the sync signal's timing.

NEO-B1 is used for double-buffering scanlines. While a buffer is output to the screen, the other one is filled up. They're swapped each new scanline. Each of the two line buffers are actually 2 buffers of even/odd pixels. They will be named (1 & 2), and (3 & 4).

  • The TMS0 signal from LSPC tells B1 how the pair of buffers are used:
    • 0: Buffers 1&2 are output to the TV. Buffers 3&4 are written to.
    • 1: Buffers 1&2 are written to. Buffers 3&4 are output to the TV.
  • CSK1~4 signals are used to step to the next pixel (falling edge ?), periodic for video output, VRAM-dependent when filling up. Inactive during H-blank.
  • WSE1~4 signals are used to indicate if the pixel color from GAD/GBD needs to be written to the buffer, matches CSK for video output (OE signal ?), depends on DOTA/DOTB (opaque pixel signal) when filling up.
  • SS1~2 signals ?
  • The rising edge of PCK1 and PCK2 stores fix or sprite pixels.
  • The X position of the sprite (and something else in a byte ?) is latched by CSK falling edges when LD* is low.
  • 1H1 is probably used to switch pixels of FIXD between left and right.

It seems that fix data is read 8 pixels in advance (confirms what Charles wrote in mvstech.txt). How is this handled in B1 ?

Sprite parsing

This is a draft. The following information shouldn't be considered as exact.

To do: Edit waveforms, FP and SP windows of the P BUS start 0.5mclk earlier (1.5,5,1.5,1.5,5,1.5 = 16).

  • LSPC runs at 24MHz, but generates signals on rising and falling edges ("48MHz")
  • Fast VRAM is 35ns (<1mclk), slow VRAM is 100ns (<2.5mclk, 3 ?)
  • The fast VRAM reads always occur 1mclk (41.6ns) after address is set. Smallest access window is 1.5mclk.

  • FIXT: P23~16 is 0, P15~0 is S ROM address (+ external 2H1)
  • SPRT: P23~0 is C ROM address (+ external CA4)
  • LO: P23~16 is LO ROM data, P15~0 is LO address
  • FP: P19~16 is the fix tile palette, rest is 0
  • SP: P23~16 is the sprite tile palette, P15~8 is X position, P7~0 is ?
  • LSPC always starts filling up active sprite list A ($8600) each new frame

Read sequence:

Timing diagram when no sprites fall in the next scanline (no writes to sprite list):

Parse        ################################                                ################################
Render                                       ##########################                                      ##########################
24M    |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr   | 600 |  200  | 201 | 202 | 203 | 204 |  681  | 00E | 20E | 40E | 600 |  205  | 206 | 207 | 208 | 209 |  682  | 00F | 20F | 40F
PCK1   ______|'''|___________________________________________________________|'''|_____________________________________________________
PCK1B  '''''''|____|''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''
LOAD   |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________
12M    __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_
2Pixel       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
Read       ?       !     !     !     !     !       !     !     !     !     ?       !     !     !     !     !       !     !     !     !
What      1      2      2     2     2     2      3      4     5     6     1      2      2     2     2     2      3      4     5     6...
  • 1: Probably CPU acces slot with last address latched ($600)
  • 2: Read sprite Y position from SCB3 ($200+) to see if it's in next scanline
  • 3: Read sprite list ($600+) to get sprite #
  • 4: Read SCB2 zoom values ($000+)
  • 5: Read SCB3 Y/size/chain ($200+)
  • 6: Read SCB4 X ($400+)

10 states in 16 cycles (or 5 in 8 cycles: 4-3-3-3-3).

Matched sprite = 6 or 7 (3 or 3.5mclk) = 2 slots Unmatched sprite = 3 or 4 (1.5 or 2mclk) = 1 slot 8mclk = 5 slots available for parsing out of 16mclk: 768mclk per scanline: 480 slots (unmatched). (240 matched)

SNK says max. 380 sprites can be parsed, and max 96 per line:

If we have 380-96 = 284 empty/unmatched sprites, then 96 matched sprites, we have 480-284 = 196 slots free, which allow 196/2 = 98 active sprites. Maybe 2 less because of scanline start delay or something ?

So the 380 sprites max. limit can be explained.

Can sprite parsing go over 380 ? For example:

  • Is an active sprite displayed after 400 unmatched sprites ?
  • Is an active sprite displayed after 474 unmatched sprites ? Does that hit the limit ? (((480-474)/2)-2) ?
  • Are 2 active sprites displayed after 474 unmatched sprites ? Only the first one ?
  • Is an active sprite displayed after 476 unmatched sprites ? Never?

SNK says min. 12 68kclk between writes (so 24mclk). 1 write every 24mclk = 64 per scanline.

Why 12 and not 8 ?

All parsing cycles are taken into consideration by SNK to give the "380 sprites max" limit, so CPU I/O must be done in the "1" states (address 600 by default when no I/O request ?).

Timing diagram when the sprite list is being filled:

+5/8:

0   5   2   7   4   1   6   3
|       |       |   |       |
  5   2   7   4   1   6   3  0
      |       |   |       |  |
0 1 2 3 4 5 6 7 8 9 A B C D E F
|       |     |     |     |    

LLLLLLLLHHHHHHHHLLLLHHHHHHHHLLLL
HHHHHHLLLLLLLLHHHHLLLLLLLLHHHLLL

             0 1 2 3 4 5 6 7 8 9 A B C D E F
             |       |     |     |     |

Parse        ################################                                ################################
Render                                       ##########################                                      ##########################
24M    |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr   | 600 |  20F  | 210 | 211 | 600 | 601 |  684  | 005 | 205 | 405 | 600 |  212  | 213 | 602 | 603 | 214 |  685  | 006 | 206 | 406
PCK1   ______|'''|___________________________________________________________|'''|_____________________________________________________
PCK1B  '''''''|___|'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''
LOAD   |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________
12M    __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_
2Pixel       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
/WE    ''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''
Read       ?       !     !     !                   !     !     !     !     ?       !     !                 !       !     !     !     !
  • R/W sequences: (2 write buffers ?)
  • 600 RRRWW... 600 RRWWR...
  • 600 WWRRW... 600 WRRWW... 600 RRWWR ... 600 RWWRR
  • Even lines: Write to list A, Read from list B (Start of display)
  • Odd lines: Write to list B, Read from list A
  • In 16clk, 2 sprites SCB3 max. are checked to fill up sprite list , and 1 sprite's attributes are read for output
  • 384px * 4clk/px = 1536clk/line
  • 1536clk / 16clk = 96 sprites max/line
  • Available CPU R/W slots depending on parsing progress, safest is ? cycles

Slow (lower) VRAM

  • Slow VRAM is 100ns (10MHz) and is read at ?
  • 4 slots per render cycle, 1 slot for CPU R/W (1 each 16 68k cycles)