Rendering logic
On the NeoGeo hardware, the term GPU (Graphics Processing Unit) may refer to a single chip or to a group of chips used to generate the video signal, depending on the system:
- PRO-A0, PRO-B0 (early)
- LSPC2-A2, NEO-B1 (most common)
- NEO-GRC, NEO-OFC (CD systems)
- NEO-GRZ (CDZ, MV-1C ?)
See graphics pipeline for an overview of the interconnections between chips and cartridges.
Temporary notes
- PCK2 rises with BNKB and CHBL
- The first valid rendering cycle is 32mclk after CHBL goes low ?
- Fix and sprite pixels are rendered at the same speed because sprite pixels are written in pairs
- Tile pixel lines are rendered in halves:
  - For the fix (32mclk = 8 pixels, which corresponds to a 6MHz pixel clock):
    - Full address is ...1**** (PCK2 pulse)
      - 2H1 is 0 for 2 pixels (columns 0 & 1), then 1 for 2 pixels (columns 2 & 3)
    - Full address is ...0**** (PCK2 pulse)
      - 2H1 is 0 for 2 pixels (columns 4 & 5), then 1 for 2 pixels (columns 6 & 7)
  - For sprites (32mclk = 16 pixels):
    - Full address is ...1***** (PCK1 pulse)
      - CA4 is 0 for 4 pixels (columns 0~3), then 1 for 4 pixels (columns 4~7)
    - Full address is ...0***** (PCK1 pulse)
      - CA4 is 0 for 4 pixels (columns 8~11), then 1 for 4 pixels (columns 12~15)
- As the fix layer is rendered in real time, the fix tile address is set before the sprite one (on a new line, PCK2 pulses before PCK1)
- X position values sent to B1 just before each PCK2 pulse (during the 1mclk SP cycle), for 20 sprites placed next to each other (X + 16px each time):
  - Start of line: 0000,0808,1010,1838,2000,2808,3010,3838,40C0,48E8,50F0,58F8,60C0,68E8,70F0,78F8,8000,8808,9010,9838,0,0,0...
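The half-line fetch order noted above can be summarized with a few helper functions. This is a minimal sketch, assuming that "...1****" refers to the bit just above the 4 wildcard bits (bit 4 of the S ROM address) and "...1*****" to bit 5 of the C ROM address; the function names are hypothetical and only restate the column mapping from the notes.

```python
# Hypothetical helpers: which pixel columns of a tile line are fetched for a given
# address half and sub-step, following the notes above.
# Assumption: "...1****" is bit 4 of the S ROM address (just above the 4 wildcard
# bits) and "...1*****" is bit 5 of the C ROM address.


def fix_columns(addr_bit4: int, h1: int) -> list[int]:
    """Fix tile (8px wide): 2 columns per (address bit 4, 2H1) pair."""
    base = 0 if addr_bit4 == 1 else 4  # ...1**** -> columns 0~3, ...0**** -> columns 4~7
    start = base + 2 * h1              # 2H1 selects the first or second pair in that half
    return [start, start + 1]


def sprite_columns(addr_bit5: int, ca4: int) -> list[int]:
    """Sprite tile (16px wide): 4 columns per (address bit 5, CA4) pair."""
    base = 0 if addr_bit5 == 1 else 8  # ...1***** -> columns 0~7, ...0***** -> columns 8~15
    start = base + 4 * ca4             # CA4 selects the first or second group of 4
    return list(range(start, start + 4))


def fetch_order(columns_fn) -> list[int]:
    """Columns in fetch order over one tile line: the '1' half first, then the '0' half."""
    order = []
    for half_bit in (1, 0):
        for sub in (0, 1):
            order.extend(columns_fn(half_bit, sub))
    return order


print(fetch_order(fix_columns))     # [0, 1, 2, 3, 4, 5, 6, 7]
print(fetch_order(sprite_columns))  # columns 0 to 15, in order
```

With this reading of the address bits, both layers end up fetching their columns strictly left to right, which matches the fix layer being rendered in real time.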
Video generation
See Display timing for the sync signal timing.
NEO-B1 is used for double-buffering scanlines. While one buffer is output to the TV, the other one is filled up. They're swapped on each new scanline. Each of the two line buffers is actually made of 2 buffers, one for even pixels and one for odd pixels. They are named (1 & 2) and (3 & 4) below.
- The TMS0 signal from LSPC tells B1 how the two pairs of buffers are used:
  - 0: Buffers 1&2 are output to the TV. Buffers 3&4 are written to.
  - 1: Buffers 1&2 are written to. Buffers 3&4 are output to the TV.
- CSK1~4 signals step to the next pixel (rising edge ?). They are periodic for video output and depend on VRAM data when filling up. Inactive during H-blank.
- WSE1~4 signals indicate whether the pixel color from GAD/GBD needs to be written to the buffer (falling edge ?). For video output they match CSK (ignored ?); when filling up they depend on DOTA/DOTB (opaque pixel signal).
- SS1~2 signals are used to reset the pixel pointers on their falling edge ? (probably wrong)
- The rising edge of PCK2 latches the X position of the sprite (and something else in a byte ?)
- 1H1 (6MHz / 2 pixels per byte = 3MHz) is used to clock the FIXD pixels into the pixel buffers (or directly to the output ?) if they're not 0000.
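The buffer usage described above can be modelled in software as two line buffers split into even and odd halves, with TMS0 selecting which pair is output. This is a hypothetical sketch: which half holds even pixels (here 1 and 3) and the 384-pixel line width are assumptions, and the CSK/SS pointer behaviour is reduced to plain array indexing.

```python
# Hypothetical model of the NEO-B1 line buffers: two scanline buffers, each split
# into an even-pixel and an odd-pixel half. Assumptions: 1 and 3 hold even pixels,
# 2 and 4 hold odd pixels, and a line is 384 pixels wide.

LINE_WIDTH = 384


class LineBuffers:
    def __init__(self) -> None:
        self.buf = {n: [0] * (LINE_WIDTH // 2) for n in (1, 2, 3, 4)}

    @staticmethod
    def pairs(tms0: int) -> tuple[tuple[int, int], tuple[int, int]]:
        """Return (output pair, write pair) for a TMS0 level."""
        return ((1, 2), (3, 4)) if tms0 == 0 else ((3, 4), (1, 2))

    def write(self, tms0: int, x: int, color: int, opaque: bool) -> None:
        """Fill-up side: only opaque pixels are stored (WSE driven by DOTA/DOTB)."""
        if not opaque:
            return
        even, odd = self.pairs(tms0)[1]
        self.buf[even if x % 2 == 0 else odd][x // 2] = color

    def output(self, tms0: int) -> list[int]:
        """Video side: interleave the even and odd halves back into one scanline."""
        even, odd = self.pairs(tms0)[0]
        line = []
        for e, o in zip(self.buf[even], self.buf[odd]):
            line += [e, o]
        return line


lb = LineBuffers()
lb.write(tms0=0, x=10, color=5, opaque=True)   # TMS0 = 0: goes into buffers 3 & 4
lb.write(tms0=0, x=11, color=7, opaque=True)
lb.write(tms0=0, x=12, color=9, opaque=False)  # transparent pixel, not written
print(lb.output(tms0=1)[10:13])                # next line, TMS0 = 1: [5, 7, 0]
```

A line filled while TMS0 is 0 only becomes visible once TMS0 flips to 1, which is the one-scanline latency implied by double-buffering.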
Sprite parsing
This is a draft. The following information shouldn't be considered exact.
- LSPC runs at 24MHz
- Fast VRAM is 35ns
- The reads always occur 1mclk (~41.7ns) after the address is set
- FIXT: P23~16 are 0, P15~0 are S ROM address (+ external 2H1)
- SPRT: P23~0 are C ROM address (+ external CA4)
- LO: P23~16 are LO ROM data, P15~0 are LO address
- FP: P19~16 is the fix tile palette, rest is 0
- SP: P23~16 is the sprite tile palette, P15~8 is X position, P7~0 is ?
- LSPC always starts filling up active sprite list A ($8600) each new frame
- Read sequence (100pF capacitor delay on PCKxB on AES too ?):
  - Timing diagram when the sprite list for the current line is already filled (no writes):
    24M    |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
    Addr   | 600 | 200 | 201 | 202 | 203 | 204 | 681 | 00E | 20E | 40E | 600 | 205 | 206 | 207 | 208 | 209 | 682 | 00F | 20F | 40F
    PCK1   ______|'''|___________________________________________________________|'''|_____________________________________________________
    PCK1B  '''''''|____|''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''
    LOAD   |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________
    12M    __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_
    2Pixel | | | | | | | | | | | | | | | |
    Read   ? ! ! ! ! ! ! ! ! ! ? ! ! ! ! ! ! ! ! !
    What   1 2 2 2 2 2 3 4 5 6 1 2 2 2 2 2 3 4 5 6...
    - 1: ?
    - 2: Read SCB3 to check whether the sprite is on the next scanline (sprite number just increments), starts the frame at sprite 1 ?
    - 3: Read sprite list to get sprite #
    - 4: Read SCB2 zoom values
    - 5: Read SCB3 Y/size/chain
    - 6: Read SCB4 X
  - Timing diagram when the sprite list is being filled:
    24M    |'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
    Addr   | 600 | 20F | 210 | 211 | 600 | 601 | 684 | 005 | 205 | 405 | 600 | 212 | 213 | 602 | 603 | 214 | 685 | 006 | 206 | 406
    PCK1   ______|'''|___________________________________________________________|'''|_____________________________________________________
    PCK1B  '''''''|____|''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|___|''''''''''''''''''''''''''''''''''''''''''''''''''''
    LOAD   |'''''''|_______________________|'''''''|_______________________|'''''''|_______________________|'''''''|_______________________
    12M    __|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|___|'''|_
    2Pixel | | | | | | | | | | | | | | | |
    /WE    ''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''''''''''''''''|___|'|___|'''''''''''''''''''''''''''''''''
    Read   ? ! ! ! ! ! ! ! ? ! ! ! ! ! ! !
- R/W sequences (2 write buffers ?):
  - 600 RRRWW... 600 RRWWR...
  - 600 WWRRW... 600 WRRWW... 600 RRWWR... 600 RWWRR
- Even lines: Write to list A, Read from list B (Start of display)
- Odd lines: Write to list B, Read from list A
- In 16clk, at most 2 sprites' SCB3 are checked to fill up the sprite list, and 1 sprite's attributes are read for output
  - 384px * 4clk/px = 1536clk/line
  - 1536clk / 16clk = 96 sprites max/line
  - This means that at least 2 sprites' SCB3 are checked each 16clk, and at most 4 writes to the sprite list can be done per 16clk ?
- Available CPU R/W slots depend on the parsing progress, the safest is ? cycles
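The read-only timing diagram and the per-line budget can be cross-checked with a short script. This is a sketch under stated assumptions: the diagram addresses are taken as word offsets from $8000 (so 600 is $8600, sprite list A), the 680~6FF range is treated as the sprite list being read, and only the "no writes" case is modelled. `label_reads` and `max_sprites_per_line` are hypothetical names.

```python
# Sketch: label the fast VRAM addresses of the "no writes" diagram with the
# step numbers 1~6 listed under it, and redo the per-line sprite budget.
# Assumptions: addresses are word offsets from $8000 (600 is $8600), the 680~6FF
# range is read as the sprite list, and only the read-only case is modelled.

PIXELS_PER_LINE = 384
MCLK_PER_PIXEL = 4
MCLK_PER_SPRITE = 16  # one sprite's attributes are read per 16clk


def label_reads(addrs: list[int]) -> list[int]:
    labels = []
    pending = 0  # attribute reads (SCB2, SCB3, SCB4) still expected after a list read
    for addr in addrs:
        if pending:
            labels.append(7 - pending)  # 3 left -> step 4 (SCB2), 2 -> 5 (SCB3), 1 -> 6 (SCB4)
            pending -= 1
        elif addr == 0x600:
            labels.append(1)            # step 1: purpose unknown
        elif 0x680 <= addr <= 0x6FF:
            labels.append(3)            # step 3: sprite list read gives the sprite number
            pending = 3
        elif 0x200 <= addr <= 0x3FF:
            labels.append(2)            # step 2: SCB3 scan for the next scanline
        else:
            raise ValueError(f"unexpected address {addr:03X}")
    return labels


def max_sprites_per_line() -> int:
    mclk_per_line = PIXELS_PER_LINE * MCLK_PER_PIXEL  # 1536
    return mclk_per_line // MCLK_PER_SPRITE           # 96


diagram = [0x600, 0x200, 0x201, 0x202, 0x203, 0x204, 0x681, 0x00E, 0x20E, 0x40E,
           0x600, 0x205, 0x206, 0x207, 0x208, 0x209, 0x682, 0x00F, 0x20F, 0x40F]
print(label_reads(diagram))    # [1, 2, 2, 2, 2, 2, 3, 4, 5, 6, 1, 2, 2, 2, 2, 2, 3, 4, 5, 6]
print(max_sprites_per_line())  # 96
```

The labels reproduce the "What" row of the first diagram, which supports reading 00E/20E/40E as the SCB2/SCB3/SCB4 entries of the sprite number fetched from the list just before.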
Slow (lower) VRAM
- Slow VRAM is 100ns (10MHz) and is read at ?
- 4 slots per render cycle, 1 slot for CPU R/W (1 every 16 68k cycles)
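The slow VRAM figures can be checked for consistency with a little arithmetic. This sketch assumes the 68k is clocked at 12MHz (24MHz / 2) and that a render cycle lasts 32mclk, as in the fix/sprite notes under "Temporary notes"; neither assumption is confirmed for slow VRAM on this page.

```python
# Back-of-the-envelope check of the slow VRAM slot figures above.
# Assumptions: the 68k is clocked at 12MHz (24MHz / 2) and a render cycle is
# 32mclk long, as in the fix/sprite notes under "Temporary notes".
from fractions import Fraction

MCLK_HZ = 24_000_000
CPU_HZ = MCLK_HZ // 2
RENDER_CYCLE_MCLK = 32
SLOTS_PER_RENDER_CYCLE = 4


def mclk_to_ns(mclk: int) -> Fraction:
    """Exact duration of a number of 24MHz clocks, in nanoseconds."""
    return Fraction(mclk * 10**9, MCLK_HZ)


cpu_slot_period_mclk = 16 * MCLK_HZ // CPU_HZ            # 16 68k cycles = 32mclk
slot_mclk = RENDER_CYCLE_MCLK // SLOTS_PER_RENDER_CYCLE  # 8mclk per slot
slot_ns = mclk_to_ns(slot_mclk)                          # 1000/3 ns, about 333ns

# Under these assumptions, one CPU slot every 16 68k cycles is exactly one CPU
# slot per render cycle, and each slot is long enough for a 100ns access.
print(cpu_slot_period_mclk == RENDER_CYCLE_MCLK, float(slot_ns) > 100)  # True True
```

If the render cycle turned out to be 16mclk instead, the same script would give 4mclk (about 167ns) slots and one CPU slot every 2 render cycles.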