Rendering logic: Difference between revisions

From NeoGeo Development Wiki
Jump to navigation Jump to search
m (Details about buffer pointers)
 
(One intermediate revision by one other user not shown)
Line 46: Line 46:


* Sprite pixels are rendered two-by-two at 12MHz (2mclk).
* Sprite pixels are rendered two-by-two at 12MHz (2mclk).
* One sprite tile line is therefore output in: 16 pixels / 2 * 2mclk = 16mclk. No variation, even if shrinking is used.
* One sprite tile line is therefore rendered in: 16 pixels / 2 * 2mclk = 16mclk. No variation, even if shrinking is used.
* The C ROM outputs 2 * 16 = 32 bits at a time so: 32 bits / 4bpp = 8 pixels at a time.
* The C ROM outputs 2 * 16 = 32 bits at a time so: 32 bits / 4bpp = 8 pixels at a time.
* C ROM reads needed for one sprite tile line: 32 pixels / 8 pixels per read = 4 reads.
* C ROM reads needed for one sprite tile line: 16 pixels / 8 pixels per read = 2 reads.


Address sequence for one tile line:
Address sequence for one tile line:

Latest revision as of 21:44, 5 December 2018

Depending on the chipset, video is generated by 3, 2 or one unique chip:

See graphics pipeline for an overview of the interconnections between chips and cartridges. See Display timing for the sync signal's timing.

There are two main parts in generating video:

  • An address generator (LSPC), which queries the graphics ROMs in the cartridges according to the data set in VRAM.
  • Line buffers, to which pixels can be written in any order from the graphics ROMs data.

Line buffers

To render sprites, the NeoGeo uses a pair of line buffers which are each 320 pixels long (a whole scanline). When one is used for rendering, the other one is shifted out for video output. Each new scanline, the buffers are flipped. This can be seen as a kind of double-buffering, allowing pixels to be rendered in any order.

To increase bandwidth, pixels are rendered two by two in sub-pairs: there are actually 4, 160-pixels-long buffers interleaved in an odd/even fashion. This scheme was inherited from the Alpha68k.

The fix layer pixels are rendered in real time over the buffers output.

Fix tiles

  • Fix pixels are output in time with the pixel clock (6MHz, 4mclk).
  • One fix tile line is therefore output in: 8 pixels * 4mclk = 32mclk. No variation.
  • The S ROM outputs 8 bits at a time so: 8 bits / 4bpp = 2 pixels at a time.
  • S ROM reads needed for one fix tile line: 8 pixels / 2 pixels per read = 4 reads.

Address sequence for one tile line:

A4 2H1 pixel pair
1 0 A
1 1 B
0 0 C
0 1 D

2H1 bypasses the PCK* latchs.

Sprite tiles

16mclk = 16 pixels, 8 pixels per read.

  • Sprite pixels are rendered two-by-two at 12MHz (2mclk).
  • One sprite tile line is therefore rendered in: 16 pixels / 2 * 2mclk = 16mclk. No variation, even if shrinking is used.
  • The C ROM outputs 2 * 16 = 32 bits at a time so: 32 bits / 4bpp = 8 pixels at a time.
  • C ROM reads needed for one sprite tile line: 16 pixels / 8 pixels per read = 2 reads.

Address sequence for one tile line:

CA4 8-pixel line
1 A
0 B

CA4 bypasses the PCK* latchs.

Active lists

The NeoGeo uses a pair of active lists, where the sprites numbers which need to be rendered on the next scanline are written to. As with the line buffers, the active lists are swapped every new scanline so that one is being filled by parsing, the other one is used for rendering.

They are located in the fast VRAM at addresses $8600 and $8680. Each list is 96-entries long.

Slow VRAM access slots

Slow VRAM has four 4mclk-long access slots running in sequence with no variations:

  1. Read sprite map even word
  2. Read sprite map odd word
  3. Read fix map
  4. Read/Write for CPU

Fast VRAM access slots

Fast VRAM is more complex and faster. It has 10 access slots with varying widths running in sequence with no variations, which can be seen as 5 parsing slots and 5 rendering slots:

Slot # Duration Description
1 2mclk Parsing
2 1.5mclk
3 1.5mclk
4 1.5mclk
5 1.5mclk
6 2mclk Read active list
7 1.5mclk Read SCB2
8 1.5mclk Read SCB3
9 1.5mclk Read SCB4
10 1.5mclk Read/write for CPU

Yellow are parsing cycles, purple is the active list read, green is SCB* reads for rendering, red is for CPU access:

The parsing cycles aren't consistent, they depend on the matching of sprites. One cycle will read from SCB3 to test if its Y position matches with the current raster line. If there's a match, the next cycle will be a write to the active list. Otherwise it's another read cycle.

Fast VRAM must be fast enough (45ns) as the shortest slots are 1.5mclk (62.5ns). 1mclk (41.6ns) would be too fast and SRAM was already expensive.

Sprite parsing

This is still a draft. The following information shouldn't be considered as correct.

LSPC splits the workload needed to render sprites in two passes: parsing and rendering.

  • Parsing for a raster line N is done during line N-2
  • Rendering is done during line N-1
  • Finally the line is ready for output just at the right time.

During parsing, the Y positions of 381 sprites are read to see if they will be visible on line N. If that's the case, the sprite number is written to the active list currently being filled. This goes on until 381 sprites were parsed, OR the active list is full (96 sprite numbers were written), whichever comes first.

  • If sprite #382 is reached, the remaining time is used to fill the active list up to 96 entries with zeros.
  • If the active list is full, sprites are still parsed up to #382 but no writes are done to the active list, whatever the matching result.

No matter how many sprites are matched in the scanline, there will always be 381 SCB3 reads and 96 active list writes.

This explains why sprite #0 cannot be used: this is the value used to top-up the active list. If there are less than 96 sprite matches (like most of the time), the sprite #0 will be rendered over and over again until the end of the list is reached.

In the next paragraphs, each character represents a parsing slot: R is an SCB3 read, W is a sprite number write to the active list, F is a filling write to the active list, - is just idle waiting. There are always 1536mclk per line / 16mclk per cycle * 5 slots per cycle = 480 slots.

Case 1: Not a single sprite match

Fast VRAM cycles:

24M    _|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr     200  | 201 | 202 | 203 | 204 |  681  | 00E | 20E | 40E | 600 |  205  | 206 | 207 | 208 | 209 |  682  | 00F | 20F | 40F
R/W     Read   Read  Read  Read  Read                                   Read   Read  Read  Read  Read

381 read slots, 96 fill slots, 3 waiting slots:

RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF---

Case 2: Some sprites match

Fast VRAM cycles:

24M    _|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr     200  | 201 | 600 | 202 | 203 |  681  | 00E | 20E | 40E | 600 |  601  | 204 | 205 | 602 | 206 |  682  | 00F | 20F | 40F
R/W     Read   Read  Write Read  Read                                   Write  Read  Read  Write Read

If 17 sprites match: 381 read slots, 17 write slots, 96-17=79 fill slots, 3 waiting slots:

RRRRRRWRRRRRRRRRRRRRRRRRRRRWRRRRRRRRRWRRRRRRWRRRRRRWRRRRRWRWRWRRRRRRRRRWRRRRRRRRRRRRRRWRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRWRWRRRRWRWRRRRRRRRRRRRRRRRRRRRRRRRWRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWRRRWRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF---

Case 3: Exactly 96 sprites match

Fast VRAM cycles:

24M    _|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr     200  | 600 | 201 | 601 | 202 |  681  | 00E | 20E | 40E | 600 |  602  | 203 | 204 | 603 | 604 |  682  | 00F | 20F | 40F
R/W     Read   Write Read  Write Read                                   Write  Read  Read  Write Read

381 read slots, 96 write slots, 0 fill slots, 3 waiting slots:

RRRRRRWRRRRWRRRRRRRWRRRRRWRWRWRRWRRRRRRRRRWRRRRRRWRRWRRRRWRRRRRWRWRWRRRRRWRRRRWRRRRRRRRWRRRRRRWRRRWR
WRWRRWRRRRRRRWRRRWRRRWRRRRWRRRRRRRRWRWRRRRWRWRRRRRRRRRRRRRRRRRRRRRRRRWRRRRRRRRRRRRRRWRRRWRWRWRWRRWRW
RRRRWRRWRRWRRRWRRRWRRWRRRRWRWRWRRWRRRWRWRWRRWRRRRWRRRRRWRRWRRWRRRRRRWRRRWRRRRWRRRRRWRRWRRRWRRRWRRRRR
WRWRWRWRWRWRRRWRRRWRRRRRRRWRRRWRWRRRWRRRRRRRRRWRRRRRRRWRWRWRWRRRRRRRRRRWRRWRRRRRRWRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRWRRRWRRRRWRRRWRRRRRRRWRRRRRRRWRRRRWRWRRRWRRWRRRRWRRRRRWRRRRRWR---

Case 4: More than 96 sprites match

Same as case 3, except after 96 "W"s, there are only useless "R"s.

Rendering

  1. Read active list ($8600+ or $8680+) to get sprite #
  2. Read SCB2 zoom values ($8000+)
  3. Read SCB3 Y position, height, and chain bit ($8200+)
  4. Read SCB4 X position ($8400+)

The tile # and its attributes are also read from slow VRAM.

CPU access to VRAM

SNK says min. 12 68kclk between writes (so 24mclk). 1 write every 24mclk = 64 per scanline.

CPU access occurs asynchronously with the 68000 bus -> storage in LSPC. If no write is requested, then the slots are occupied by reads, effectively updating one of the two read buffers continuously with the value pointed by the last used VRAM address.

Buffers control

CK signals

CK1~4 signals are used to clock each of the 4 buffers.

  • During rendering, the pulses often go by pair (1+2 or 3+4) to render pixels 2 by 2 if the corresponding WE signal is asserted (opaque pixel). Horizontal shrinking causes pulses to be skipped, so that the buffer's address isn't incremented.
  • During output, the pulses are slower and always alternate (1/2/1/2... or 3/4/3/4...) to output even/odd pixels in sequence.
  • If the corresponding LD* signal is high, the buffer pointer is incremented (rendering left to right).
  • If the corresponding LD* signal is low, the buffer pointer is loaded from the P bus (X position of sprite, or 0 to start line output).

Inactive during H-blank.

LD signals

The LD1~2 signals are synchronous signals used to load the pointers for a buffer pair as two bytes.

Example P bus values for 5 full-width sprites right next to each other, starting at X=0:

0000,0808,1010,1818,2020

Example P bus values for 5 full-width sprites right next to each other, starting at X=1 (pixel pairs will be flipped by NEO-ZMC2):

0100,0908,1110,1918,2120

As sprite lines always take 16mclk to render, there's an LD* pulse every 16mclk to set the new starting address (X position) except for chained sprites. There's also always an unique pulse just before output to reset the pointers to 0.

WE signals

WE1~4 signals are used to tell if the pixel should be written to a buffer.

During rendering, the pulses are synchronized to CK signals.

  • If the pixel is opaque, there are both pulses at the same time (write pixel).
  • If the pixel is transparent, there is a CK pulse but no WE pulse (skip pixel, move to next one).
  • If the pixel is skipped for horizontal shrink, there are no pulses at all (do nothing).

During output, the pulses are also synchronized to CK signals and always present. This is used to clear the buffers to the backdrop color for the next rendering cycle.

SS signals

The SS1/2 signals enable clearing of buffer pairs, active during output.

Others

  • TMS0 is used to flip the buffers, related to the lowest bit of the raster counter.
  • The rising edge of PCK1 and PCK2 latches fix or sprite pixels from the cart ROMs.

Fix data is read 8 pixels in advance (32mclk, confirms what Charles wrote in mvstech.txt).