Rendering logic
Latest revision as of 21:44, 5 December 2018
Depending on the chipset, video is generated by three, two, or a single chip:
- 3: LSPC-A0, PRO-B0, PRO-C0 (early)
- 2: LSPC2-A2, NEO-B1 (most common)
- 1: NEO-GRC (CD systems), NEO-GRZ (CDZ, MV-1C...)
See graphics pipeline for an overview of the interconnections between chips and cartridges. See Display timing for the timing of the sync signals.
There are two main parts in generating video:
- An address generator (LSPC), which queries the graphics ROMs in the cartridges according to the data set in VRAM.
- Line buffers, to which pixels decoded from the graphics ROM data can be written in any order.
Line buffers
To render sprites, the NeoGeo uses a pair of line buffers which are each 320 pixels long (a whole scanline). When one is used for rendering, the other one is shifted out for video output. Each new scanline, the buffers are flipped. This can be seen as a kind of double-buffering, allowing pixels to be rendered in any order.
To increase bandwidth, pixels are rendered two at a time: each line buffer is actually made of two 160-pixel-long buffers, interleaved in an odd/even fashion, for a total of four. This scheme was inherited from the Alpha68k.
The fix layer pixels are rendered in real time over the output of the buffers.
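The double-buffered, odd/even interleaved arrangement described above can be sketched as follows. This is only a behavioral model, not a description of the actual hardware: the class and method names are made up for illustration, dropping off-screen writes is an assumption, and the clearing of the buffer during output follows the description of the WE signals further down the page.

```python
LINE_WIDTH = 320          # pixels per scanline (from the text)
HALF = LINE_WIDTH // 2    # each physical buffer holds 160 pixels


class LineBuffers:
    """Two logical line buffers, each split into an even and an odd half."""

    def __init__(self, backdrop=0):
        self.backdrop = backdrop
        # halves[pair][parity][index]: 2 pairs x 2 parities x 160 pixels
        self.halves = [[[backdrop] * HALF for _ in range(2)] for _ in range(2)]
        self.render_pair = 0  # the other pair is the one being output

    def write(self, x, color):
        """Render one pixel at screen position x; calls may come in any order."""
        if 0 <= x < LINE_WIDTH:  # assumption: off-screen writes are dropped
            self.halves[self.render_pair][x & 1][x >> 1] = color

    def flip(self):
        """Swap the rendering and output roles at the start of a scanline."""
        self.render_pair ^= 1

    def output(self):
        """Shift out the non-rendering pair and clear it to the backdrop."""
        pair = self.halves[self.render_pair ^ 1]
        line = [pair[x & 1][x >> 1] for x in range(LINE_WIDTH)]
        for half in pair:
            half[:] = [self.backdrop] * HALF
        return line
```

Even pixels land in one 160-entry buffer and odd pixels in the other, which is what allows two adjacent pixels to be written in the same cycle.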
Fix tiles
- Fix pixels are output in time with the pixel clock (6MHz, 4mclk).
- One fix tile line is therefore output in: 8 pixels * 4mclk = 32mclk. No variation.
- The S ROM outputs 8 bits at a time so: 8 bits / 4bpp = 2 pixels at a time.
- S ROM reads needed for one fix tile line: 8 pixels / 2 pixels per read = 4 reads.
Address sequence for one tile line:
A4 | 2H1 | pixel pair |
---|---|---|
1 | 0 | A |
1 | 1 | B |
0 | 0 | C |
0 | 1 | D |
2H1 bypasses the PCK* latches.
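The fix tile figures above are plain arithmetic, which the following sketch spells out together with the address sequence from the table. The constant and function names are illustrative only; the 24MHz master clock is inferred from "6MHz, 4mclk".

```python
MCLK_PER_PIXEL = 4   # 6MHz pixel clock = 24MHz master clock / 4
TILE_WIDTH = 8       # pixels in one fix tile line
BITS_PER_READ = 8    # S ROM data bus width
BPP = 4              # bits per pixel

pixels_per_read = BITS_PER_READ // BPP           # 2 pixels per S ROM read
reads_per_line = TILE_WIDTH // pixels_per_read   # 4 reads per tile line
mclk_per_line = TILE_WIDTH * MCLK_PER_PIXEL      # 32mclk per tile line

# (A4, 2H1) for pixel pairs A, B, C, D, in the order given by the table.
FIX_SEQUENCE = [(1, 0), (1, 1), (0, 0), (0, 1)]


def fix_read(n):
    """Return (A4, 2H1, pixel pair label) for the n-th read of a tile line."""
    a4, h = FIX_SEQUENCE[n % reads_per_line]
    return a4, h, "ABCD"[n % reads_per_line]
```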
Sprite tiles
In short: one 16-pixel sprite tile line takes 16mclk, with 8 pixels per C ROM read.
- Sprite pixels are rendered two-by-two at 12MHz (2mclk).
- One sprite tile line is therefore rendered in: 16 pixels / 2 * 2mclk = 16mclk. No variation, even if shrinking is used.
- The C ROM outputs 2 * 16 = 32 bits at a time so: 32 bits / 4bpp = 8 pixels at a time.
- C ROM reads needed for one sprite tile line: 16 pixels / 8 pixels per read = 2 reads.
Address sequence for one tile line:
CA4 | 8-pixel line |
---|---|
1 | A |
0 | B |
CA4 bypasses the PCK* latches.
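The same arithmetic for sprite tiles, as a small sketch. Names are illustrative. The last line uses the 1536mclk scanline length quoted in the sprite parsing section; the fact that the result equals the length of the active lists is an observation, not something the page states outright.

```python
TILE_WIDTH = 16          # pixels in one sprite tile line
MCLK_PER_PAIR = 2        # two pixels are rendered every 2mclk (12MHz)
BITS_PER_READ = 2 * 16   # two 16-bit C ROM words per read
BPP = 4                  # bits per pixel

pixels_per_read = BITS_PER_READ // BPP               # 8 pixels per read
reads_per_line = TILE_WIDTH // pixels_per_read       # 2 reads per tile line
mclk_per_line = (TILE_WIDTH // 2) * MCLK_PER_PAIR    # 16mclk, shrunk or not

# CA4 value and label for each 8-pixel half, in the order of the table.
SPRITE_SEQUENCE = [(1, "A"), (0, "B")]

MCLK_PER_SCANLINE = 1536
tile_lines_per_scanline = MCLK_PER_SCANLINE // mclk_per_line   # 96
```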
Active lists
The NeoGeo uses a pair of active lists, to which the numbers of the sprites that need to be rendered on the next scanline are written. As with the line buffers, the active lists are swapped every scanline, so that while one is being filled by parsing, the other is used for rendering.
They are located in the fast VRAM at addresses $8600 and $8680. Each list is 96 entries long.
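As a quick illustration of the layout, the sketch below computes the VRAM address of a given active list entry. It assumes one sprite number per VRAM word, which is consistent with the consecutive addresses seen in the timing diagrams but is not stated explicitly; the function names are hypothetical.

```python
ACTIVE_LIST_BASES = (0x8600, 0x8680)   # fast VRAM addresses from the text
ACTIVE_LIST_LENGTH = 96                # entries per list


def active_list_entry_address(list_index, entry):
    """VRAM address of one entry, assuming one sprite number per word."""
    if list_index not in (0, 1):
        raise ValueError("there are only two active lists")
    if not 0 <= entry < ACTIVE_LIST_LENGTH:
        raise ValueError("entry out of range")
    return ACTIVE_LIST_BASES[list_index] + entry


def active_list_range(list_index):
    """First and last VRAM address occupied by a list."""
    first = active_list_entry_address(list_index, 0)
    last = active_list_entry_address(list_index, ACTIVE_LIST_LENGTH - 1)
    return first, last
```

Under that assumption the lists occupy $8600-$865F and $8680-$86DF, leaving 32 unused words after each one.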
Slow VRAM access slots
Slow VRAM has four 4mclk-long access slots running in sequence with no variations:
1. Read sprite map even word
2. Read sprite map odd word
3. Read fix map
4. Read/Write for CPU
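Since the four slots are fixed, the slot in use at a given master clock count can be looked up directly. The sketch below assumes that the sequence starts with the first slot at mclk 0; the real phase relative to the scanline is not given here, and the names are illustrative.

```python
SLOW_SLOTS = [
    "read sprite map even word",
    "read sprite map odd word",
    "read fix map",
    "read/write for CPU",
]
SLOT_MCLK = 4                               # every slot lasts 4mclk
CYCLE_MCLK = SLOT_MCLK * len(SLOW_SLOTS)    # 16mclk for the full sequence


def slow_vram_slot(mclk):
    """Name of the slow VRAM slot active at a given mclk count."""
    return SLOW_SLOTS[(mclk % CYCLE_MCLK) // SLOT_MCLK]
```

The full sequence lasts 16mclk, the same duration as one sprite tile line.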
Fast VRAM access slots
Fast VRAM access is faster and more complex. It has 10 access slots of varying widths, which always run in the same sequence and can be seen as 5 parsing slots and 5 rendering slots:
Slot # | Duration | Description |
---|---|---|
1 | 2mclk | Parsing |
2 | 1.5mclk | Parsing |
3 | 1.5mclk | Parsing |
4 | 1.5mclk | Parsing |
5 | 1.5mclk | Parsing |
6 | 2mclk | Read active list |
7 | 1.5mclk | Read SCB2 |
8 | 1.5mclk | Read SCB3 |
9 | 1.5mclk | Read SCB4 |
10 | 1.5mclk | Read/write for CPU |
In the timing diagram (timing_gpu1.png), yellow marks the parsing cycles, purple the active list read, green the SCB* reads for rendering, and red the CPU access.
The parsing cycles aren't always the same: they depend on which sprites match. A cycle reads a sprite's SCB3 entry to test whether its Y position matches the raster line being parsed. If there's a match, the next cycle is a write to the active list. Otherwise, it's another read cycle.
Fast VRAM must be fast enough (45ns) because the shortest slots last 1.5mclk (62.5ns). Slots of 1mclk (41.7ns) would have been too short for such chips, and faster SRAM was already expensive.
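The slot table and the timing figures above can be cross-checked with a few lines of code. This is a sketch with illustrative names; it assumes a 24MHz master clock (the "24M" of the timing diagrams) and that slot 1 starts at mclk 0.

```python
from fractions import Fraction

MCLK_HZ = 24_000_000   # master clock, assumed from the "24M" signal

# (description, duration in mclk) for slots 1 to 10, as in the table.
FAST_SLOTS = (
    [("parsing", Fraction(2))]
    + [("parsing", Fraction(3, 2))] * 4
    + [("read active list", Fraction(2)),
       ("read SCB2", Fraction(3, 2)),
       ("read SCB3", Fraction(3, 2)),
       ("read SCB4", Fraction(3, 2)),
       ("read/write for CPU", Fraction(3, 2))]
)
CYCLE_MCLK = sum(duration for _, duration in FAST_SLOTS)   # 16mclk


def mclk_to_ns(mclk):
    """Convert a duration in master clock periods to nanoseconds."""
    return float(Fraction(mclk)) * 1e9 / MCLK_HZ


def fast_vram_slot(mclk):
    """Description of the slot active at a given mclk count."""
    t = Fraction(mclk) % CYCLE_MCLK
    for name, duration in FAST_SLOTS:
        if t < duration:
            return name
        t -= duration
```

The ten slots add up to 16mclk, so the fast VRAM sequence stays in step with the slow VRAM one, and a 1.5mclk slot leaves 62.5ns for a 45ns SRAM.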
Sprite parsing
This is still a draft. The following information shouldn't be considered as correct.
LSPC splits the work needed to render sprites into two passes: parsing and rendering.
- Parsing for a raster line N is done during line N-2
- Rendering is done during line N-1
- Finally the line is ready for output just at the right time.
During parsing, the Y positions of 381 sprites are read to see whether they will be visible on line N. If a sprite is visible, its number is written to the active list currently being filled. Writes go on until 381 sprites have been parsed or the active list is full (96 sprite numbers have been written), whichever comes first.
- If sprite #382 is reached, the remaining time is used to fill the active list up to 96 entries with zeros.
- If the active list is full, sprites are still parsed up to #382, but nothing more is written to the active list, whatever the matching result.
No matter how many sprites match on the scanline, there will always be 381 SCB3 reads and 96 active list writes.
This explains why sprite #0 cannot be used: its number is the value used to top up the active list. If there are fewer than 96 sprite matches (which is the case most of the time), sprite #0 will be rendered over and over again until the end of the list is reached.
In the next paragraphs, each character represents a parsing slot: R is an SCB3 read, W is a sprite number write to the active list, F is a filling write to the active list, - is just idle waiting. There are always 1536mclk per line / 16mclk per cycle * 5 slots per cycle = 480 slots.
Case 1: Not a single sprite match
Fast VRAM cycles:
<pre>
24M _|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr 200 | 201 | 202 | 203 | 204 | 681 | 00E | 20E | 40E | 600 | 205 | 206 | 207 | 208 | 209 | 682 | 00F | 20F | 40F
R/W Read Read Read Read Read Read Read Read Read Read
</pre>
381 read slots, 96 fill slots, 3 waiting slots:
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF---
Case 2: Some sprites match
Fast VRAM cycles:
<pre>
24M _|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr 200 | 201 | 600 | 202 | 203 | 681 | 00E | 20E | 40E | 600 | 601 | 204 | 205 | 602 | 206 | 682 | 00F | 20F | 40F
R/W Read Read Write Read Read Write Read Read Write Read
</pre>
If 17 sprites match: 381 read slots, 17 write slots, 96-17=79 fill slots, 3 waiting slots:
<pre>
RRRRRRWRRRRRRRRRRRRRRRRRRRRWRRRRRRRRRWRRRRRRWRRRRRRWRRRRRWRWRWRRRRRRRRRWRRRRRRRRRRRRRRWRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRWRWRRRRWRWRRRRRRRRRRRRRRRRRRRRRRRRWRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWRRRWRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF---
</pre>
Case 3: Exactly 96 sprites match
Fast VRAM cycles:
<pre>
24M _|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_|'|_
Addr 200 | 600 | 201 | 601 | 202 | 681 | 00E | 20E | 40E | 600 | 602 | 203 | 204 | 603 | 604 | 682 | 00F | 20F | 40F
R/W Read Write Read Write Read Write Read Read Write Read
</pre>
381 read slots, 96 write slots, 0 fill slots, 3 waiting slots:
<pre>
RRRRRRWRRRRWRRRRRRRWRRRRRWRWRWRRWRRRRRRRRRWRRRRRRWRRWRRRRWRRRRRWRWRWRRRRRWRRRRWRRRRRRRRWRRRRRRWRRRWR
WRWRRWRRRRRRRWRRRWRRRWRRRRWRRRRRRRRWRWRRRRWRWRRRRRRRRRRRRRRRRRRRRRRRRWRRRRRRRRRRRRRRWRRRWRWRWRWRRWRW
RRRRWRRWRRWRRRWRRRWRRWRRRRWRWRWRRWRRRWRWRWRRWRRRRWRRRRRWRRWRRWRRRRRRWRRRWRRRRWRRRRRWRRWRRRWRRRWRRRRR
WRWRWRWRWRWRRRWRRRWRRRRRRRWRRRWRWRRRWRRRRRRRRRWRRRRRRRWRWRWRWRRRRRRRRRRWRRWRRRRRRWRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRWRRRWRRRRWRRRWRRRRRRRWRRRRRRRWRRRRWRWRRRWRRWRRRRWRRRRRWRRRRRWR---
</pre>
Case 4: More than 96 sprites match
Same as case 3, except that after the 96th "W" further matches are ignored: the remaining parsing slots are "R"s whose results are discarded.
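The four cases above can be reproduced with a short simulation of the parsing pass. It is a sketch of the behavior as described on this page (itself flagged as a draft), not of LSPC's actual logic. The function name is hypothetical, and parsing sprites #1 to #381 is an assumption drawn from "381 sprites" and "sprite #382 is reached".

```python
SPRITES_PARSED = 381
ACTIVE_LIST_LENGTH = 96
SLOTS_PER_LINE = 1536 // 16 * 5   # 480 parsing slots per scanline


def parse_line(matches):
    """Simulate one parsing pass.

    matches: set of sprite numbers whose Y range covers the target line.
    Returns the slot string (R, W, F, -) and the resulting active list.
    """
    slots = []
    active = []
    for sprite in range(1, SPRITES_PARSED + 1):   # assumed range #1..#381
        slots.append("R")                         # SCB3 read
        if sprite in matches and len(active) < ACTIVE_LIST_LENGTH:
            slots.append("W")                     # sprite number write
            active.append(sprite)
    fill = ACTIVE_LIST_LENGTH - len(active)
    slots.extend("F" * fill)                      # top up with sprite #0
    active.extend([0] * fill)
    slots.extend("-" * (SLOTS_PER_LINE - len(slots)))   # idle slots
    return "".join(slots), active
```

Whatever the matches, the slot string always contains 381 reads, 96 writes of either kind, and 3 idle slots, for a total of 480.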
Rendering
1. Read active list ($8600+ or $8680+) to get sprite #
2. Read SCB2 zoom values ($8000+)
3. Read SCB3 Y position, height, and chain bit ($8200+)
4. Read SCB4 X position ($8400+)
The tile # and its attributes are also read from slow VRAM.
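The fast VRAM reads made for one active list entry can be listed as follows. This sketch assumes that each SCB block is indexed directly by the sprite number, which matches the 681 / 00E / 20E / 40E address pattern of the timing diagrams; the function name is hypothetical.

```python
SCB2_BASE = 0x8000   # zoom values
SCB3_BASE = 0x8200   # Y position, height, chain bit
SCB4_BASE = 0x8400   # X position


def rendering_reads(list_base, entry, sprite):
    """Fast VRAM reads for one active list entry, in slot order.

    list_base is $8600 or $8680, entry is the index in the active list,
    and sprite is the sprite number found at that entry.
    """
    return [
        ("active list", list_base + entry),
        ("SCB2", SCB2_BASE + sprite),
        ("SCB3", SCB3_BASE + sprite),
        ("SCB4", SCB4_BASE + sprite),
    ]
```

For example, entry 1 of the list at $8680 holding sprite $00E gives the addresses $8681, $800E, $820E and $840E.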
CPU access to VRAM
SNK specifies a minimum of 12 68k clock cycles (24mclk) between writes. One write every 24mclk gives 1536 / 24 = 64 writes per scanline.
The CPU access slots are asynchronous to the 68000 bus, so the access has to be buffered in LSPC. If no write is requested, the slots are used for reads, which continuously update one of the two read buffers with the value pointed to by the last used VRAM address.
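The write budget quoted above, spelled out as arithmetic. The names are illustrative. The comparison with the number of CPU slots is only an observation derived from the 16mclk slot sequence, not a documented limit.

```python
MCLK_PER_SCANLINE = 1536
MCLK_PER_68K_CLK = 2               # 12MHz 68000 clock, 24MHz master clock
MIN_68K_CLK_BETWEEN_WRITES = 12    # figure attributed to SNK in the text

min_mclk_between_writes = MIN_68K_CLK_BETWEEN_WRITES * MCLK_PER_68K_CLK   # 24
max_writes_per_scanline = MCLK_PER_SCANLINE // min_mclk_between_writes    # 64

# One CPU slot per 16mclk slot sequence, for comparison.
cpu_slots_per_scanline = MCLK_PER_SCANLINE // 16                          # 96
```

With 96 CPU slots per scanline against at most 64 writes, a write spaced according to the recommendation should always find a free slot.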
Buffers control
CK signals
CK1~4 signals are used to clock each of the 4 buffers.
- During rendering, the pulses often come in pairs (1+2 or 3+4) to render pixels two at a time; a pixel is actually written only if the corresponding WE signal is asserted (opaque pixel). Horizontal shrinking causes pulses to be skipped, so that the buffer's address isn't incremented.
- During output, the pulses are slower and always alternate (1/2/1/2... or 3/4/3/4...) to output even/odd pixels in sequence.
- If the corresponding LD* signal is high, the buffer pointer is incremented (rendering left to right).
- If the corresponding LD* signal is low, the buffer pointer is loaded from the P bus (X position of sprite, or 0 to start line output).
The CK signals are inactive during H-blank.
LD signals
The LD1~2 signals are synchronous signals used to load the pointers for a buffer pair as two bytes.
Example P bus values for 5 full-width sprites right next to each other, starting at X=0:
0000,0808,1010,1818,2020
Example P bus values for 5 full-width sprites right next to each other, starting at X=1 (pixel pairs will be flipped by NEO-ZMC2):
0100,0908,1110,1918,2120
As sprite lines always take 16mclk to render, there's an LD* pulse every 16mclk to set the new starting address (X position), except for chained sprites. There's also always a single pulse just before output to reset the pointers to 0.
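Both example sequences above are consistent with the high byte being the X position divided by two and rounded up, and the low byte being the X position divided by two and rounded down. The sketch below reproduces them on that basis. Which byte feeds the even or the odd buffer is not stated here, so the formula is an inference from the two examples rather than a documented rule, and the function names are hypothetical.

```python
def p_bus_value(x):
    """16-bit P bus value loaded for a sprite starting at screen X."""
    high = (x + 1) >> 1   # pointer for one buffer of the pair (rounded up)
    low = x >> 1          # pointer for the other buffer (rounded down)
    return (high << 8) | low


def p_bus_sequence(start_x, count, width=16):
    """P bus values for `count` adjacent full-width sprites."""
    values = (p_bus_value(start_x + i * width) for i in range(count))
    return ",".join(f"{v:04X}" for v in values)
```

Calling p_bus_sequence(0, 5) and p_bus_sequence(1, 5) gives back the two example lines. With an odd X the two pointers differ by one, which fits the remark about pixel pairs being flipped.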
WE signals
WE1~4 signals are used to tell if the pixel should be written to a buffer.
During rendering, the pulses are synchronized to CK signals.
- If the pixel is opaque, CK and WE both pulse at the same time (write pixel).
- If the pixel is transparent, there is a CK pulse but no WE pulse (skip pixel, move to next one).
- If the pixel is skipped for horizontal shrink, there are no pulses at all (do nothing).
During output, the pulses are also synchronized to CK signals and always present. This is used to clear the buffers to the backdrop color for the next rendering cycle.
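The three rendering cases above amount to a small truth table, sketched below for a single buffer. The real hardware handles pixels in pairs across two buffers; the state names, the function names and the write-then-increment ordering are simplifications made for this illustration.

```python
# (CK pulse, WE pulse) for each kind of pixel during rendering.
PULSES = {
    "opaque": (True, True),         # write the pixel and advance
    "transparent": (True, False),   # advance without writing
    "shrunk": (False, False),       # dropped by horizontal shrinking
}


def render_run(states, colors, buffer, pointer):
    """Apply a run of pixels to one buffer and return the final pointer."""
    for state, color in zip(states, colors):
        ck, we = PULSES[state]
        if we:
            buffer[pointer] = color
        if ck:
            pointer += 1
    return pointer
```

A transparent pixel leaves whatever was already in the buffer untouched, while a shrunk pixel neither writes nor moves the pointer, so the next pixel takes its place.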
SS signals
The SS1/2 signals enable clearing of buffer pairs, active during output.
Others
- TMS0 is used to flip the buffers, related to the lowest bit of the raster counter.
- The rising edge of PCK1 and PCK2 latches fix or sprite pixels from the cart ROMs.
Fix data is read 8 pixels in advance (32mclk), which confirms what Charles wrote in mvstech.txt.