Still constrained by how video works on the Atari. There's not enough RAM for a frame buffer, so you must update the video chip's registers on every scanline.
There are 76 cycles of CPU time per scanline. One of the more common routines to update a single sprite and its color (for the line-by-line color changes common in later games) takes 26 cycles:
lda #SPRITEHEIGHT
dcp SpriteTemp
bcs DoDraw
lda #0
.byte $2C ; opcode for BIT absolute, swallows the 2-byte lda that follows
DoDraw
lda (GfxPtr),Y
sta GRP0 ;+18 cycles <--- sprite image register
lda (ColorPtr),y
sta COLUP0 ; <--- sprite color register
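The "+18 cycles" note holds whichever way the branch goes, which is the point of the .byte $2C trick. As a rough sanity check, here is a small Python sketch that adds up both paths. It assumes the standard 6502 timings, zero-page pointers, and no page crossings on the branch or the indexed loads. The table and variable names are made up for illustration.

```python
# Standard 6502 cycle counts (assumption: zero-page operands, no page crossings).
CYCLES = {
    "lda_imm": 2,        # lda #SPRITEHEIGHT / lda #0
    "dcp_zp": 5,         # dcp SpriteTemp (undocumented opcode)
    "bcs_taken": 3,      # branch to DoDraw
    "bcs_not_taken": 2,  # fall through to lda #0
    "bit_abs": 4,        # what .byte $2C decodes to
    "lda_ind_y": 5,      # lda (GfxPtr),Y / lda (ColorPtr),y
    "sta_zp": 3,         # sta GRP0 / sta COLUP0
}

# Sprite is on this scanline: branch taken, graphics byte loaded.
draw_path = ["lda_imm", "dcp_zp", "bcs_taken", "lda_ind_y", "sta_zp"]
# Sprite is not on this scanline: A = 0, BIT skips over the graphics load.
skip_path = ["lda_imm", "dcp_zp", "bcs_not_taken", "lda_imm", "bit_abs", "sta_zp"]


def total(path):
    """Sum the cycle cost of a sequence of instructions."""
    return sum(CYCLES[op] for op in path)


draw_cycles = total(draw_path)
skip_cycles = total(skip_path)
color_cycles = CYCLES["lda_ind_y"] + CYCLES["sta_zp"]

print(draw_cycles, skip_cycles, draw_cycles + color_cycles)  # 18 18 26
```

Both paths cost the same, so the kernel's timing stays constant whether or not the sprite is drawn on a given line.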
For my game Stay Frosty, I used a mask overlay, which greatly increased ROM usage but only took 21 cycles:
LDA (SpritePtr),y ; 5
AND (SpriteMask),y ; 5
STA GRP0 ; 3
LDA (ColorPtr),y ; 5
STA COLUP0 ; 3
Pitfall 2 uses a coprocessor, known as the DPC (Display Processor Chip), that knocked that down to 14 cycles:
LDA DF0DATAW ; 4 <--- DPC register
STA GRP0 ; 3
LDA DF1DATA ; 4 <--- DPC register
STA COLUP0 ; 3
The ARM cartridge can emulate the DPC, so you can use it to play Pitfall 2. We extended that emulation to take advantage of the ARM cartridge's capabilities, such as monitoring what's read from the cartridge. That let us override LDA immediate mode, so the operand byte is replaced on the fly with the data fetcher's value, and reduce the routine to 10 cycles:
LDA #<DF0DATAW ; 2
STA GRP0 ; 3
LDA #<DF1DATA ; 2
STA COLUP0 ; 3
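To put the four routines side by side, here is a small Python sketch that subtracts each one from the 76-cycle scanline. The cycle counts are the ones quoted above. The labels and the remaining() helper are just for illustration, and the tally covers a single sprite only, ignoring loop overhead and any other register writes a real kernel needs.

```python
SCANLINE_CYCLES = 76  # CPU cycles available per scanline

# Cycle cost of updating one sprite and its color, as quoted in the post.
routines = {
    "indirect loads with draw test": 26,
    "mask overlay (Stay Frosty)": 21,
    "DPC data fetchers (Pitfall 2)": 14,
    "ARM cartridge, LDA # override": 10,
}


def remaining(cost, budget=SCANLINE_CYCLES):
    """Cycles left on the scanline after one sprite+color update."""
    return budget - cost


left = {name: remaining(cost) for name, cost in routines.items()}

for name, cost in routines.items():
    print(f"{name}: {cost} cycles used, {left[name]} left")
```

Going from the 26-cycle routine to the 10-cycle one frees 16 cycles per scanline, which at 3 cycles per store (with the 2-cycle immediate load) is room for about three more register updates.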
The saved cycles can be used to do other video chip updates, giving you the ability to do better graphics than normal for the Atari. I posted sample code for DPC+, of which the last demo does 29 updates of the video chip over 2 scanlines.
http://www.atariage....-dpc-programming/