Phase 3A - Experiment 003

Social Observation in Evolved Connectomes: Multi-Agent Food Discrimination with Peer State Sensors

Author: Dan Battye
Date: 2026-03-14
Affiliation: Quale Project
Predecessor: Experiment 002 (single-agent food discrimination)

Abstract

We investigated whether NEAT-evolved connectomes can develop social observation behaviour, using a peer agent's visible health and nausea state to inform food choices, when two copies of the same brain operate in a shared environment. Six additional social sensors (peer directional position, peer health, peer nausea) were added to the Phase 2 agent architecture, and two agents were evaluated per scenario with sequential tick ordering that gives the second agent an observation window into the first agent's post-consumption state. Over 326 generations (fresh) and 202 generations (seeded from Phase 2), both runs achieved food discrimination comparable to Phase 2 single-agent results (safe eat rate 26-32%, bad eat rate 5-12%, discrimination ratios 2.7-5.2:1). However, because item sensory properties alone are sufficient for discrimination (as demonstrated in Phase 2), we cannot determine from these metrics whether the peer sensors contributed to the observed discrimination. The experiment establishes a baseline for multi-agent evolution but does not isolate the social observation signal. A follow-up experiment (Phase 3B) is proposed where item properties are randomised per scenario, making peer observation the only reliable discrimination strategy.

1. Introduction

1.1 Background

Phase 1 demonstrated that NEAT evolution reliably produces foraging behaviour (food-seeking, water-seeking) from survival pressure alone, validated across 10 random seeds. Phase 2 showed that evolved brains can discriminate between safe and dangerous food using observable sensory properties (colour, smell, texture), achieving a 7:1 safe-to-bad eat ratio when items have distinct property profiles.

A natural question follows: can evolved brains use social information, observing another agent's reaction to food, to improve discrimination? In biological systems, social learning is widespread. Many species avoid foods they've observed causing sickness in conspecifics, even without personal experience of the food's effects. This behaviour requires: (1) the ability to perceive another agent's state, (2) the ability to associate that state with a stimulus, and (3) the ability to modify behaviour based on that association.

1.2 Research Question

When two copies of the same evolved brain operate in a shared environment with dangerous food:

1. Does discrimination improve compared to Phase 2 single-agent results?
2. Are the peer state sensors (peer_health, peer_nausea) connected in the evolved topology?
3. Does the sequential tick ordering (Agent 1 acts, Agent 2 observes) produce emergent role differentiation?

1.3 Experimental Challenge

A critical confound exists: Phase 2 demonstrated that item properties alone are sufficient for discrimination. If the same item properties are used in Phase 3, brains can discriminate without ever using peer sensors. The peer sensors become redundant inputs that evolution may ignore (as it ignored the hunger sensor in Phase 1). This experiment (Phase 3A) preserves the Phase 2 item properties to establish a multi-agent baseline. A follow-up (Phase 3B) is needed with randomised item properties to isolate the social observation signal.

2. Materials and Methods

2.1 Agent Architecture

The Phase 2 body was extended with 6 social sensors:

#	Sensor	Type	Description
17	peer_nearby_n	Directional	Peer agent to the north (distance-scaled)
18	peer_nearby_s	Directional	Peer agent to the south
19	peer_nearby_e	Directional	Peer agent to the east
20	peer_nearby_w	Directional	Peer agent to the west
21	peer_health	Social	Peer's current health value (0-1)
22	peer_nausea	Social	Peer's current nausea value (0-1)

Total: 22 input nodes (16 Phase 2 sensors + 6 peer sensors), 6 output nodes.

Nausea is exclusively caused by consuming dangerous items (never by starvation). This makes peer_nausea an unambiguous social signal: if the peer is nauseous, it ate something bad.
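
As an illustration of the six inputs in the table above, the following Go sketch computes them for one agent. It is not the project's code: the `Agent` struct, the `PeerSensors` and `scale` names, the linear distance falloff, and the convention that Y grows towards the south are all assumptions, since the source only states that the directional sensors are "distance-scaled".

```go
package main

import "fmt"

// Agent holds only the fields this sketch needs; the names are hypothetical.
type Agent struct {
	X, Y   int     // grid position; Y is assumed to grow towards the south
	Health float64 // 0-1
	Nausea float64 // 0-1
}

// scale maps a positive axis offset to an activation that weakens with distance.
// The linear falloff is an assumption, not the documented formula.
func scale(offset, gridSize int) float64 {
	if offset <= 0 {
		return 0 // the peer is not in this direction
	}
	return 1 - float64(offset)/float64(gridSize)
}

// PeerSensors returns the six social inputs in table order:
// north, south, east, west, peer health, peer nausea.
func PeerSensors(self, peer Agent, gridSize int) [6]float64 {
	return [6]float64{
		scale(self.Y-peer.Y, gridSize), // peer_nearby_n
		scale(peer.Y-self.Y, gridSize), // peer_nearby_s
		scale(peer.X-self.X, gridSize), // peer_nearby_e
		scale(self.X-peer.X, gridSize), // peer_nearby_w
		peer.Health,                    // peer_health
		peer.Nausea,                    // peer_nausea
	}
}

func main() {
	self := Agent{X: 5, Y: 5, Health: 1}
	peer := Agent{X: 5, Y: 2, Health: 0.6, Nausea: 0.8} // three cells north, recently ate bad food
	fmt.Println(PeerSensors(self, peer, 15))
}
```

Under these assumptions, a nauseous peer three cells to the north activates only the north sensor and passes its health and nausea through unchanged.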

2.2 Multi-Agent Tick Ordering

Each tick processes agents sequentially to create an observation window:

1. Agent A: read sensors (sees peer B's current state), brain tick, move, track encounters, consume
2. World: update physics (item respawn)
3. Agent B: read sensors (now sees Agent A's post-consumption state), brain tick, move, track encounters, consume
4. World: update physics
5. Both agents: update internal state cascade

Agent ordering (who acts first) is randomised per scenario to prevent systematic bias.
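
The ordering described in this section can be sketched in Go as follows. The `Agent`, `World`, `Tick` and `RunScenario` names are placeholders invented for this example, and each method only appends to a log so the sequence is visible; the real simulation types are not shown in this document.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Agent and World are stand-ins; the real types are not shown in the source.
type Agent struct {
	Name string
	Log  *[]string
}

// Act stands for: read sensors (including the peer's current state),
// brain tick, move, track encounters, consume.
func (a *Agent) Act(peer *Agent) {
	*a.Log = append(*a.Log, a.Name+" acts, sees "+peer.Name)
}

// UpdateState stands for the end-of-tick internal state cascade.
func (a *Agent) UpdateState() {
	*a.Log = append(*a.Log, a.Name+" state cascade")
}

type World struct{ Log *[]string }

// UpdatePhysics stands for item respawn and other world updates.
func (w *World) UpdatePhysics() { *w.Log = append(*w.Log, "world physics") }

// Tick runs one simulation step in the order listed above.
func Tick(first, second *Agent, w *World) {
	first.Act(second)
	w.UpdatePhysics()
	second.Act(first) // observes first's post-consumption state
	w.UpdatePhysics()
	first.UpdateState()
	second.UpdateState()
}

// RunScenario picks the acting order once per scenario, then keeps it fixed.
func RunScenario(a, b *Agent, w *World, ticks int, rng *rand.Rand) {
	first, second := a, b
	if rng.Intn(2) == 1 {
		first, second = b, a
	}
	for t := 0; t < ticks; t++ {
		Tick(first, second, w)
	}
}

func main() {
	var log []string
	a := &Agent{Name: "A", Log: &log}
	b := &Agent{Name: "B", Log: &log}
	RunScenario(a, b, &World{Log: &log}, 1, rand.New(rand.NewSource(42)))
	for _, line := range log {
		fmt.Println(line)
	}
}
```

Drawing the order once per scenario, rather than once per tick, keeps the observation window on the same side for the whole scenario, which matches the "randomised per scenario" wording above.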

2.3 Genome Expansion

When seeding from Phase 2 checkpoints, genomes were expanded from 16 to 22 input nodes. Output node IDs shifted from 16-21 to 22-27, hidden node IDs shifted correspondingly, and 6 new input nodes (peer sensors) were inserted at IDs 16-21.
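
The ID shift can be illustrated with a short Go sketch. The `Conn` type and the `remap` and `Expand` names are hypothetical, and two details are assumptions rather than documented behaviour: that hidden node IDs shift by the same offset of 6 as the output nodes, and that the new peer inputs start with no connections.

```go
package main

import "fmt"

// Conn is a minimal connection gene; the real genome layout is not shown in the source.
type Conn struct {
	From, To int
	Weight   float64
}

const (
	oldInputs = 16 // Phase 2 input count (IDs 0-15)
	newInputs = 22 // Phase 3A input count (IDs 0-21)
)

// remap leaves Phase 2 input IDs untouched and shifts every other node ID
// up by the number of inserted inputs, freeing IDs 16-21 for the peer sensors.
func remap(id int) int {
	if id >= oldInputs {
		return id + (newInputs - oldInputs)
	}
	return id
}

// Expand returns a copy of the connection list with remapped node IDs.
// Assumption: the peer inputs get no initial connections.
func Expand(conns []Conn) []Conn {
	out := make([]Conn, len(conns))
	for i, c := range conns {
		out[i] = Conn{From: remap(c.From), To: remap(c.To), Weight: c.Weight}
	}
	return out
}

func main() {
	phase2 := []Conn{
		{From: 3, To: 16, Weight: 0.5},   // input 3 -> first output
		{From: 22, To: 21, Weight: -1.2}, // first hidden node -> last output
	}
	fmt.Println(Expand(phase2))
}
```

With these assumptions the seeded genome behaves exactly as it did in Phase 2 at generation 0, and any use of the peer sensors has to arise from later add-connection mutations.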

2.4 Experimental Design

Parameter	Fresh (003a)	Seeded (003a-seeded)
Population	500	500
Generations	500 (max)	500 (max)
Scenarios	5	5
Ticks	300	300
Grid	15x15	15x15
Agents	2 (same genome)	2 (same genome)
Items	Phase 2 (balanced)	Phase 2 (balanced)
Seed	42	42
Seed from	Fresh minimal	Phase 2 gen 475

Food items: 6 (2 berries, 2 cooked chicken, 2 raw chicken). Water items: 4 (2 clean, 2 dirty). Same items and deadliness as Phase 2.

2.5 Fitness Function

Same as Phase 2. Fitness is computed per agent and averaged across both agents and all scenarios. No explicit reward for social observation or peer-influenced behaviour.

3. Results

3.1 Fresh Evolution (003a)

Evolution ran for 326 generations before convergence.

Table 1: Fresh Phase 3 trajectory at key milestones.

Gen	Best Fitness	Avg Fitness	Safe Eat %	Bad Eat %	Disc Ratio	Food	Water	Bad	Survival
0	13.50	4.18	71%	67%	1.1:1	0.3	0.3	0.1	79%
50	51.69	19.24	27%	8%	3.4:1	1.4	4.9	0.7	85%
100	52.52	24.60	23%	5%	4.6:1	1.6	7.3	0.8	82%
150	52.97	26.24	21%	9%	2.3:1	1.5	8.1	0.8	82%
200	57.84	30.33	25%	10%	2.5:1	1.7	10.7	0.9	65%
250	62.48	31.32	28%	5%	5.6:1	1.8	11.2	0.9	64%
300	60.34	30.56	26%	5%	5.2:1	1.7	10.8	0.9	66%
325	55.31	29.67	27%	11%	2.5:1	1.7	10.2	0.9	69%

Peak fitness: 68.67. Final topology: 32 nodes (4 hidden), 34 connections.
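
The Disc Ratio column in Table 1 is consistent with dividing the safe eat rate by the bad eat rate. The Go sketch below shows that arithmetic; the `DiscRatio` name is hypothetical, and the logged values may have been computed from unrounded rates rather than the rounded percentages shown in the table.

```go
package main

import "fmt"

// DiscRatio returns safe eat rate divided by bad eat rate.
// The second result is false when the bad eat rate is zero and the ratio is undefined.
func DiscRatio(safePct, badPct float64) (float64, bool) {
	if badPct == 0 {
		return 0, false
	}
	return safePct / badPct, true
}

func main() {
	// Rows taken from Table 1 (generations 50 and 300).
	for _, row := range [][2]float64{{27, 8}, {26, 5}} {
		if r, ok := DiscRatio(row[0], row[1]); ok {
			fmt.Printf("%.0f%% safe / %.0f%% bad = %.1f:1\n", row[0], row[1], r)
		}
	}
}
```

For these two rows the sketch prints 3.4:1 and 5.2:1, matching the table.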

3.2 Seeded Evolution (003a-seeded)

Evolution ran for 202 generations before convergence.

Table 2: Seeded Phase 3 trajectory at key milestones.

Gen	Best Fitness	Avg Fitness	Safe Eat %	Bad Eat %	Disc Ratio	Food	Water	Bad	Survival
0	58.56	26.05	39%	16%	2.4:1	2.4	8.0	1.2	66%
30	61.04	30.57	48%	10%	4.8:1	2.2	10.0	1.2	68%
60	59.98	26.01	28%	4%	7.0:1	1.9	8.3	1.0	70%
90	61.97	31.21	38%	16%	2.4:1	2.4	10.1	1.2	68%
120	59.65	32.51	29%	15%	1.9:1	2.3	10.8	1.1	68%
150	58.00	31.16	28%	12%	2.3:1	2.0	10.5	1.0	68%
180	66.82	35.03	36%	8%	4.5:1	2.3	12.1	1.1	66%
201	61.27	32.43	32%	12%	2.7:1	2.0	11.2	1.0	67%

Peak fitness: 71.28. Final topology: 33 nodes (5 hidden), 33 connections.

3.3 Cross-Phase Comparison

Table 3: Discrimination metrics across all phases.

Experiment	Agents	Best Safe Eat %	Best Bad Eat %	Best Disc Ratio	Peak Fitness
Phase 2 Fresh	1	59%	15%	7.2:1	43.34
Phase 2 Fresh (final)	1	50%	7%	7.1:1	43.34
Phase 3A Fresh	2	28%	5%	5.6:1	68.67
Phase 3A Seeded	2	48%	4%	7.0:1	71.28

Phase 3A discrimination ratios (2.5-7.0:1) fall within the range of Phase 2 single-agent results (2.0-7.2:1). The higher peak fitness in Phase 3A (68-71 vs 43) reflects the contribution of two agents consuming items rather than improved per-agent behaviour.

3.4 Survival Rate

Both Phase 3A runs show lower survival rates (64-69%) than Phase 2 (71-91%). This likely reflects the larger 15x15 grid (items are more spread out, requiring more movement and energy expenditure) and the doubled item count creating more opportunities for bad food consumption.

4. Discussion

4.1 Discrimination Without Demonstrated Social Observation

The central finding is negative for the social observation hypothesis, though not conclusively so. Both Phase 3A runs achieved food discrimination comparable to Phase 2, but this result can be fully explained by item properties alone (colour, smell, and texture differences between safe and dangerous food), without invoking peer observation.

The discrimination ratios (2.5-7.0:1) overlap with Phase 2 ranges (2.0-7.2:1), suggesting the same mechanism is at work. The peer sensors may be entirely disconnected in the evolved topology, as the hunger sensor was in Phase 1.

4.2 Why Social Observation May Not Have Emerged

Several factors may prevent peer observation from evolving in this experimental design:

1. Item properties are sufficient. Safe food has distinct colour (0.7-0.9) and smell (0.6-0.8) compared to dangerous food (colour 0.4, smell 0.3). Evolution found this discrimination pathway in Phase 2 without peer sensors. Adding peer sensors doesn't provide additional survival value when item properties already solve the problem.

2. The observation window is narrow. Agent B only sees Agent A's state after consumption. If Agent A eats bad food, its nausea rises on the same tick. But Agent B would need to: (a) notice the nausea increase, (b) associate it with a specific food item, (c) identify that item by its properties, and (d) suppress its own eat actuator for items with those properties. This is a 4-step causal chain that must emerge from topology evolution, a much harder target than direct property-based discrimination.

3. Same-genome constraint. Both agents use the same brain. A brain that makes Agent A cautious (waits to observe before eating) also makes Agent B cautious. The "scout/observer" specialisation we hypothesised requires the same topology to produce different behaviour in different contexts (different positions, different peer states). While possible in principle, it's a more complex evolutionary target than single-agent discrimination.

4. Fitness function doesn't reward observation. There's no explicit reward for using peer information. The only pressure is survival-based: agents that avoid bad food live longer. But since item properties already enable avoidance, the marginal benefit of peer observation is small.

4.3 Phase 3B: Isolating the Social Signal

To test whether social observation can emerge from connectome evolution, we need an experiment where item properties alone are not sufficient for discrimination. Proposed design for Phase 3B:

Randomise item properties each scenario. In each scenario, the mapping between observable properties and hidden effects is shuffled. Raw chicken might have colour=0.9 in one scenario and colour=0.2 in another. The agent cannot learn a fixed "avoid items with colour 0.4" strategy because the mapping changes every scenario.

The only consistent signal across scenarios is: if your peer eats something and gets nauseous, that specific item (identified by its position, not its properties) is dangerous. This makes peer observation the only viable discrimination strategy.
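
One way the per-scenario randomisation could work is sketched below in Go. It is an illustration of the proposal, not an existing implementation: the `Profile`, `Item`, `NewRNG` and `ShuffleProfiles` names are hypothetical, the texture values are made up, and permuting whole profiles among items is only one possible reading of "shuffled".

```go
package main

import (
	"fmt"
	"math/rand"
)

// Profile is what an agent can sense about an item.
type Profile struct{ Colour, Smell, Texture float64 }

// Item pairs a hidden effect (Dangerous) with an observable Profile.
type Item struct {
	Kind      string
	Dangerous bool
	Profile   Profile
}

// NewRNG returns a seeded generator so a scenario's shuffle is reproducible.
func NewRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// ShuffleProfiles returns a copy of items whose observable profiles have been
// permuted among them, while Kind and Dangerous stay in place.
func ShuffleProfiles(items []Item, rng *rand.Rand) []Item {
	out := make([]Item, len(items))
	copy(out, items)
	for i, j := range rng.Perm(len(items)) {
		out[i].Profile = items[j].Profile
	}
	return out
}

func main() {
	// Colour and smell follow the ranges quoted in section 4.2; texture values are invented.
	items := []Item{
		{"berries", false, Profile{0.9, 0.8, 0.5}},
		{"cooked chicken", false, Profile{0.7, 0.6, 0.6}},
		{"raw chicken", true, Profile{0.4, 0.3, 0.7}},
	}
	for scenario := int64(0); scenario < 3; scenario++ {
		fmt.Println(ShuffleProfiles(items, NewRNG(scenario)))
	}
}
```

Because the hidden effect stays attached to the item while its appearance changes, a fixed rule keyed on colour or smell stops being reliable across scenarios.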

4.4 Fresh vs Seeded (Consistent with Phase 2)

Consistent with Phase 2 findings, the seeded run starts with higher best fitness (58.56 vs 13.50 at generation 0) but produces less stable discrimination, with the bad eat rate oscillating between 4% and 16%. The fresh run's discrimination is more consistent (5-11% bad eat rate from generation 50 onward). Seeded topology is again more complex without benefit (33-70 connections vs 27-37).

The Phase 2 finding that "fresh evolution outperforms seeded for discrimination" appears to hold in the multi-agent setting as well.

4.5 Limitations

No topology inspection. Due to a checkpoint directory collision, we could not inspect the Phase 3 best genome's wiring to determine whether peer sensors are connected. This is the most significant limitation, as we cannot distinguish "peer sensors unused" from "peer sensors used but metrics don't show it."
Single seed. Both runs used seed 42.
Confounded discrimination signal. Item properties are sufficient for discrimination, making peer observation redundant. This experimental design cannot distinguish property-based from peer-based discrimination.
Checkpoint management. Multiple experiments sharing the same checkpoint directory caused data loss. Future experiments should use per-experiment checkpoint directories.
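
A minimal Go sketch of per-experiment checkpoint directories follows. The `CheckpointPath` helper and the `root/experiment/run` layout are assumptions made for illustration; the project's actual checkpoint handling is not shown in this document.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// CheckpointPath builds a directory path unique to one experiment run,
// for example checkpoints/003a/fresh. The layout is hypothetical.
func CheckpointPath(root, experiment, run string) string {
	return filepath.Join(root, experiment, run)
}

func main() {
	// A temporary root keeps this demo from touching real checkpoints.
	root, err := os.MkdirTemp("", "checkpoints")
	if err != nil {
		fmt.Println("cannot create temp dir:", err)
		return
	}
	defer os.RemoveAll(root)

	for _, run := range []string{"fresh", "seeded"} {
		dir := CheckpointPath(root, "003a", run)
		if err := os.MkdirAll(dir, 0o755); err != nil {
			fmt.Println("cannot create checkpoint dir:", err)
			return
		}
		fmt.Println("checkpoints for", run, "go to", dir)
	}
}
```

Keying the path on both experiment and run means the fresh and seeded runs can never overwrite each other's checkpoints.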

5. Conclusions

Multi-agent evolution works. Two copies of the same brain operating in a shared environment with sequential tick ordering produce viable agents with food discrimination. The implementation (genome expansion, agent ordering randomisation, independent brain state per agent) is validated.
Discrimination is comparable to Phase 2. Phase 3A ratios (2.5-7.0:1) match Phase 2 single-agent results (2.0-7.2:1), suggesting the same item property-based mechanism rather than social observation.
Social observation is not demonstrated. The experiment cannot distinguish property-based discrimination from peer-observation-based discrimination because item properties are sufficient on their own.
Phase 3B is needed. Randomising item properties per scenario would isolate the social observation signal by making item properties unreliable and peer reaction the only consistent cue.
Fresh evolution remains more stable than seeded for discrimination tasks, consistent with Phase 2.

6. Proposed Phase 3B Design

Parameter	Value
Grid	15x15
Agents	2 (same genome)
Population	500
Items	Same types, but properties RANDOMISED per scenario
Key change	Observable properties (colour, smell, texture) shuffled each scenario so the agent cannot learn fixed property-based discrimination
What this tests	Whether brains evolve to use peer_nausea as the discrimination signal when item properties are unreliable
Expected difficulty	Very high; may require 1000+ generations, plasticity, or recurrent connections

References

Experiment 001: "Emergent Survival Behaviour from Evolved Connectome Topologies" (tests/001/research-outcome.md)
Experiment 002: "Evolved Food Discrimination from Connectome Topology" (tests/002/research-outcome.md)
Stanley, K.O. and Miikkulainen, R. (2002). "Evolving Neural Networks through Augmenting Topologies." Evolutionary Computation, 10(2), 99-127.

Appendix A: Data Files

File	Description
fresh/output.txt	Fresh evolution raw output (326 generations)
seeded/output.txt	Seeded evolution raw output (202 generations)
research-outcome.md	This document

Appendix B: Reproduction

go build -o quale .

# Phase 3A fresh
./quale --phase 3 --population 500 --generations 500 --seed 42 \
  --scenarios 5 --ticks 300 > tests/003/fresh/output.txt

# Phase 3A seeded from Phase 2
./quale --phase 3 --population 500 --generations 500 --seed 42 \
  --scenarios 5 --ticks 300 \
  --seed-from checkpoints/phase2/checkpoint_gen475.quale-ckpt \
  > tests/003/seeded/output.txt