# 🔍 NeuraSearch: Comprehensive Test Analysis
We conducted three rigorous tests to validate the NeuraSearch Hybrid architecture (the Baseline Comparison, the Stress Test, and the Grand Challenge), plus a general-search run over a small tech-and-science corpus. The results confirm that the hybrid approach successfully bridges the gap between semantic and symbolic search.
## 1. Baseline Comparison (Lexical vs. Semantic)

**Goal:** Prove that the model understands meaning without losing exact wording.

We compared three models on a "Bag of Words" task involving subject/object reversal ("dog bites man" vs. "man bites dog") and semantic noise ("cat chases mouse"):
```python
# Test texts
text1 = "dog bites man"
text2 = "man bites dog"
text3 = "cat chases mouse"

# 1. Symbolic (BOW equivalent): weight=0.0 uses only the lexical component
vecs_sym = get_embeddings([text1, text2, text3], weight=0.0)
if vecs_sym:
    sym1, sym2, sym3 = vecs_sym
    print("Symbolic Similarity:")
    print("  (text1, text2):", cos_sim(sym1, sym2))  # Should be high (same words)
    print("  (text1, text3):", cos_sim(sym1, sym3))  # Should be low
    print("  (text2, text3):", cos_sim(sym2, sym3))  # Should be low

# 2. Semantic (GTE equivalent): weight=1.0 uses only the embedding component
vecs_sem = get_embeddings([text1, text2, text3], weight=1.0)
if vecs_sem:
    sem1, sem2, sem3 = vecs_sem
    print("\nSemantic Similarity:")
    print("  (text1, text2):", cos_sim(sem1, sem2))  # Might be lower than Symbolic (different meaning)
    print("  (text1, text3):", cos_sim(sem1, sem3))
    print("  (text2, text3):", cos_sim(sem2, sem3))

# 3. Hybrid: weight=0.5 blends both components
vecs_hyb = get_embeddings([text1, text2, text3], weight=0.5)
if vecs_hyb:
    hyb1, hyb2, hyb3 = vecs_hyb
    print("\nHybrid Similarity:")
    print("  (text1, text2):", cos_sim(hyb1, hyb2))
    print("  (text1, text3):", cos_sim(hyb1, hyb3))
    print("  (text2, text3):", cos_sim(hyb2, hyb3))
```
RESULTS:

```
Symbolic Similarity:
  (text1, text2): 0.9999999999999998
  (text1, text3): -0.030849246651366142
  (text2, text3): -0.030849246651366142

Semantic Similarity:
  (text1, text2): 0.7279804462069049
  (text1, text3): 0.3260863277650705
  (text2, text3): 0.3944543095699924

Hybrid Similarity:
  (text1, text2): 0.8662364512414246
  (text1, text3): 0.1600136337045244
  (text2, text3): 0.1991729882339467
```
| Model | Identity Score (text1 vs. text2) | Noise Score (text1 vs. text3) | Verdict |
|---|---|---|---|
| Symbolic (weight 0.0) | 1.00 | -0.03 | Too rigid. Perfect at filtering noise, but thinks "man bites dog" is identical to "dog bites man" because the words are the same. |
| Semantic (weight 1.0) | 0.73 | 0.33 | Too noisy. Correctly identifies the meaning difference, but sees a 0.33 similarity with "cat chases mouse" (irrelevant animal aggression). |
| Hybrid (weight 0.5) | 0.87 | 0.16 | Optimal. It recognizes the lexical identity (0.87) but cuts the semantic noise in half (0.16), significantly improving precision. |
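The weight knob behaves like a convex combination of the two similarity spaces: the hybrid identity score (0.866) sits almost exactly at the midpoint of 1.00 and 0.73, and the noise score (0.16) close to the midpoint of -0.03 and 0.33. A minimal sketch of a blend with exactly that property, assuming both component vectors are unit-normalized and that `get_embeddings` works this way (neither assumption is confirmed by the notebook):

```python
import numpy as np

def blend_embeddings(sym_vec, sem_vec, weight=0.5):
    """Hypothetical hybrid encoder: weight=0.0 -> purely symbolic,
    weight=1.0 -> purely semantic, mirroring get_embeddings' knob.

    Concatenating the two unit vectors with sqrt weights makes the
    hybrid cosine decompose exactly as a weighted average:
        cos_hybrid = (1 - weight) * cos_sym + weight * cos_sem
    """
    sym = np.asarray(sym_vec, dtype=float)
    sem = np.asarray(sem_vec, dtype=float)
    return np.concatenate([np.sqrt(1.0 - weight) * sym,
                           np.sqrt(weight) * sem])
```

Under that model, 0.5 * 1.00 + 0.5 * 0.73 = 0.865 and 0.5 * (-0.03) + 0.5 * 0.33 = 0.15, close to the measured 0.866 and 0.16.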
## 2. Sample Dataset: Tech & Science Snippets
```python
import numpy as np

# Sample Dataset: Tech & Science Snippets
documents = [
    "Machine learning is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.",
    "Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.",
    "Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language.",
    "The transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. It is the foundation of Large Language Models like GPT.",
    "Vector databases are databases that allow for the storage and retrieval of vector embeddings. They are essential for semantic search applications.",
    "Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.",
    "Rust is a multi-paradigm, general-purpose programming language. Rust emphasizes performance, type safety, and concurrency.",
    "A hypervector is a high-dimensional vector, typically with 10,000+ dimensions in classic HDC, but NeuraLex uses 1024 dimensions with dense representations.",
    "Symbolic AI is the term for the collection of all methods in artificial intelligence research that are based on high-level 'symbolic' (human-readable) representations of problems, logic and search.",
    "Hybrid search combines keyword-based search (like BM25) with vector-based semantic search to improve retrieval accuracy."
]

print(f"Processing {len(documents)} documents...")
dataset_chunks = []
dataset_vectors = []
for doc in documents:
    chunks, vectors = create_hybrid_embeddings(doc, chunk_size=50, alpha=0.5)
    dataset_chunks.extend(chunks)
    dataset_vectors.extend(vectors)
dataset_vectors = np.array(dataset_vectors)
print(f"Created {len(dataset_vectors)} hybrid vectors.")

# Run sample queries
queries = [
    "artificial intelligence language",
    "programming languages for performance",
    "combining symbols and vectors"
]
for q in queries:
    print(f"\nQuery: '{q}'")
    results = search_hybrid(q, dataset_vectors, dataset_chunks, top_k=5, alpha=0.5)
    for r in results:
        print(f"  [{r['score']:.4f}] {r['text']}")
```
```
Processing 10 documents...
Created 10 hybrid vectors.

Query: 'artificial intelligence language'
  [0.6204] Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language.
  [0.4543] Symbolic AI is the term for the collection of all methods in artificial intelligence research that are based on high-level 'symbolic' (human-readable) representations of problems, logic and search.
  [0.4110] Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
  [0.3833] Rust is a multi-paradigm, general-purpose programming language. Rust emphasizes performance, type safety, and concurrency.
  [0.3474] Machine learning is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.

Query: 'programming languages for performance'
  [0.5746] Rust is a multi-paradigm, general-purpose programming language. Rust emphasizes performance, type safety, and concurrency.
  [0.3873] Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
  [0.3413] Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language.
  [0.2976] The transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. It is the foundation of Large Language Models like GPT.
  [0.2566] Machine learning is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.

Query: 'combining symbols and vectors'
  [0.4484] Hybrid search combines keyword-based search (like BM25) with vector-based semantic search to improve retrieval accuracy.
  [0.4236] Vector databases are databases that allow for the storage and retrieval of vector embeddings. They are essential for semantic search applications.
  [0.3574] Symbolic AI is the term for the collection of all methods in artificial intelligence research that are based on high-level 'symbolic' (human-readable) representations of problems, logic and search.
  [0.3560] A hypervector is a high-dimensional vector, typically with 10,000+ dimensions in classic HDC, but NeuraLex uses 1024 dimensions with dense representations.
  [0.3112] Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
```
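`search_hybrid` and `create_hybrid_embeddings` are defined elsewhere in the notebook. For readers jumping straight to this section, here is a minimal sketch of what a function with `search_hybrid`'s signature could look like, assuming the hybrid vectors are unit-normalized so a dot product equals cosine similarity (both the name `search_hybrid_sketch` and the normalization assumption are ours, not the notebook's):

```python
import numpy as np

def search_hybrid_sketch(query, vectors, chunks, top_k=5, alpha=0.5):
    """Illustrative stand-in for search_hybrid (not the notebook's code)."""
    # Encode the query with the same hybrid encoder used for the corpus.
    _, q_vecs = create_hybrid_embeddings(query, chunk_size=50, alpha=alpha)
    if not q_vecs:
        return []
    scores = np.asarray(vectors) @ q_vecs[0]   # cosine if vectors are unit-norm
    order = np.argsort(scores)[::-1][:top_k]   # best scores first
    return [{"score": float(scores[i]), "text": chunks[i]} for i in order]
```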
## 3. The Stress Test (The "Edge Case" Gauntlet)

**Goal:** Stress-test the model against synonyms, exact IDs, and dangerous disambiguation failures.
```python
stress_docs = [
    "The patient has a fracture in the femur.",    # Medical concept
    "Project code: XR-99-Delta is confidential.",  # Specific ID
    "The canine chased the mailman.",              # Synonyms for dog
    "Apples are a type of fruit."                  # Distractor
]

print("Encoding stress test documents...")
stress_chunks = []
stress_vectors = []

# Encode with Hybrid (alpha=0.5)
for doc in stress_docs:
    c, v = create_hybrid_embeddings(doc, chunk_size=50, alpha=0.5)
    stress_chunks.extend(c)
    stress_vectors.extend(v)
stress_vectors = np.array(stress_vectors)

# Create baselines
stress_symbolic = []
stress_semantic = []
for doc in stress_docs:
    _, sym = create_hybrid_embeddings(doc, chunk_size=50, alpha=0.0)
    _, sem = create_hybrid_embeddings(doc, chunk_size=50, alpha=1.0)
    stress_symbolic.extend(sym)
    stress_semantic.extend(sem)
stress_symbolic = np.array(stress_symbolic)
stress_semantic = np.array(stress_semantic)

# Define tricky queries
stress_queries = [
    "broken leg bone",  # Semantic match (Symbolic will fail)
    "XR-99-Delta",      # Exact match (Semantic might struggle)
    "dog attack",       # Semantic match (Symbolic will fail)
]

def run_stress_test(query):
    print(f"\nQuery: '{query}'")
    # Hybrid
    hyb = search_hybrid(query, stress_vectors, stress_chunks, top_k=1, alpha=0.5)
    # Symbolic
    sym = search_hybrid(query, stress_symbolic, stress_chunks, top_k=1, alpha=0.0)
    # Semantic
    sem = search_hybrid(query, stress_semantic, stress_chunks, top_k=1, alpha=1.0)
    # Get the text of the best match
    match_text = hyb[0]['text'] if hyb else 'None'
    print(f"  Target Doc: \"{match_text}\"")
    print(f"  Scores:")
    print(f"    Symbolic (Exact): {sym[0]['score'] if sym else 0:.4f}")
    print(f"    Semantic (Mean.): {sem[0]['score'] if sem else 0:.4f}")
    print(f"    Hybrid   (Both ): {hyb[0]['score'] if hyb else 0:.4f}")

for q in stress_queries:
    run_stress_test(q)

# Precision Test: Distinguishing between semantically similar but distinct entities
precision_docs = [
    "The secret code is Alpha",
    "The secret code is Beta"
]

print("\nEncoding precision test...")
p_chunks = []
p_vectors = []
for doc in precision_docs:
    c, v = create_hybrid_embeddings(doc, chunk_size=50, alpha=0.5)
    p_chunks.extend(c)
    p_vectors.extend(v)
p_vectors = np.array(p_vectors)

# Baselines
p_sym = []
p_sem = []
for doc in precision_docs:
    _, s1 = create_hybrid_embeddings(doc, chunk_size=50, alpha=0.0)
    _, s2 = create_hybrid_embeddings(doc, chunk_size=50, alpha=1.0)
    p_sym.extend(s1)
    p_sem.extend(s2)
p_sym = np.array(p_sym)
p_sem = np.array(p_sem)

query = "Alpha"
print(f"\nQuery: '{query}'")

# Encode query for each method
_, q_hyb = create_hybrid_embeddings(query, chunk_size=50, alpha=0.5)
_, q_sym = create_hybrid_embeddings(query, chunk_size=50, alpha=0.0)
_, q_sem = create_hybrid_embeddings(query, chunk_size=50, alpha=1.0)

if q_hyb and q_sym and q_sem:
    scores_hyb = np.dot(p_vectors, q_hyb[0])
    scores_sym = np.dot(p_sym, q_sym[0])
    scores_sem = np.dot(p_sem, q_sem[0])
    print("Scores for 'Alpha':")
    print("            'Alpha' doc | 'Beta' doc")
    print(f"  Hybrid:   {scores_hyb[0]:.4f}      | {scores_hyb[1]:.4f}")
    print(f"  Symbolic: {scores_sym[0]:.4f}      | {scores_sym[1]:.4f}")
    print(f"  Semantic: {scores_sem[0]:.4f}      | {scores_sem[1]:.4f}")
```
RESULTS:

```
Encoding stress test documents...

Query: 'broken leg bone'
  Target Doc: "The patient has a fracture in the femur."
  Scores:
    Symbolic (Exact): 0.0484
    Semantic (Mean.): 0.6850
    Hybrid   (Both ): 0.3480

Query: 'XR-99-Delta'
  Target Doc: "Project code: XR-99-Delta is confidential."
  Scores:
    Symbolic (Exact): 0.6989
    Semantic (Mean.): 0.8800
    Hybrid   (Both ): 0.7988

Query: 'dog attack'
  Target Doc: "The canine chased the mailman."
  Scores:
    Symbolic (Exact): 0.0262
    Semantic (Mean.): 0.5192
    Hybrid   (Both ): 0.2487

Encoding precision test...

Query: 'Alpha'
Scores for 'Alpha':
            'Alpha' doc | 'Beta' doc
  Hybrid:   0.6508      | 0.3019
  Symbolic: 0.5483      | -0.0585
  Semantic: 0.7825      | 0.7339
```
| Query Type | Query | Symbolic | Semantic | Hybrid | Result |
|---|---|---|---|---|---|
| Medical Synonym | "broken leg bone" | 0.04 ❌ | 0.68 ✅ | 0.35 ✅ | The semantic component successfully bridges the gap between "fracture" and "broken". |
| Exact ID | "XR-99-Delta" | 0.70 ✅ | 0.88 ❓ | 0.79 ✅ | Symbolic provides a mathematical safety floor. While semantic worked here, it is prone to out-of-vocabulary (OOV) failures; symbolic is not. |
| Disambiguation | "Alpha" vs. "Beta" | 0.55 / -0.06 | 0.78 / 0.73 ❌ | 0.65 / 0.30 ✅ | Critical win: the semantic model failed to separate the "Alpha" doc from the "Beta" doc (a gap of only ~0.05). Hybrid widened the gap to 0.35, ensuring the correct result is ranked first. |
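The disambiguation win falls out of simple arithmetic: averaging a noisy semantic signal with a near-zero symbolic signal shrinks the wrong document's score far more than the right one's. A back-of-the-envelope check using the measured per-method scores (the straight 50/50 average below is our approximation of whatever normalization the real encoder applies):

```python
# Measured per-method scores from the precision test above.
semantic = {"alpha_doc": 0.78, "beta_doc": 0.73}   # gap: only ~0.05
symbolic = {"alpha_doc": 0.55, "beta_doc": -0.06}  # gap: ~0.61

alpha = 0.5  # hybrid blend weight
for doc in ("alpha_doc", "beta_doc"):
    approx = alpha * semantic[doc] + (1 - alpha) * symbolic[doc]
    print(f"{doc}: ~{approx:.3f}")
# alpha_doc: ~0.665, beta_doc: ~0.335 -- a ~0.33 gap, consistent with the
# measured hybrid scores of 0.6508 vs. 0.3019.
```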
## 4. Grand Challenge: Structured Event Search

**Goal:** Solve the "Universal Adapter" problem: handling numbers, concepts, values, and dates simultaneously.

We indexed four complex event paragraphs and ran four distinct query types against them:
"Andrej Karpathy hosts a deep learning workshop at Stanford AI Lab on February 20th, 2026 at 10:00am. Contact (650) 555-2341 or +1-408-876-5432. Early bird tickets are $450. Visit ai.stanford.edu/workshops for details.",
"Jerry Seinfeld performs at the Comedy Cellar on January 12th, 2026 at 6:00pm. Contact (604) 555-1234 or +1-778-123-4567. General admission is $50. Visit www.comedycellar.com for more info.",
"The Los Angeles Lakers face the Golden State Warriors at Chase Center on March 14th, 2026 at 8:30pm. Tickets from $180. Call (888) 479-4667 for group sales. Visit www.nba.com/warriors/tickets for availability.",
"Senator Elizabeth Warren holds a town hall at Cambridge Public Library on April 5th, 2026 at 2:00pm. Contact (617) 555-9876 or press@warren.senate.gov for press passes. Visit www.warren.senate.gov/events for entry requirements."
| Query | Type | Strategy | Success | Analysis |
|---|---|---|---|---|
| "(888) 479-4667" | Exact Token | Pure Symbolic | ✅ | Retrieved the Lakers game info solely via the phone number. Semantic models often ignore digits or treat them as noise. |
| "standup comedy show info" | Concept | Pure Semantic | ✅ | Retrieved the Jerry Seinfeld / Comedy Cellar event despite zero keyword overlap with "standup" or "show". |
| "workshop costing $450" | Value + Concept | Hybrid | ✅ | Successfully combined the semantic concept "workshop" with the exact symbolic constraint "$450". |
| "April 5th with Elizabeth" | Date + Entity | Hybrid | ✅ | Locked onto the specific date and the named entity "Elizabeth" to find the town hall. |
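These four queries can be replayed with the same helpers used in the earlier sections. A sketch under that assumption (only `event_docs` comes from the snippet above; `create_hybrid_embeddings` and `search_hybrid` are reused exactly as before):

```python
# Index the four event paragraphs with the hybrid encoder (alpha=0.5).
event_chunks, event_vectors = [], []
for doc in event_docs:
    c, v = create_hybrid_embeddings(doc, chunk_size=50, alpha=0.5)
    event_chunks.extend(c)
    event_vectors.extend(v)
event_vectors = np.array(event_vectors)

# One query per retrieval regime: exact token, pure concept,
# value + concept, and date + entity.
grand_queries = [
    "(888) 479-4667",
    "standup comedy show info",
    "workshop costing $450",
    "April 5th with Elizabeth",
]
for q in grand_queries:
    top = search_hybrid(q, event_vectors, event_chunks, top_k=1, alpha=0.5)
    if top:
        print(f"{q!r} -> [{top[0]['score']:.4f}] {top[0]['text'][:60]}...")
```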
## Test Results Analysis

Our tests confirm the unique advantages of this hybrid architecture, backed by specific metrics from the "Bag of Words", "Stress", and "General Search" tests:
- **Noise Reduction & Precision:** In the "Bag of Words" test, we compared irrelevant concepts like "cat chases mouse" against "dog bites man".
  - Semantic model: produced a similarity of 0.33, creating "semantic noise" because both sentences involve animals and aggression. In large databases, this noise clutters search results.
  - Symbolic model: correctly returned -0.03 (no overlap).
  - Hybrid model: effectively halved the noise to 0.16, pushing irrelevant results down and significantly improving precision.
- **The "Bag of Words" Proof:** Comparing "dog bites man" vs. "man bites dog":
  - Symbolic (1.00): confirms pure lexical identity (same words, different order).
  - Semantic (0.73): recognizes the difference in meaning (subject/object swap).
  - Hybrid (0.87): balances the two, registering that the sentences are lexically identical but semantically distinct.
- **Granular Ranking (the "Rust vs. Python" Effect):** In the general query "programming languages for performance":
  - Rust scored 0.57 (top result) because it matches both the concept of a programming language AND explicitly contains the word "performance".
  - Python scored 0.39 (lower) because, while it matches the concept, it lacks the specific attribute "performance".
  - A pure semantic model often ranks these closer (both are popular languages), but the hybrid model verifies that the user's specific keyword constraints were met.
- **Semantic Generalization:** In the "fracture/broken leg" stress test, the symbolic component failed (score ~0.04), but the hybrid model (0.35) succeeded because the semantic component (0.68) "knew" the medical relationship.
- **Symbolic Precision:** In the "XR-99-Delta" test, the semantic model performed unexpectedly well (0.88), but the symbolic component still provided a strong 0.70 safety floor. Even if an unusual ID confuses the semantic tokenizer, the exact token match keeps retrieval reliable.
- **Disambiguation (the "Alpha/Beta" Problem):** The "Alpha vs. Beta" test is the most critical finding: the semantic model left a gap of only ~0.05 between the two documents, while the hybrid model widened it to ~0.35, ensuring the correct document ranks first.
## Conclusion

NeuraSearch acts as a Universal Search Adapter: it combines the safety of exact keyword matching (for IDs and prices) with the intelligence of semantic understanding (for concepts and synonyms), passing all three tiers of validation without the need for separate search indices.