| seed-001 |
Rule Application |
Scope Determination, Implication Tracing |
Medium |
A community garden has four rules: (1) Each member may reserve one plot per season. (2) Members who volunteered 20+ hours last season get priority selection. (3) No member may hold the same plot for more than two consecutive seasons. (4) If a priority member's preferred plot is held by a non-priority member entering their second consecutive season, the priority member may claim it. David is a priority member who wants Plot 7. Plot 7 is currently held by Maria, who is in her second consecutive season on Plot 7 and is also a priority member. Can David claim Plot 7? |
| seed-002 |
Rule Application |
Implication Tracing, Scope Determination, Error Recognition |
Medium |
A shipping company charges by weight tiers: 0–5 kg at $10, 5–15 kg at $25, 15–30 kg at $45, over 30 kg at $80. Company policy states: 'Fragile items incur a 50% surcharge. Loyalty members receive a 20% discount. Discounts are applied before surcharges.' A loyalty member ships a 12 kg fragile package. The clerk charges $30. The customer disputes this. Who is correct and what should the charge be? |
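The fee arithmetic can be checked with a minimal Python sketch. The tier boundaries in the scenario overlap at 5 kg and 15 kg, so the sketch assumes that a boundary weight falls in the lower tier. That assumption does not affect a 12 kg package. The function names are illustrative.

```python
def base_rate(weight_kg: float) -> float:
    """Tier price; boundary weights are assumed to fall in the lower tier."""
    if weight_kg <= 5:
        return 10.0
    if weight_kg <= 15:
        return 25.0
    if weight_kg <= 30:
        return 45.0
    return 80.0


def shipping_charge(weight_kg: float, fragile: bool, loyalty: bool) -> float:
    price = base_rate(weight_kg)
    if loyalty:
        price *= 0.80  # 20% loyalty discount, applied first per policy
    if fragile:
        price *= 1.50  # 50% fragile surcharge, applied after discounts
    return round(price, 2)


print(shipping_charge(12, fragile=True, loyalty=True))  # 30.0
```

Both adjustments are percentage multipliers, so applying them in the opposite order also gives $30.00 here.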
| seed-003 |
Rule Application |
Scope Determination, Premise Identification, Contradiction Detection |
Easy |
A school's dress code states: 'Students must wear closed-toe shoes at all times on campus. Exception: students may wear sandals in outdoor recreational areas during lunch period.' The school also has a safety policy: 'All students in the science wing must wear closed-toe shoes at all times, with no exceptions.' A student is eating lunch in the outdoor courtyard adjacent to the science wing wearing sandals. A science teacher asks the student to change shoes because she is 'near the science wing.' Is the teacher's request supported by the rules? |
| seed-004 |
Rule Application |
Scope Determination, Premise Identification, Implication Tracing |
Hard |
A company's travel policy states: (1) All flights must be booked in economy class. (2) Exception: flights over 6 hours may be booked in business class with VP approval. (3) Exception to Rule 2: even flights over 6 hours must be economy if the trip is to an internal company office. (4) All exceptions require written pre-approval. An employee needs to fly from New York to London (7 hours) for a client meeting at the client's office, which happens to be in the same building as the company's London office. The employee books business class with VP approval. Finance rejects the expense. Who is correct? |
| seed-005 |
Rule Application |
Implication Tracing, Scope Determination |
Easy |
A library charges late fees as follows: $0.25/day for books, $1.00/day for DVDs, and $0.50/day for audiobooks. The policy also states: 'Late fees are capped at the replacement cost of the item. No late fees accrue during library closures.' A patron returned a book 60 days late. The book's replacement cost is $12. The library was closed for 10 of those 60 days due to renovation. What is the correct late fee? |
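The late-fee rule can be expressed as a minimal Python sketch. It assumes that all closure days fall inside the late period, as the scenario states, and that the cap is applied after closure days are excluded. The names are hypothetical.

```python
DAILY_RATE = {"book": 0.25, "dvd": 1.00, "audiobook": 0.50}  # dollars per day


def late_fee(item_type: str, days_late: int, closed_days: int,
             replacement_cost: float) -> float:
    # No fees accrue while the library is closed.
    chargeable_days = max(days_late - closed_days, 0)
    uncapped = chargeable_days * DAILY_RATE[item_type]
    # Fees are capped at the item's replacement cost.
    return min(uncapped, replacement_cost)


print(late_fee("book", days_late=60, closed_days=10, replacement_cost=12.00))  # 12.0
```

Without the cap, the 50 chargeable days would cost $12.50, so the $12 cap binds.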
| seed-006 |
Rule Application |
Scope Determination, Premise Identification, Implication Tracing |
Medium |
An apartment complex has a pet policy: (1) Tenants may keep one pet under 30 pounds. (2) No reptiles or exotic animals. (3) Service animals are exempt from all pet restrictions. (4) Emotional support animals require a doctor's letter and are exempt from the weight limit but not the species restriction. A tenant has a doctor's letter for an emotional support Burmese python. The tenant argues the snake is an emotional support animal and therefore permitted. Management disagrees. Who is correct? |
| seed-007 |
Rule Application |
Implication Tracing, Decomposition |
Medium |
A competition has these scoring rules: (1) Each judge scores 1–10. (2) The highest and lowest scores are dropped. (3) The remaining scores are averaged. (4) In case of a tie, the competitor with the higher raw average (all scores included) wins. (5) If still tied, the competitor who received fewer scores below 5 wins. Competitor A receives scores: 9, 8, 7, 3, 10. Competitor B receives scores: 8, 8, 7, 7, 8. Who wins? |
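The scoring rules can be written as a minimal Python sketch. It uses exact fractions so that ties are detected without floating-point error. It assumes that "the highest and lowest scores" means exactly one of each, even when scores repeat. The names are illustrative.

```python
from fractions import Fraction


def trimmed_mean(scores: list[int]) -> Fraction:
    # Rules 2-3: drop one highest and one lowest score, average the rest.
    middle = sorted(scores)[1:-1]
    return Fraction(sum(middle), len(middle))


def rank_key(scores: list[int]) -> tuple:
    raw_average = Fraction(sum(scores), len(scores))  # Rule 4 tiebreaker
    low_scores = sum(1 for s in scores if s < 5)      # Rule 5 tiebreaker
    # Compared left to right; fewer low scores is better, hence the negation.
    return (trimmed_mean(scores), raw_average, -low_scores)


def winner(a: list[int], b: list[int]) -> str:
    key_a, key_b = rank_key(a), rank_key(b)
    if key_a == key_b:
        return "tie"
    return "A" if key_a > key_b else "B"


a = [9, 8, 7, 3, 10]
b = [8, 8, 7, 7, 8]
print(trimmed_mean(a), trimmed_mean(b), winner(a, b))  # 8 23/3 A
```

In this case the tiebreakers are never reached, because A's trimmed mean (8) already exceeds B's (about 7.67).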
| seed-008 |
Rule Application |
Scope Determination, Evidence Weighing, Confidence Calibration |
Hard |
A homeowners association has a fence rule: 'Fences in front yards must not exceed 4 feet in height. Fences in back yards may be up to 6 feet. Corner lots are considered to have two front yards.' The HOA also has a variance process: 'Homeowners may apply for a variance if strict application of a rule causes undue hardship.' A corner-lot owner has a pool in what they consider their back yard (the side facing the less-trafficked street). They want a 6-foot privacy fence around the pool for child safety. The HOA says 4-foot maximum because it's technically a second front yard. The homeowner applies for a variance citing child safety. Should the variance be granted? |
| seed-009 |
Rule Application |
Decomposition, Scope Determination, Implication Tracing |
Medium |
A company has an expense policy: 'Meals with clients are reimbursable up to $75 per person. Meals with colleagues during business travel are reimbursable up to $50 per person. Alcohol is not reimbursable under any circumstances.' An employee takes a client and two colleagues to dinner during a business trip. The total bill is $340: $240 for food and $100 for wine. How should the expense be processed? |
| seed-010 |
Rule Application |
Premise Identification, Scope Determination, Contradiction Detection |
Hard |
A relay race has these rules: (1) Each team has 4 runners. (2) The baton must be exchanged within a 20-meter exchange zone. (3) If the baton is dropped, the runner who dropped it must pick it up. (4) A team is disqualified if the baton exchange occurs outside the exchange zone. (5) Runners must stay in their assigned lanes during the exchange. Team Alpha's third runner drops the baton just as she enters the exchange zone. The baton rolls into the adjacent lane. She steps into the adjacent lane to retrieve it, picks it up, returns to her lane, and hands it to the fourth runner still within the exchange zone. Should Team Alpha be disqualified? |
| seed-011 |
Rule Application |
Scope Determination, Premise Identification |
Easy |
A parking garage has tiered pricing: first hour free, hours 2–4 at $5/hour, hours 5–8 at $8/hour, daily maximum of $40. The sign also states: 'Electric vehicles park free for up to 4 hours in designated EV spots only.' A driver with an electric vehicle parks in a regular spot (not EV-designated) for 6 hours because all EV spots were full. The driver argues the EV discount should still apply. What should the charge be? |
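The tariff can be modeled with a minimal Python sketch. It makes two assumptions that the sign does not state:

- Partial hours are billed as full hours.
- The EV waiver applies only when the car is an EV, is parked in an EV spot, and stays 4 hours or less.

The sketch does not decide what happens when an EV stays longer than 4 hours in an EV spot. The helper names are hypothetical.

```python
import math


def standard_charge(hours: float) -> float:
    billed_hours = math.ceil(hours)  # assumption: partial hours bill as full hours
    total = 0.0
    for hour in range(1, billed_hours + 1):
        if hour == 1:
            continue          # first hour free
        if hour <= 4:
            total += 5.0      # hours 2-4
        elif hour <= 8:
            total += 8.0      # hours 5-8
        # No rate is given past hour 8; 8 hours already exceed the cap.
    return min(total, 40.0)   # daily maximum


def ev_waiver_applies(is_ev: bool, in_ev_spot: bool, hours: float) -> bool:
    # The sign limits the waiver to designated EV spots and 4 hours.
    return is_ev and in_ev_spot and hours <= 4


hours = 6
charge = 0.0 if ev_waiver_applies(True, False, hours) else standard_charge(hours)
print(charge)  # 31.0
```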
| seed-012 |
Rule Application |
Decomposition, Implication Tracing |
Medium |
A grant program has these eligibility rules: (1) Applicants must be nonprofits registered for at least 2 years. (2) Annual revenue must be under $500,000. (3) At least 60% of revenue must come from program services, not donations. (4) Organizations that received this grant in the prior year are ineligible. (5) Exception to Rule 4: prior recipients may reapply if they can demonstrate that the funded project achieved measurable outcomes. Organization X is a 3-year-old nonprofit with $480,000 in annual revenue, 55% from program services, and received this grant last year with documented outcomes. Is Organization X eligible? |
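A minimal Python sketch can check each eligibility rule independently. It assumes that Organization X is a nonprofit and that the "documented outcomes" in the scenario satisfy Rule 5's "measurable outcomes"; the second point is an interpretation, not something the rules state. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    years_registered: float
    annual_revenue: float
    program_service_share: float  # fraction of revenue from program services
    prior_year_recipient: bool
    measurable_outcomes: bool     # assumption: documented outcomes count as measurable


def failed_rules(app: Applicant) -> list[str]:
    failures = []
    if app.years_registered < 2:
        failures.append("Rule 1: registered under 2 years")
    if app.annual_revenue >= 500_000:
        failures.append("Rule 2: revenue not under $500,000")
    if app.program_service_share < 0.60:
        failures.append("Rule 3: under 60% from program services")
    # Rule 4 bars prior recipients unless the Rule 5 exception applies.
    if app.prior_year_recipient and not app.measurable_outcomes:
        failures.append("Rule 4: prior recipient without measurable outcomes")
    return failures


org_x = Applicant(years_registered=3, annual_revenue=480_000,
                  program_service_share=0.55, prior_year_recipient=True,
                  measurable_outcomes=True)
print(failed_rules(org_x))  # ['Rule 3: under 60% from program services']
```

Because each rule is checked separately, the Rule 5 exception clears only Rule 4. It cannot offset a failure under Rule 3.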
| seed-013 |
Rule Application |
Scope Determination, Contradiction Detection, Evidence Weighing |
Hard |
A town's noise ordinance states: 'Residential areas: no noise above 55 decibels between 10 PM and 7 AM. Commercial areas: no noise above 70 decibels between 11 PM and 6 AM. Construction is exempt between 7 AM and 6 PM on weekdays.' A property is zoned mixed-use (residential and commercial). At 10:30 PM on a Tuesday, a restaurant on the ground floor is producing 65 decibels from its kitchen exhaust. A resident above the restaurant files a complaint. Does the noise violate the ordinance? |
| seed-014 |
Rule Application |
Implication Tracing, Scope Determination |
Medium |
A tournament uses single elimination with seeding. The rules state: (1) Higher seeds get home-field advantage. (2) If both teams have the same seed, the team with the better regular-season record hosts. (3) If records are also identical, a coin flip determines the host. (4) Reseeding occurs after each round. After Round 1, Team A (seed 3, record 14-2) and Team B (seed 6, record 14-2) both win. In Round 2, due to reseeding, both are now seed 2 in their respective brackets, and they are matched against each other. Who hosts? |
| seed-015 |
Rule Application |
Scope Determination, Evidence Weighing, Premise Identification |
Medium |
A food co-op has a return policy: 'Members may return any product within 7 days with receipt for a full refund. Perishable items may only be returned within 48 hours. No returns on items purchased during clearance sales.' A member buys a jar of artisanal honey during a clearance sale. Three days later, they open it and find it has crystallized and has an off taste. They want a return. The member argues that crystallized honey is a quality defect, not a normal return. The co-op points to the no-clearance-returns rule. Who has the stronger position? |
| seed-016 |
Rule Application |
Scope Determination, Premise Identification, Evidence Weighing |
Hard |
A university's grade appeal process has three rules: (1) Students must file within 10 business days of receiving the grade. (2) Appeals must demonstrate a procedural error or grading inconsistency, not simply disagreement with the grade. (3) The student must first attempt resolution directly with the instructor before filing a formal appeal. A student receives a grade on December 15. The university closes for winter break from December 20 to January 5. The student emails the instructor on December 18 but receives no response. On January 8 (the third business day after the break), the student files a formal appeal without having resolved the matter with the instructor. The university rejects the appeal for not satisfying Rule 3. Is the rejection valid? |
| seed-017 |
Rule Application |
Scope Determination, Contradiction Detection |
Medium |
A ride-sharing service has surge pricing rules: (1) Surge multiplier activates when demand exceeds supply by 20%+ in a zone. (2) Surge multiplier is 1.5× for 20–50% excess demand, 2.0× for 50–100%, and 3.0× for 100%+. (3) Surge pricing does not apply to pre-scheduled rides booked more than 2 hours in advance. (4) Surge pricing does not apply in designated emergency zones during declared emergencies. A rider pre-scheduled a ride 3 hours in advance. At pickup time, the area is in a declared emergency zone with 150% excess demand. The app shows a 3.0× surge price. The rider disputes the charge. What should the fare be? |
| seed-018 |
Rule Application |
Decomposition, Implication Tracing |
Medium |
A sports league has a salary cap of $100 million per team. The rules state: (1) All player salaries count against the cap. (2) Signing bonuses are prorated over the contract length. (3) Injured players on long-term injured reserve count for only 50% of their salary against the cap. (4) Teams exceeding the cap by the roster deadline must cut or trade players to become compliant. A team has $98 million in committed salaries. They want to sign a free agent to a 4-year, $20 million contract with a $4 million signing bonus. One of their current players ($6 million salary) just went on long-term injured reserve. Can the team sign the free agent and remain cap-compliant? |
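The scenario leaves two figures open:

- whether the $4 million bonus is part of the $20 million total, and
- whether the injured player's $6 million is already included in the $98 million.

The minimal Python sketch below assumes the injured player's salary is already included in the $98 million. It then computes the result under both readings of the bonus. Amounts are in millions, and the names are hypothetical.

```python
CAP = 100.0  # $ millions


def annual_cap_hit(total_value: float, years: int, signing_bonus: float,
                   bonus_included: bool) -> float:
    # Rule 2: the signing bonus is prorated evenly over the contract.
    salary = total_value - signing_bonus if bonus_included else total_value
    return salary / years + signing_bonus / years


def projected_payroll(committed: float, ltir_salary: float, new_hit: float) -> float:
    # Rule 3: an LTIR player counts at 50%; assumes his salary is in `committed`.
    return committed - 0.5 * ltir_salary + new_hit


for included in (True, False):
    hit = annual_cap_hit(20.0, 4, 4.0, bonus_included=included)
    total = projected_payroll(98.0, 6.0, hit)
    print(included, hit, total, total <= CAP)
# True 5.0 100.0 True
# False 6.0 101.0 False
```

Under the first reading, the team lands exactly at the cap, which Rule 4 does not treat as exceeding it. Under the second reading, the team is $1 million over.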
| seed-019 |
Rule Application |
Scope Determination, Implication Tracing, Premise Identification |
Medium |
A building's fire code requires: (1) Maximum occupancy of 200 persons in the main hall. (2) All exits must be unobstructed during events. (3) Events with 100+ attendees require a fire marshal on site. (4) Exception: regularly scheduled weekly events with consistent attendance under 150 do not require a fire marshal, provided the building passed its most recent annual inspection. The building's annual inspection expired 3 weeks ago, and the renewal is pending. A weekly book club with a consistent 120 attendees is meeting tonight. Is a fire marshal required? |
| seed-020 |
Rule Application |
Scope Determination, Implication Tracing |
Easy |
A scholarship has these criteria: (1) GPA of 3.5 or higher. (2) Demonstrated financial need. (3) Full-time enrollment. (4) Students who are dependents of university employees are ineligible. (5) Exception to Rule 4: dependents of part-time university employees who have worked at the university for less than one year are eligible. A student has a 3.7 GPA, demonstrates financial need, and is full-time. Their parent is a university employee who works 20 hours per week and has been at the university for 11 months. Next month the parent will hit one year. Is the student eligible NOW? |
| seed-021 |
Rule Application |
Decomposition, Scope Determination |
Easy |
A country club's guest policy states: (1) Members may bring up to 2 guests per visit. (2) Guests must be accompanied by the member at all times. (3) The same guest may not visit more than 3 times per calendar month. (4) Guests are not permitted to use the pool during peak hours (12–4 PM on weekends). Member A brings Guest X on Saturday at 1 PM. Guest X has already visited twice this month. Member A is called to the front desk for a phone call and leaves Guest X at the pool for 10 minutes. How many rules has Member A violated? |
| seed-022 |
Rule Application |
Scope Determination, Decomposition, Premise Identification |
Hard |
A software license agreement states: (1) The license permits installation on up to 3 devices per user. (2) Devices must be owned or primarily used by the licensee. (3) The software may not be installed on servers or shared computing environments. (4) Educational institutions receive a site license allowing unlimited installations on institution-owned devices. (5) The site license does not extend to personal devices of students or faculty. A university (educational institution) purchases the site license. A professor installs the software on her university-issued laptop (institution-owned), her personal desktop at home, and a shared departmental workstation that 5 faculty members use. Which installations are valid? |
| seed-023 |
Rule Application |
Contradiction Detection, Scope Determination, Implication Tracing |
Medium |
A youth sports league has the following rules for team formation: (1) Teams are formed by age group: 6–8, 9–11, 12–14. (2) A child may play up one age group with parental consent and coach approval. (3) A child may NOT play down an age group under any circumstances. (4) Siblings must be placed on the same team if the parent requests it. A parent has two children: one is 8 and the other is 9. The parent requests they be on the same team. Under Rule 1, the 8-year-old belongs in the 6–8 group and the 9-year-old in the 9–11 group. How should the league resolve this? |
| seed-024 |
Rule Application |
Scope Determination, Evidence Weighing, Implication Tracing |
Hard |
A company's work-from-home policy states: (1) Employees must work from the office at least 3 days per week. (2) Exception: employees with documented medical conditions may work fully remote with HR approval. (3) Exception: employees whose role requires no physical presence may work fully remote with manager approval. (4) During company-wide 'collaboration weeks' (one week per quarter), all employees must be in the office 5 days, with no exceptions. (5) Reasonable accommodations under disability law supersede all internal policies. An employee with a documented disability that prevents commuting has HR-approved full remote status. Collaboration week arrives. The manager says the employee must come in per Rule 4. The employee says Rule 5 makes their accommodation supersede Rule 4. Who is correct? |
| seed-025 |
Rule Application |
Scope Determination, Implication Tracing |
Easy |
A parking lot has these rules: (1) Residents get one assigned spot. (2) Visitors may park in any spot marked 'Visitor' for up to 4 hours. (3) Unregistered vehicles parked for more than 24 hours will be towed. (4) Electric vehicles may park in any open visitor spot regardless of the 4-hour limit if they are actively charging. A resident's car breaks down in their assigned spot. They borrow a friend's unregistered car and park it in a visitor spot. The car stays there for 30 hours while the resident waits for their own car to be repaired. Can the unregistered borrowed car be towed? |
| seed-026 |
Rule Application |
Decomposition, Contradiction Detection, Evidence Weighing |
Hard |
A conference has a speaker selection policy: (1) Proposals are scored 1–100 by a review committee. (2) Proposals scoring 70+ are accepted. (3) No more than 2 speakers from the same organization may be accepted. (4) If the organization cap forces a rejection, the lowest-scoring proposal from that organization is cut first. (5) Diversity clause: at least 30% of accepted speakers must be from underrepresented groups. Five proposals from the same organization score: 95 (underrepresented), 88, 82, 75 (underrepresented), 71. After all scoring, exactly 30% of accepted speakers are from underrepresented groups — and removing any underrepresented speaker would drop below 30%. Which proposals from this organization are accepted? |
| seed-027 |
Rule Application |
Decomposition, Scope Determination, Implication Tracing |
Medium |
A store's refund policy states: 'Full refund within 30 days with receipt. Store credit within 30–60 days with receipt. No refunds or credits after 60 days. Items purchased with a gift card are refunded to a gift card only.' A customer bought an item 45 days ago using both a gift card ($30) and a credit card ($20). They have the receipt. They want a refund to their credit card. What refund should the store provide? |
| seed-028 |
Rule Application |
Premise Identification, Scope Determination |
Easy |
A landlord's lease states: (1) Rent is due on the 1st of each month. (2) A late fee of $50 applies if rent is received after the 5th. (3) If rent is not received by the 15th, the landlord may begin eviction proceedings. (4) The lease specifies that 'all notices must be delivered in writing via certified mail or hand delivery.' The tenant mails a rent check on the 3rd via regular mail. The landlord receives it on the 7th. The landlord assesses a late fee. The tenant argues the check was mailed before the 5th. Is the late fee valid? |
| seed-029 |
Rule Application |
Contradiction Detection, Premise Identification, Scope Determination |
Hard |
A school's grading policy states: (1) Final grades are calculated as: Exams 50%, Homework 30%, Participation 20%. (2) Students who miss an exam with a documented excuse receive the average of their other exam scores for the missed exam. (3) Students who miss an exam without a documented excuse receive a zero. (4) No student may take more than one makeup exam per semester. A student missed two exams. For the first missed exam, they provided a doctor's note. For the second, they had a family emergency but no documentation. Under Rule 4, the student has used their one makeup allowance. But Rule 2 doesn't mention makeup exams — it says they 'receive the average.' Are Rules 2 and 4 in conflict? |
| seed-030 |
Rule Application |
Implication Tracing, Scope Determination |
Medium |
An airline's upgrade policy: (1) Upgrades are offered to elite members 24 hours before departure, in order of elite status tier. (2) Within the same tier, the member with the earliest booking date gets priority. (3) Upgrades are non-transferable. (4) If an elite member declines an upgrade, it passes to the next eligible member. Passenger A (Gold tier, booked January 5) and Passenger B (Gold tier, booked January 3) are both on the same flight. Passenger B is offered the upgrade first (earlier booking). Passenger B says: 'Give it to my colleague, Passenger A.' The gate agent says this violates Rule 3. Passenger B then declines the upgrade. Passenger A is next in line and receives it under Rule 4. Did the system produce the intended outcome despite Rule 3? |
| seed-031 |
Rule Application |
Scope Determination, Decomposition, Evidence Weighing |
Hard |
A condominium's renovation policy states: (1) All renovations require board approval. (2) Cosmetic changes (paint, fixtures, flooring) are automatically approved if submitted 14 days before work begins. (3) Structural changes require an engineer's report and board vote. (4) Work may not begin before 8 AM or continue after 6 PM on weekdays, and no work is permitted on weekends. (5) Emergency repairs are exempt from all scheduling and approval requirements. A unit owner discovers a burst pipe at 7 AM on Saturday. They call a plumber who arrives at 8 AM. The plumber fixes the pipe but discovers the burst was caused by a corroded support beam. The plumber reinforces the beam temporarily. The neighbor below complains about the noise on a Saturday. The building manager asks whether the beam repair was authorized. What rules apply? |
| seed-032 |
Rule Application |
Decomposition, Scope Determination, Implication Tracing |
Easy |
A government benefits program has these rules: (1) Applicants must have annual income below $35,000. (2) Income includes wages, investment returns, and rental income. (3) Income does NOT include one-time gifts, inheritance, or insurance payouts. (4) Self-employment income is calculated as gross revenue minus documented business expenses. An applicant has: $28,000 in wages, $3,000 in stock dividends, a $10,000 inheritance, and $6,000 in gross revenue from a side business with $4,000 in documented expenses. What is their qualifying income? |
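The income calculation can be written as a minimal Python sketch. It rests on two assumptions:

- Stock dividends count as "investment returns" under Rule 2.
- Net self-employment income counts toward the total. Rule 4 defines how to compute it, but Rule 2's list does not name it.

The category labels are illustrative.

```python
COUNTED = {"wages", "investment", "rental"}      # Rule 2
EXCLUDED = {"gift", "inheritance", "insurance"}  # Rule 3


def qualifying_income(items: list[tuple[str, float]],
                      se_gross: float = 0.0, se_expenses: float = 0.0) -> float:
    total = 0.0
    for kind, amount in items:
        if kind in COUNTED:
            total += amount
        elif kind not in EXCLUDED:
            raise ValueError(f"unclassified income type: {kind}")
    # Rule 4: self-employment income is gross revenue minus documented expenses.
    return total + (se_gross - se_expenses)


income = qualifying_income(
    [("wages", 28_000), ("investment", 3_000), ("inheritance", 10_000)],
    se_gross=6_000, se_expenses=4_000,
)
print(income, income < 35_000)  # 33000.0 True
```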
| seed-033 |
Rule Application |
Implication Tracing, Scope Determination |
Medium |
A club's election bylaws state: (1) The president serves a maximum of two consecutive terms. (2) A term is one year. (3) If the president resigns mid-term, the vice president completes the term. (4) A partial term of less than 6 months does not count toward the two-term limit. (5) A partial term of 6 months or more counts as a full term. The vice president assumed the presidency on March 1 when the president resigned. The term ends December 31. The VP-turned-president then wins the next two elections (two full terms). A member challenges their eligibility for a fourth consecutive year as president. Is the challenge valid? |
| seed-034 |
Rule Application |
Scope Determination, Decomposition |
Easy |
A warehouse has inventory rules: (1) Items expiring within 30 days must be moved to the clearance section. (2) Items past their expiration date must be destroyed. (3) Canned goods have a 'best by' date, not an expiration date — they are safe past the date but must be discounted 50% after the 'best by' date. (4) Refrigerated items must be destroyed on their expiration date regardless of condition. A case of canned tuna has a 'best by' date that was yesterday. A case of yogurt has an expiration date that is tomorrow. A case of crackers has an expiration date 25 days from now. What action should be taken for each? |
| seed-035 |
Rule Application |
Implication Tracing, Scope Determination, Temporal Sequencing |
Hard |
A company's promotion policy states: (1) Employees are eligible for promotion after 2 years in their current role. (2) Employees must have a performance rating of 'Exceeds Expectations' or higher for the most recent review cycle. (3) Employees with any disciplinary action in the past 12 months are ineligible. (4) Exception to Rule 3: a verbal warning that was subsequently rescinded does not count as disciplinary action. An employee has been in their role for 2.5 years with an 'Exceeds Expectations' rating. Eight months ago, they received a written warning. Six months ago, the warning was downgraded to a verbal warning after an appeal. Four months ago, the verbal warning was rescinded entirely. Is the employee eligible for promotion? |
| seed-036 |
Rule Application |
Scope Determination, Implication Tracing |
Medium |
A summer camp has these rules: (1) Children must be 7–14 years old. (2) Children under 10 must be in the Junior Division. (3) Children 10–14 must be in the Senior Division. (4) Exception: children under 10 who pass a swimming proficiency test may be placed in the Senior Division for water activities only. (5) Children in the Senior Division attend the overnight camping trip. A 9-year-old has passed the swimming proficiency test and been placed in the Senior Division for water activities. The overnight camping trip includes both water and land activities. Can this child attend the overnight trip? |
| seed-037 |
Rule Application |
Premise Identification, Scope Determination, Implication Tracing |
Medium |
A municipal zoning code states: (1) Residential zones prohibit commercial activity. (2) Exception: home-based businesses are permitted if they receive no customer foot traffic, produce no noise audible beyond the property line, and employ no non-resident workers. (3) Signs advertising a business are prohibited in residential zones. (4) Exception to Rule 3: a single sign no larger than 2 square feet is permitted for licensed home-based businesses. A resident runs a home-based tutoring business. Students come to the home for sessions (3–4 per day). The resident has a 1.5 square foot sign. Is this operation legal? |
| seed-038 |
Rule Application |
Temporal Sequencing, Implication Tracing, Scope Determination |
Hard |
A loyalty program has these rules: (1) Members earn 1 point per $1 spent. (2) Double points during promotional periods. (3) Points expire after 12 months of account inactivity. (4) Expired points cannot be reinstated. (5) Exception: members who reach Gold status (10,000 lifetime points) have points that never expire. A member earned 9,800 points, then went inactive for 14 months. They then made a $200 purchase during a promotional period. The member claims they should be Gold status (9,800 + 400 double points = 10,200) and their original 9,800 points should be reinstated. Is the member correct? |
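The point arithmetic can be laid out in a minimal Python sketch. The rules do not say whether "lifetime points" include expired points, so the sketch computes both readings. It does not decide the reinstatement question, which Rule 4 governs separately. The names are illustrative.

```python
GOLD_THRESHOLD = 10_000


def points_earned(spend_dollars: int, promotional: bool) -> int:
    # Rules 1-2: 1 point per $1, doubled during promotional periods.
    return spend_dollars * (2 if promotional else 1)


prior_points = 9_800
new_points = points_earned(200, promotional=True)  # 400

# Rule 3: 14 months of inactivity exceeds 12, so the prior points expired.
active_balance = new_points

# "Lifetime points" is undefined; compute both readings.
lifetime_with_expired = prior_points + new_points  # 10,200
lifetime_unexpired_only = new_points               # 400

print(active_balance,
      lifetime_with_expired >= GOLD_THRESHOLD,
      lifetime_unexpired_only >= GOLD_THRESHOLD)   # 400 True False
```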
| seed-039 |
Rule Application |
Evidence Weighing, Premise Identification, Scope Determination |
Hard |
A school district's transfer policy states: (1) Students may transfer to any school in the district if the receiving school has capacity. (2) Priority is given to students transferring for academic programs not available at their home school. (3) Students with siblings at the receiving school get second priority. (4) Athletic transfers are prohibited: no student may transfer primarily for athletic purposes. (5) A transfer request is considered 'athletic' if the student was recruited by a coach, participated in tryouts at the receiving school before transferring, or if the transfer coincides with the student being cut from a team at their home school. A student was cut from the basketball team at School A. Two months later, they request a transfer to School B, citing School B's superior robotics program (which School A does not offer). School B happens to have a strong basketball team. Is this an athletic transfer? |
| seed-040 |
Rule Application |
Scope Determination, Implication Tracing |
Medium |
A nonprofit's conflict of interest policy states: (1) Board members must disclose any financial interest in organizations that do business with the nonprofit. (2) Board members with a disclosed conflict must recuse themselves from voting on related transactions. (3) A board member's 'financial interest' includes interests held by their spouse or dependent children. (4) Exception: holdings in publicly traded companies of less than 1% of outstanding shares are not considered a 'financial interest.' Board member Jones's spouse owns 0.8% of Company X's publicly traded stock. Company X is bidding on a $500,000 contract with the nonprofit. Must Jones recuse? |
| seed-041 |
Rule Application |
Evidence Weighing, Premise Identification, Scope Determination |
Hard |
A municipality's tree ordinance states: (1) No tree with a trunk diameter of 12 inches or more may be removed without a permit. (2) Permits require an arborist's assessment and a $200 fee. (3) Dead or diseased trees are exempt from the permit requirement but must be reported within 30 days of removal. (4) Trees posing imminent danger to structures may be removed immediately, with documentation filed within 7 days. A homeowner notices a large tree leaning significantly after a storm. They believe it could fall on their house. They cut it down immediately without a permit or arborist assessment. Afterward, an arborist determines the tree was healthy and structurally sound — the lean was pre-existing and stable. Did the homeowner violate the ordinance? |
| seed-042 |
Rule Application |
Scope Determination, Implication Tracing, Decomposition |
Hard |
An insurance policy states: (1) Coverage begins on the policy effective date. (2) Claims must be filed within 90 days of the incident. (3) Pre-existing conditions are not covered. (4) A pre-existing condition is defined as any condition diagnosed or treated within the 12 months before the policy effective date. (5) If a pre-existing condition worsens after the policy effective date, the worsening is covered but the underlying condition is not. A person's policy starts January 1. They were treated for back pain in March of the previous year (10 months before). In February, they experience a herniated disc that doctors say was caused by the pre-existing back condition. Is the herniated disc covered? |
| seed-043 |
Rule Application |
Scope Determination, Evidence Weighing, Confidence Calibration |
Hard |
A vendor agreement states: (1) The vendor must deliver goods by the agreed date. (2) Late delivery incurs a penalty of 2% of the contract value per day, up to a maximum of 20%. (3) Force majeure events (natural disasters, government actions, pandemics) excuse late delivery with no penalty if the vendor provides notice within 48 hours of the event. (4) The vendor must use 'commercially reasonable efforts' to mitigate delays, even during force majeure events. A vendor's factory is in a region hit by flooding. The vendor notifies the buyer within 24 hours. Delivery will be 15 days late. However, the vendor could have sourced equivalent goods from another supplier at 30% higher cost and delivered on time. Did the vendor satisfy Rule 4? |
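A few lines of Python can sketch what is at stake in the Rule 4 question. Whether the force majeure excuse holds is passed in as an input, because the code cannot judge "commercially reasonable efforts." The function name is hypothetical.

```python
def late_penalty_percent(days_late: int, excused: bool) -> int:
    """Penalty as a percent of contract value (Rule 2); zero if excused (Rule 3)."""
    if excused:
        return 0
    return min(2 * days_late, 20)  # 2% per day, capped at 20%


print(late_penalty_percent(15, excused=False))  # 20
print(late_penalty_percent(15, excused=True))   # 0
```

Without the cap, 15 days would come to 30%, so the cap binds at 20%. The alternative supplier's 30% premium may be measured against a different base: the cost of the goods rather than the contract value. Without more information, the two figures are not directly comparable.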
| seed-044 |
Rule Application |
Scope Determination, Evidence Weighing, Premise Identification |
Medium |
A ride-hailing platform's rating policy states: (1) Drivers below a 4.5-star average are placed on probation. (2) Drivers on probation who don't improve to 4.5+ within 30 days are deactivated. (3) Riders may not rate below 3 stars without providing a written reason. (4) Ratings based on factors outside the driver's control (traffic, route calculated by the app, surge pricing) are removed upon appeal. A driver has a 4.48 average. Their three most recent 1-star ratings all cite 'took too long' as the reason. The driver took the route the app calculated each time. The driver appeals these ratings under Rule 4. If all three are removed, the driver's average rises to 4.55. Should the ratings be removed? |
| seed-045 |
Rule Application |
Scope Determination, Premise Identification |
Medium |
A university honor code states: (1) Students may not submit work that is not their own. (2) Collaboration is permitted when explicitly authorized by the instructor. (3) Unauthorized collaboration is an honor code violation. (4) Tutoring and study groups are not considered 'collaboration' under this code. (5) Using AI tools to generate any portion of submitted work is prohibited unless the instructor explicitly permits it. A student uses an AI tool to generate an outline for an essay, then writes the full essay themselves using the outline as a structural guide. The instructor prohibited AI use. Did the student violate Rule 5? |
| seed-046 |
Rule Application |
Scope Determination, Analogical Mapping, Implication Tracing |
Hard |
A wildlife sanctuary has visitor rules: (1) Visitors must stay on marked trails. (2) No feeding animals. (3) Photography is permitted but flash photography is prohibited. (4) Exception to Rule 3: researchers with permits may use flash photography in designated zones. (5) All rules apply to drones — drones are treated as extensions of the visitor. A researcher with a permit flies a drone with a flash camera into a designated zone but operates the drone from the visitor center, which is NOT in the designated zone. The flash fires while the drone is in the designated zone. Is this a violation? |
| seed-047 |
Rule Application |
Decomposition, Scope Determination, Implication Tracing |
Medium |
A payroll policy states: (1) Overtime applies to hours worked beyond 40 in a workweek. (2) Overtime rate is 1.5× the regular rate. (3) Paid holidays count as hours worked for overtime calculation. (4) Paid sick leave does NOT count as hours worked for overtime calculation. (5) The workweek runs Monday through Sunday. An employee works Monday through Thursday (8 hours each = 32 hours), takes paid sick leave on Friday (8 hours), and then works Saturday (10 hours). Their regular rate is $20/hour. What is their total pay for the week? |
| seed-048 |
Rule Application |
Scope Determination, Evidence Weighing, Temporal Sequencing |
Medium |
A homeowners association's architectural review states: (1) All exterior modifications must be approved by the review committee before construction begins. (2) Approved modifications must be completed within 90 days of approval. (3) Modifications not completed within 90 days require re-approval. (4) The committee meets monthly on the first Wednesday. (5) Emergency repairs that restore the property to its previous condition do not require approval. A homeowner gets approval to replace their roof on January 5. On Day 85 (March 31), a severe storm damages the partially completed new roof. The contractor needs an additional 30 days to finish — now extending to Day 115. Does the homeowner need re-approval? |
| seed-049 |
Rule Application |
Temporal Sequencing, Implication Tracing |
Medium |
A company's data retention policy states: (1) Customer records must be retained for 7 years after the last transaction. (2) After the retention period, records must be destroyed within 90 days. (3) Exception: records subject to a legal hold must be retained indefinitely until the hold is released. (4) When a legal hold is released, the original retention period resumes from where it was paused. A customer's last transaction was in 2016. A legal hold was placed on their records in 2020 (4 years into the 7-year retention period). The legal hold is released in 2025. When must the records be destroyed? |
| seed-050 |
Rule Application |
Decomposition, Implication Tracing, Temporal Sequencing |
Medium |
An event venue has these booking rules: (1) Bookings require a 50% deposit at signing. (2) The remaining 50% is due 14 days before the event. (3) Cancellations more than 30 days out receive a full deposit refund. (4) Cancellations 15–30 days out receive a 50% deposit refund. (5) Cancellations less than 15 days out receive no refund. (6) The venue may cancel a booking at any time for safety concerns, with a full refund of all payments. (7) If the venue cancels and the client has already paid in full, interest of 2% per month is added to the refund for the period between payment and refund. A client books an event, pays the full amount ($10,000) 30 days early, then 10 days before the event, the venue cancels due to a structural safety concern discovered during inspection. It takes the venue 3 months to process the refund. How much does the client receive? |
| seed-051 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Confidence Calibration |
Medium |
A company is hiring for a senior role. Three references were contacted for the finalist candidate. Reference A (former manager): 'Outstanding performer, always exceeded targets, highly recommend.' Reference B (former colleague): 'Good team player but sometimes missed deadlines under pressure.' Reference C (former direct report): 'Demanding and difficult to work for, but I learned more from them than anyone else.' The candidate's resume shows consistent promotions and a track record of being placed on high-visibility projects. How should the hiring committee weigh these conflicting assessments? |
| seed-052 |
Multi Source Conflict |
Evidence Weighing, Confidence Calibration, Decomposition |
Medium |
Two weather services are forecasting for the same city on the same day. Service A predicts sunny skies with a high of 78°F, based on their proprietary atmospheric model that has been 85% accurate historically. Service B predicts thunderstorms with a high of 68°F, based on satellite imagery updated 2 hours ago and a model with 80% historical accuracy. A local farmer needs to decide whether to harvest today (which requires dry conditions) or wait until tomorrow. Which forecast should the farmer rely on? |
| seed-053 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Confidence Calibration |
Hard |
A patient visits two doctors about persistent knee pain. Doctor A (orthopedic surgeon) examines the knee, orders an MRI, and diagnoses a torn meniscus requiring arthroscopic surgery. Doctor B (sports medicine physician) examines the same knee, reviews the same MRI, and diagnoses mild cartilage wear consistent with age, recommending physical therapy for 6 weeks before considering any procedure. Both doctors are board-certified with 15+ years of experience. The MRI report from the radiologist says: 'Possible meniscal tear versus degenerative changes; clinical correlation recommended.' How should the patient decide? |
| seed-054 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Scope Determination |
Hard |
A city council is debating whether to build a new park. Three studies have been commissioned. Study A (by a real estate firm): 'The park will increase surrounding property values by 12–18%, generating $2.3M in additional tax revenue over 10 years.' Study B (by an environmental group): 'The park will reduce urban heat island effect by 3°F, decrease stormwater runoff by 40%, and improve air quality metrics in a 1-mile radius.' Study C (by a fiscal watchdog organization): 'The park will cost $4.5M to build and $350K/year to maintain, with no reliable evidence that parks increase tax revenue — the real estate firm's methodology conflates correlation with causation.' Studies A and C directly contradict each other on the tax revenue claim. Which study should the council trust? |
| seed-055 |
Multi Source Conflict |
Evidence Weighing, Contradiction Detection, Confidence Calibration |
Hard |
An antique dealer is evaluating a painting. Expert A (art historian): 'The brushwork and pigment composition are consistent with the claimed 18th-century origin. I am 90% confident this is authentic.' Expert B (forensic analyst): 'Chemical analysis of the canvas shows synthetic fibers not available until the 1940s. The paint contains titanium white, which was not commercially available until the early 20th century.' Expert C (provenance researcher): 'The painting appears in the estate inventory of a known 18th-century collector, and there is a continuous chain of ownership documentation to the present day.' How should the dealer assess authenticity? |
| seed-056 |
Multi Source Conflict |
Evidence Weighing, Absence Reasoning, Confidence Calibration |
Medium |
A school receives three different reports about an incident in the cafeteria. Student A: 'Student X threw food at Student Y first, then Y pushed X.' Student B: 'I didn't see who started it, but X was on the ground and Y was standing over him when I looked up.' Teacher C: 'I was across the cafeteria. I saw Student Y push Student X, but I didn't see what happened before the push.' Security camera footage shows the cafeteria from an angle that captures Y pushing X, but the area where A claims X threw food is just outside the camera's field of view. Who started the altercation? |
| seed-057 |
Multi Source Conflict |
Contradiction Detection, Evidence Weighing, Temporal Sequencing |
Medium |
A company's database shows a customer placed an order on March 3. The customer's email confirmation shows March 5. The shipping provider's tracking record shows the package was picked up on March 4. The customer is disputing a late delivery fee that applies to orders placed after March 4. Which date is the actual order date? |
| seed-058 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Confidence Calibration |
Easy |
Three sensors monitor temperature in a data center. Sensor A reads 72°F. Sensor B reads 71°F. Sensor C reads 85°F. The cooling system's log shows it has been running normally. Sensor C was last calibrated 14 months ago (calibration is required every 12 months). Sensors A and B were calibrated 3 months ago. The data center manager wants to know: is the data center overheating? |
| seed-059 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Analogical Mapping |
Hard |
A journalist is investigating a factory's environmental compliance. Source 1 (factory spokesperson): 'We are fully compliant with all environmental regulations. Our last inspection was passed with no violations.' Source 2 (anonymous factory worker): 'The factory dumps waste into the river at night after inspections. I've seen it happen three times in the past month.' Source 3 (state inspection records): 'Most recent inspection: passed. No violations found. Note: inspection was announced 48 hours in advance per standard protocol.' Source 4 (downstream water quality data from a university research station): 'Elevated levels of industrial contaminants detected in river samples taken on three dates in the past month. Levels exceeded EPA thresholds by 15–40%.' How should the journalist evaluate these sources? |
| seed-060 |
Multi Source Conflict |
Evidence Weighing, Temporal Sequencing, Confidence Calibration |
Easy |
A family is planning a vacation and consults three sources for hotel recommendations. Travel blog (updated 6 months ago): 'Hotel Sunrise is the best beachfront option — beautiful rooms, excellent service, 9/10.' Review aggregator (4,200 reviews, average 3.2/5): Mixed reviews mentioning outdated rooms, noise, and inconsistent service, though many positive reviews about the beach location. Friend who visited 2 weeks ago: 'We stayed at Hotel Sunrise. The beach was gorgeous but the hotel was under renovation — half the pool area was closed, construction noise from 7 AM, and they moved us to a smaller room because our original was being remodeled.' Which source should the family trust most? |
| seed-061 |
Multi Source Conflict |
Evidence Weighing, Implication Tracing, Contradiction Detection |
Medium |
A court case involves a car accident at an intersection. Witness 1 (pedestrian on the corner): 'The light was green for the northbound car when the eastbound car ran the red and hit it.' Witness 2 (driver behind the northbound car): 'I'm not sure about the light, but the northbound car was definitely going above the speed limit.' Witness 3 (passenger in the eastbound car): 'Our light had just turned yellow. We were already in the intersection.' Traffic camera footage is available but the camera only captures the northbound approach — it shows the northbound car entering the intersection on green. The eastbound approach is not on camera. Who is at fault? |
| seed-062 |
Multi Source Conflict |
Decomposition, Evidence Weighing, Implication Tracing |
Medium |
A company's Q3 financial results show conflicting signals. Revenue increased 15% year-over-year. Profit margin decreased from 22% to 17%. Customer count grew by 25%. Average revenue per customer decreased by 8%. Employee satisfaction scores dropped from 78 to 64. A board member says 'Great quarter — revenue is up significantly.' The CFO says 'Concerning quarter — we're growing unprofitably.' The VP of Sales says 'Best quarter ever — customer acquisition is way up.' Who is correct? |
| seed-063 |
Multi Source Conflict |
Evidence Weighing, Confidence Calibration, Scope Determination |
Hard |
An archaeological dig uncovers pottery fragments at a site. Method 1 (carbon-14 dating of organic residue on the pottery): dates the pottery to approximately 800 CE ± 50 years. Method 2 (thermoluminescence dating of the clay): dates the pottery to approximately 1100 CE ± 100 years. Method 3 (stylistic analysis by a ceramics expert): 'The decorative patterns are consistent with the regional tradition from 900–1000 CE.' The site also contains coins independently dated to the 10th century. How old is the pottery? |
| seed-064 |
Multi Source Conflict |
Temporal Sequencing, Evidence Weighing, Decomposition |
Medium |
A software team is debugging a production outage. The monitoring dashboard shows the database CPU spiked to 98% at 2:14 PM. The application logs show timeout errors starting at 2:15 PM. A developer reports they deployed a new code version at 2:10 PM. The database administrator says: 'I ran a large analytics query at 2:13 PM — it usually takes 30 seconds and never causes issues.' The network team reports no anomalies. What caused the outage? |
| seed-065 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Analogical Mapping |
Medium |
A market research firm surveys 500 consumers about a new product. Survey result: 72% say they would 'definitely' or 'probably' purchase the product at $29.99. A focus group of 12 consumers from the same demographic: 8 of 12 said the price was too high and they wouldn't buy it. A competitor launched a similar product last year at $34.99 and achieved 15% market penetration in the first year — below their 25% target. The product team cites the 72% survey result to justify a launch. The finance team cites the competitor's underperformance to urge caution. Who is right? |
| seed-066 |
Multi Source Conflict |
Evidence Weighing, Temporal Sequencing, Implication Tracing |
Hard |
A city's public health department receives conflicting data about a possible foodborne illness outbreak. Hospital A reports 15 patients with similar gastrointestinal symptoms in the past 48 hours. Hospital B (across town) reports no unusual increase in GI cases. A restaurant inspection from last week found no violations at the restaurant where 12 of Hospital A's patients reported eating. The restaurant's own records show all food stored at correct temperatures. However, the restaurant's fish supplier issued a voluntary recall of a batch of shellfish yesterday, and the restaurant received a delivery from that supplier 4 days ago. Is there an outbreak, and what's the source? |
| seed-067 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Temporal Sequencing |
Medium |
Two history textbooks describe the same event differently. Textbook A (published 1985): 'The treaty was signed under fair conditions, with both parties negotiating in good faith. It brought lasting stability to the region.' Textbook B (published 2018): 'The treaty was imposed under economic duress. While it reduced immediate conflict, it created structural inequities that fueled resentment for decades.' Both cite primary sources. Textbook A cites diplomatic correspondence between the two heads of state. Textbook B cites newly declassified internal government memos and economic records from both nations. A student must write an essay about the treaty. Which account should they use? |
| seed-068 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Confidence Calibration |
Medium |
A building inspector and a structural engineer disagree about a house for sale. The building inspector (hired by the buyer): 'The foundation has significant cracking. I recommend a full structural assessment before purchasing.' The structural engineer (hired by the seller): 'The cracks are cosmetic settling cracks, common in houses of this age. No structural concern.' The house was built 40 years ago. The seller's disclosure form states: 'No known structural defects.' Neighboring houses of similar age and construction show similar minor cracking with no structural issues. The buyer is trying to decide whether to proceed with the purchase. How should they evaluate the conflicting opinions? |
| seed-069 |
Multi Source Conflict |
Evidence Weighing, Scope Determination, Decomposition |
Easy |
A student receives conflicting feedback on the same essay from two professors. Professor A (composition): 'Your argument is well-structured and clearly written. The thesis is strong. Grade: A-. Suggestion: add more counterarguments.' Professor B (subject-matter expert): 'Your writing is fine, but the central argument relies on a misreading of the primary source. The conclusion you draw in paragraph 4 is not supported by the text you cite. Grade: C+. Suggestion: reread Chapter 7 carefully.' The student wants to revise the essay. Whose feedback should they prioritize? |
| seed-070 |
Multi Source Conflict |
Decomposition, Implication Tracing, Contradiction Detection |
Medium |
A nonprofit's annual report shows total donations of $2.4 million. Their publicly filed tax return shows total revenue of $1.8 million. Their bank statements (obtained during an audit) show total deposits of $2.1 million. A board member asks: how much money did the organization actually receive? The executive director explains that $300K in the annual report represents pledges not yet collected, and $200K in bank deposits were transfers between the organization's own accounts. Do the numbers reconcile? |
| seed-071 |
Multi Source Conflict |
Evidence Weighing, Temporal Sequencing, Premise Identification |
Medium |
Two maps of the same forest area disagree. Map A (government topographic survey from 2019): shows a river running north-south through the western portion of the forest. Map B (satellite imagery composite from 2024): shows the river curving significantly eastward in the middle section, with the old riverbed visible but dry. A hiker planning a route needs to know where the river actually is. The hiker also has a compass and the GPS coordinates for a bridge that should cross the river. The GPS puts the bridge at a point that aligns with Map A's river path but not Map B's. Which map is correct? |
| seed-072 |
Multi Source Conflict |
Evidence Weighing, Decomposition, Confidence Calibration |
Hard |
A manufacturing plant tests product quality using three independent methods. Test A (automated optical inspection): passes 96% of units. Test B (manual quality control by experienced workers): passes 91% of units. Test C (destructive stress testing on a random 2% sample): fails 8% of tested units. Management notes that Tests A and B both pass 89% of units. On 7% of units, Test A passes but Test B fails. On 2% of units, Test B passes but Test A fails. The remaining 2% fail both tests. Which test is most reliable for predicting actual product quality? |
| seed-073 |
Multi Source Conflict |
Scope Determination, Evidence Weighing, Implication Tracing |
Hard |
A company receives two contradictory legal opinions about whether they can use customer data for a new purpose. Lawyer A (external firm specializing in data privacy): 'Under the current privacy regulations, the original consent does not cover this new use. You need to obtain fresh consent from affected customers.' Lawyer B (in-house counsel): 'The original terms of service are broad enough to cover this use under the "legitimate business interest" provision. No additional consent is needed.' The relevant regulation states: 'Data may be processed for purposes compatible with the original purpose of collection, provided the data subject's reasonable expectations are met.' The original terms of service say: 'We may use your data to improve our products and services.' The new use is selling anonymized aggregate data to third parties. Which lawyer is correct? |
| seed-074 |
Multi Source Conflict |
Contradiction Detection, Temporal Sequencing, Evidence Weighing |
Medium |
A used car listing shows conflicting information. The odometer reads 45,000 miles. The Carfax report shows a reading of 38,000 miles at the last service 8 months ago. The seller's listing says '55,000 miles — well maintained.' The maintenance records show oil changes every 5,000 miles, with the most recent at 43,000 miles (3 months ago). The car is 6 years old. What is the likely actual mileage, and is anything suspicious? |
| seed-075 |
Multi Source Conflict |
Evidence Weighing, Confidence Calibration, Analogical Mapping |
Medium |
A project manager receives three estimates for completing a software feature. Developer A (senior, 10 years experience): '3 weeks.' Developer B (mid-level, 4 years experience): '6 weeks.' Developer C (senior, 12 years experience, worked on a similar feature at a previous company): '2 weeks, but could stretch to 4 if the database layer is more complex than it looks.' Historical data from the team's last 20 features shows that initial estimates average 60% of actual completion time (i.e., features take about 1.67× as long as estimated). What should the project manager plan for? |
| seed-076 |
Multi Source Conflict |
Evidence Weighing, Temporal Sequencing, Confidence Calibration |
Medium |
A nutrition study and a government guideline disagree. A recent large-scale study (50,000 participants, 10-year follow-up, published in a top-tier journal) found that consuming 3 or more eggs per day had no statistically significant association with cardiovascular disease risk. The current government dietary guideline recommends limiting egg consumption to 1 per day, citing cholesterol concerns. An older meta-analysis (2010, covering studies from 1980–2009) found a modest association between high egg consumption and heart disease. A patient asks their doctor: how many eggs can I safely eat? |
| seed-077 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Decomposition |
Medium |
A real estate appraiser and an online valuation tool disagree on a home's value. The appraiser (in-person inspection, licensed, 20 years experience): $425,000. The online tool (algorithm using comparable sales data, tax records, and market trends): $510,000. Three recent comparable sales in the neighborhood: $460,000, $480,000, and $520,000. The home has a significant but not immediately obvious issue: the basement floods during heavy rain, which the appraiser noted but the online tool cannot detect. The seller wants to list at $510,000. The buyer's lender requires an appraisal. What is the home likely worth? |
| seed-078 |
Multi Source Conflict |
Evidence Weighing, Decomposition, Implication Tracing |
Hard |
A manager receives conflicting performance signals about an employee. Quantitative metrics: the employee hit 112% of their sales target and closed 3 of the team's 5 largest deals. Peer feedback (360 review, anonymous): 4 of 6 peers rated working with this employee as 'difficult' or 'very difficult,' citing 'takes credit for team work,' 'dismissive in meetings,' and 'withholds information.' Client feedback: two major clients specifically requested this employee for future projects. The employee is up for promotion to team lead. Should they be promoted? |
| seed-079 |
Multi Source Conflict |
Scope Determination, Evidence Weighing, Decomposition |
Hard |
A policy proposal to raise the minimum wage is supported by two economic studies and opposed by two others. Study 1 (FOR): 'Analysis of 15 cities that raised minimum wage found no significant increase in unemployment and a 12% decrease in poverty rates.' Study 2 (FOR): 'Worker productivity increased in businesses after minimum wage increases, offsetting most of the cost.' Study 3 (AGAINST): 'A $15 minimum wage in rural areas would eliminate an estimated 8% of jobs in sectors with thin margins (agriculture, food service).' Study 4 (AGAINST): 'Businesses in border regions where one jurisdiction raised wages saw a 15% shift in new business formation to the lower-wage jurisdiction.' A policy advisor must recommend for or against. How should they weigh these studies? |
| seed-080 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Confidence Calibration |
Hard |
A fire investigation produces conflicting evidence about the cause. The fire marshal: 'Origin point was the kitchen. Burn patterns indicate an accelerant was used. This fire was intentionally set.' The homeowner's insurance investigator: 'The fire originated in the kitchen, likely from an unattended stove. No evidence of accelerant was found in our lab analysis of debris samples.' The neighbor: 'I smelled something chemical, like gasoline, just before I saw the flames.' Lab report from independent testing of debris: 'Traces of a hydrocarbon compound detected, but consistent with common household products (cooking oil, cleaning supplies) rather than an accelerant like gasoline.' Who is correct about arson vs. accident? |
| seed-081 |
Multi Source Conflict |
Decomposition, Evidence Weighing, Implication Tracing |
Hard |
A company's customer satisfaction data shows conflicting trends. NPS (Net Promoter Score) survey: increased from 42 to 58 over the past year — a major improvement. Customer support ticket volume: increased 35% over the same period. Social media sentiment analysis: 68% positive (up from 55%). Customer churn rate: increased from 4% to 6% per month. Average customer lifetime value: increased from $1,200 to $1,450. The CEO says 'Our customer satisfaction has never been better.' The VP of Customer Success says 'We're losing customers faster than ever.' Who is right? |
| seed-082 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Confidence Calibration |
Medium |
An employer receives two contradictory reference checks for a job candidate. Reference 1 (candidate's former direct manager): 'This person was terminated for repeated policy violations. I would not rehire them.' Reference 2 (candidate's former colleague at the same company): 'They left voluntarily for a better opportunity. They were one of the best people on our team.' The candidate's resume says they 'transitioned to pursue new opportunities.' Background check confirms employment dates but not the reason for departure. The HR policy at the former company states: 'We only confirm dates of employment and job title.' How should the hiring team interpret this? |
| seed-083 |
Multi Source Conflict |
Evidence Weighing, Decomposition, Scope Determination |
Hard |
A team is deciding which programming language to use for a new project. Developer A (10 years in Python): 'Python is the obvious choice — rapid development, massive library ecosystem, and our team knows it best.' Developer B (5 years in Rust): 'Rust gives us memory safety and performance we'll need when we scale. Python will become a bottleneck.' Developer C (architect): 'Go offers the best balance — better performance than Python, easier than Rust, good concurrency support.' The project requires processing 100,000 API requests per second at peak. Current prototype in Python handles 5,000 requests per second on the same hardware. The team of 6 developers includes 4 Python experts, 1 Go developer, and 1 Rust developer. The project deadline is 4 months away. Which recommendation should the team follow? |
| seed-084 |
Multi Source Conflict |
Premise Identification, Evidence Weighing, Confidence Calibration |
Medium |
Three news outlets report different casualty figures for the same natural disaster on the same day. Outlet A (international wire service): 'At least 200 confirmed dead.' Outlet B (local newspaper): 'Officials estimate 450 dead, with hundreds still missing.' Outlet C (government press release): '147 confirmed casualties.' A relief organization needs to estimate the scale of its response. Which number should they use for planning? |
| seed-085 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Temporal Sequencing |
Medium |
A landlord reviews a rental application with conflicting signals. Credit score: 720 (good). Income: $5,000/month (rent is $1,500 — meets the 3× requirement). Rental history: current landlord says 'Great tenant, always pays on time, would gladly rent to again.' Prior landlord (from 2 years ago): 'Left the apartment in terrible condition — $3,000 in damage beyond normal wear. Would not rent to again.' Employment verification: confirmed, stable job for 3 years. Social media: multiple posts about loud parties and late nights. Should the landlord approve the application? |
| seed-086 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Scope Determination |
Hard |
A city's transportation department has two conflicting traffic studies about a proposed roundabout. Study 1 (traffic engineering firm hired by the department): 'The roundabout will reduce average intersection delay by 35% and accidents by 40%, based on modeling of current traffic patterns.' Study 2 (community-funded counter-study): 'The roundabout will increase delays during peak hours by 20% due to the intersection's unusual five-road configuration, and will be dangerous for the high volume of pedestrians and cyclists.' Both studies use standard traffic modeling software. The intersection currently averages 2 accidents per month and has a 4-minute average delay during peak hours. Comparable five-road roundabouts in other cities show mixed results. What should the transportation department conclude? |
| seed-087 |
Multi Source Conflict |
Contradiction Detection, Evidence Weighing, Confidence Calibration |
Medium |
A student's college application contains inconsistencies. The personal essay describes overcoming poverty and working two jobs to support the family. The financial aid application reports household income of $125,000. The guidance counselor's recommendation letter mentions the student's 'comfortable home life' and 'family vacations.' The student's extracurricular record includes unpaid internships and volunteer work at a homeless shelter. An admissions officer is reviewing the file. How should they interpret the discrepancy between the essay's poverty narrative and the financial data? |
| seed-088 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Scope Determination |
Medium |
A dietary supplement company claims its product 'boosts immune function by 300%.' They cite: (1) A company-funded study of 30 participants showing increased white blood cell counts after 4 weeks of supplementation. (2) Three customer testimonials reporting fewer colds. An independent review by a university research team found: (3) No statistically significant difference in illness rates between supplement users and a control group over 6 months (500 participants). (4) White blood cell count increases were within normal daily variation and not clinically meaningful. The company accuses the university of bias because they receive funding from pharmaceutical companies that compete with supplements. How credible is the 300% claim? |
| seed-089 |
Multi Source Conflict |
Evidence Weighing, Decomposition, Premise Identification |
Hard |
An insurance adjuster is assessing damage to a warehouse after a storm. The building owner claims $800,000 in damage. The adjuster's initial estimate is $320,000. A third-party engineering firm hired by the insurance company assesses $450,000. The owner hires their own engineering firm, which assesses $720,000. Building maintenance records show multiple deferred repairs over the past 5 years. Photos taken 3 months before the storm show pre-existing water stains on the ceiling and a cracked foundation, both of which the owner's engineering report attributes to the storm. What is the fair damage assessment? |
| seed-090 |
Multi Source Conflict |
Evidence Weighing, Decomposition, Confidence Calibration |
Medium |
A company's sales team and operations team present contradictory projections for next quarter. Sales: 'Pipeline is the strongest it's been — $4.2M in weighted pipeline. We'll close $3M+ easily.' Operations: 'Current capacity can deliver $2.1M worth of orders next quarter. If sales closes more than that, we'll have fulfillment delays of 6–8 weeks.' The CFO notes: historical close rate on weighted pipeline is 55% (not the 71% sales is implicitly projecting). Last quarter, the company closed $2.4M against a $3.8M pipeline (63%). Sales hires made last quarter are expected to begin producing this quarter. What should the executive team plan for? |
| seed-091 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Confidence Calibration |
Hard |
An employer investigates a workplace harassment complaint. The complainant provides a detailed written account with dates, times, and specific quotes attributed to the accused. The accused denies everything and says the complainant has a grudge because of a negative performance review the accused gave them last month. Two witnesses are interviewed. Witness 1: 'I heard the accused say something inappropriate once, maybe two months ago, but I can't remember the exact words.' Witness 2: 'I sit near both of them and never heard anything inappropriate.' HR also finds that the complainant filed a similar complaint at a previous employer, which was determined to be unsubstantiated. How should the investigation conclude? |
| seed-092 |
Multi Source Conflict |
Premise Identification, Contradiction Detection, Evidence Weighing |
Easy |
A patient's blood test results from two different labs disagree. Lab A (hospital lab, drawn at 8 AM): Fasting glucose = 128 mg/dL (above normal, suggesting pre-diabetes or diabetes). Lab B (independent lab, drawn at 2 PM the same day): Fasting glucose = 95 mg/dL (normal). The patient says they fasted for 12 hours before the morning draw but had lunch before the afternoon draw. Both labs are accredited and use standard equipment. The doctor needs to determine the patient's actual glucose status. Which result is reliable? |
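The key check is whether each sample meets the definition of a fasting sample before its value is interpreted. Below is a minimal Python sketch. It assumes the American Diabetes Association fasting cut-offs: normal below 100 mg/dL, prediabetes range at 100–125 mg/dL, and diabetes range at 126 mg/dL or above. It also assumes the ADA definition of fasting as no caloric intake for at least 8 hours. The function name and the 1-hour figure for the post-lunch draw are hypothetical.

```python
MIN_FASTING_HOURS = 8  # ADA definition: no caloric intake for at least 8 hours

def interpret_fasting_glucose(mg_dl: float, hours_since_eating: float) -> str:
    """Hypothetical helper: reject non-fasting samples, then apply ADA fasting cut-offs."""
    if hours_since_eating < MIN_FASTING_HOURS:
        return "not a valid fasting sample"
    if mg_dl >= 126:
        return "diabetes range"
    if mg_dl >= 100:
        return "prediabetes range"
    return "normal"

# Lab A: 8 AM draw after a 12-hour fast.
print("Lab A:", interpret_fasting_glucose(128, hours_since_eating=12))
# Lab B: 2 PM draw after lunch (1 hour since eating is an assumed figure).
print("Lab B:", interpret_fasting_glucose(95, hours_since_eating=1))
```

A single fasting value of 128 mg/dL falls in the diabetes range. Under ADA guidance, however, a diagnosis normally requires confirmation by a repeat test.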
| seed-093 |
Multi Source Conflict |
Evidence Weighing, Confidence Calibration, Decomposition |
Hard |
A city receives three proposals for a new public transit line. Proposal A (Bus Rapid Transit): Cost $45M, 18-month build, projected to serve 15,000 riders/day, can be expanded easily. Proposal B (Light Rail): Cost $280M, 4-year build, projected to serve 35,000 riders/day, fixed route. Proposal C (Autonomous Shuttle Network): Cost $90M, 2-year build, projected to serve 20,000 riders/day, flexible routing. An independent study found that Proposal A's ridership projection is based on comparable BRT systems in similar-sized cities. Proposal B's projection uses a model that has historically overestimated light rail ridership by 30%. Proposal C's technology has not been deployed at this scale, and the company providing the estimate has completed only one smaller pilot. Which proposal should the city choose? |
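One way to compare the proposals is capital cost per projected daily rider, after correcting Proposal B for its model's known bias. Below is a minimal Python sketch. It assumes that "overestimated by 30%" means the model's projection runs 30% above actual ridership, so actual ridership is roughly the projection divided by 1.3. Proposal C is left uncorrected because it has no track record to calibrate against, which makes its figure the least certain rather than the most reliable. The dictionary layout and helper name are hypothetical.

```python
# (capital cost in $, projected riders/day, overestimate factor of the projection)
PROPOSALS = {
    "A (Bus Rapid Transit)": (45e6, 15_000, 1.0),   # based on comparable BRT systems
    "B (Light Rail)": (280e6, 35_000, 1.3),          # model historically ~30% high
    "C (Autonomous Shuttle)": (90e6, 20_000, 1.0),   # unproven at scale; no correction available
}

def cost_per_daily_rider(cost, projected_riders, overestimate_factor):
    """Hypothetical helper: capital cost divided by bias-adjusted daily ridership."""
    adjusted_riders = projected_riders / overestimate_factor
    return cost / adjusted_riders

for name, (cost, riders, factor) in PROPOSALS.items():
    print(f"{name}: ${cost_per_daily_rider(cost, riders, factor):,.0f} per daily rider")
```

Uncorrected, Proposal B costs $8,000 per daily rider; the bias correction raises that to about $10,400. The metric ignores build time, expandability, and operating cost, which the scenario also raises.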
| seed-094 |
Multi Source Conflict |
Evidence Weighing, Scope Determination, Premise Identification |
Medium |
A parent receives conflicting advice about their child's learning difficulties. The school psychologist (tested the child at school): 'Standard cognitive and achievement testing shows your child performing within normal range. No learning disability identified.' A private educational psychologist (tested the child independently): 'Comprehensive neuropsychological testing reveals a specific processing speed deficit consistent with a learning disability. Recommend accommodations including extended test time.' The child's teacher: 'They're a bright kid but consistently run out of time on tests and produce less written work than their classmates.' The school says no accommodations are warranted. The parents want accommodations. Which assessment should be trusted? |
| seed-095 |
Multi Source Conflict |
Evidence Weighing, Temporal Sequencing, Confidence Calibration |
Hard |
A museum considers purchasing a historical artifact. The seller provides documentation showing the artifact was legally exported from its country of origin in 1965. A cultural heritage organization provides evidence that the country of origin passed a law in 1970 prohibiting export of such artifacts and has requested the return of all items exported 'without proper authorization.' An independent provenance researcher finds that the 1965 export document was issued by a regional official who was later convicted of corruption, though not specifically for artifact trafficking. The artifact itself has been carbon-dated to the correct historical period. What should the museum decide? |
| seed-096 |
Multi Source Conflict |
Evidence Weighing, Decomposition, Absence Reasoning |
Medium |
A factory's quality data from three shifts tells different stories. Morning shift: 2.1% defect rate (8 AM–4 PM, experienced crew, senior supervisor). Afternoon shift: 2.3% defect rate (4 PM–midnight, mix of experienced and new workers, mid-level supervisor). Night shift: 5.8% defect rate (midnight–8 AM, mostly newer workers, junior supervisor). The night shift supervisor says: 'Our raw materials are lower quality — they send us the bottom of the barrel.' The factory manager says: 'All three shifts receive the same materials from the same suppliers.' Purchasing records confirm identical material sourcing for all shifts. What explains the night shift's higher defect rate? |
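The absence-reasoning step can be made explicit: a factor that is identical on every shift cannot account for a difference between shifts. Below is a minimal Python sketch. It encodes the scenario's shift attributes (the encoding is hypothetical) and keeps only the factors that vary across shifts.

```python
SHIFTS = {
    "morning":   {"defect_pct": 2.1, "hours": "8 AM-4 PM", "crew": "experienced",
                  "supervisor": "senior", "materials": "same suppliers"},
    "afternoon": {"defect_pct": 2.3, "hours": "4 PM-midnight", "crew": "mixed",
                  "supervisor": "mid-level", "materials": "same suppliers"},
    "night":     {"defect_pct": 5.8, "hours": "midnight-8 AM", "crew": "mostly new",
                  "supervisor": "junior", "materials": "same suppliers"},
}

def varying_factors(shifts, outcome="defect_pct"):
    """Return the factors whose value differs between shifts.

    A factor that is constant across all shifts (here, materials, as the
    purchasing records confirm) cannot explain why one shift's outcome differs.
    """
    factors = set().union(*(attrs.keys() for attrs in shifts.values())) - {outcome}
    return sorted(f for f in factors if len({attrs[f] for attrs in shifts.values()}) > 1)

print(varying_factors(SHIFTS))
```

Materials drop out, leaving crew experience, supervisor seniority, and time of day. These three move together in this data, so the sketch narrows the explanation but cannot separate them.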
| seed-097 |
Multi Source Conflict |
Evidence Weighing, Premise Identification, Scope Determination |
Medium |
A small town's water supply monitoring shows different results from two sources. The town's own monthly testing (conducted by a part-time employee using basic test kits): 'All parameters within safe limits for the past 12 months.' State environmental agency quarterly testing (conducted by certified technicians using laboratory analysis): 'Elevated lead levels at 18 ppb in last quarter's sample' (EPA action level is 15 ppb). The town's mayor says the state test must be an error because the town's own tests have been clean. The state agency says one exceedance triggers a mandatory notification and remediation process. Who is right about the water safety? |
| seed-098 |
Multi Source Conflict |
Contradiction Detection, Premise Identification, Evidence Weighing |
Hard |
A venture capital firm is evaluating a startup and receives conflicting due diligence signals. The startup's pitch deck: 'Revenue growing 20% month-over-month for the past 8 months. 500 paying customers.' The startup's bank statements show revenue growing from $12K to $62K over 8 months, confirming growth. Customer interviews (the VC independently contacted 10 customers): 8 of 10 say they are on free trials or heavily discounted pilots, not full-price paying customers. The startup's CRM data shows 500 accounts, but only 47 with positive monthly recurring revenue. The founder explains: 'We count anyone who has ever transacted with us as a paying customer, including one-time purchases and trial conversions.' Should the VC invest? |
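The growth claim and the customer count can be checked against the bank and CRM figures with a few lines of arithmetic. Below is a minimal Python sketch. It assumes that "$12K to $62K over 8 months" spans 8 compounding periods; if the $12K month is itself month 1, there are only 7 periods and the implied rate is higher. The helper name is hypothetical.

```python
def implied_monthly_growth(start, end, periods):
    """Compound monthly growth rate that turns `start` into `end` over `periods` months."""
    return (end / start) ** (1 / periods) - 1

START_REVENUE, END_REVENUE, PERIODS = 12_000, 62_000, 8

growth = implied_monthly_growth(START_REVENUE, END_REVENUE, PERIODS)
claimed_end = START_REVENUE * 1.20 ** PERIODS   # what a steady 20% MoM would give

crm_paying_share = 47 / 500           # accounts with positive monthly recurring revenue
interview_other_share = 2 / 10        # interviewees NOT on free trials or discounted pilots
# Upper bound: assumes all $62K came from the 47 recurring accounts; any
# one-time purchases in that month would push the true average lower.
max_avg_mrr_per_account = END_REVENUE / 47

print(f"implied monthly growth: {growth:.1%}")
print(f"20% MoM from $12K would reach: ${claimed_end:,.0f}")
print(f"CRM accounts with recurring revenue: {crm_paying_share:.1%}")
print(f"interviewees not on trials/pilots: {interview_other_share:.0%}")
print(f"average MRR per recurring account (upper bound): ${max_avg_mrr_per_account:,.0f}")
```

Under this assumption, the bank data supports a growth rate of about 22.8% a month. However, fewer than 10% of the claimed 500 customers show recurring revenue.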
| seed-099 |
Multi Source Conflict |
Evidence Weighing, Confidence Calibration, Scope Determination |
Hard |
A climate adaptation planning committee has two competing risk assessments for a coastal city. Assessment A (federal agency model, 2024): 'Sea level rise of 0.5–1.2 meters by 2100. Low-lying areas face moderate flood risk. Standard infrastructure upgrades recommended.' Assessment B (university research group, 2025): 'Accounting for ice sheet dynamics not included in federal models, sea level rise of 1.0–2.5 meters by 2100. Low-lying areas face severe flood risk. Managed retreat from coastal zones recommended.' The city's infrastructure has a 50-year design life. A real estate developer opposes Assessment B, citing its economic impact on coastal property values. An environmental advocacy group supports Assessment B. The committee must decide between moderate upgrades and managed retreat. How should they weigh the assessments? |
| seed-100 |
Multi Source Conflict |
Evidence Weighing, Scope Determination, Decomposition |
Medium |
A hiring panel interviews a candidate. After the interview, the three panelists have very different reactions. Panelist A (engineering lead): 'Strong technical skills. Solved the coding problem elegantly. I'd hire immediately.' Panelist B (product manager): 'Couldn't clearly explain trade-offs or communicate reasoning. When I asked about product decisions, they only talked about implementation details.' Panelist C (HR): 'Professional demeanor, but interrupted me twice and talked over Panelist B during the behavioral questions. May have difficulty in collaborative settings.' The candidate's technical assessment score is in the top 10% of all candidates this year. The role requires both deep technical work and cross-functional collaboration with product and design teams. Should the candidate be hired? |