Reinforcement Fine-Tuning (RFT)

Scenario 3: Illogical reasoning in complex planning

Trip plans ignore how constraints interact. Tool Call Accuracy and Relevance scores fall significantly below threshold because the agent lists factors but doesn't reason through how they affect one another.

Your task

RFT requires prompts paired with a grader function. The prompt presents a complex, multi-constraint scenario, and the grader defines how to score whether the model reasoned through those constraints correctly. Select the 3 samples that are valid RFT training examples for improving Adventure Works' trip planning logic.
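
The pairing described above can be pictured as a small record type. This is a minimal sketch, not the format of any particular fine-tuning service: `RFTSample`, `has_prompt_and_grader`, and `longs_peak_grader` are hypothetical names introduced here for illustration, and the substring match in the grader is a deliberately crude stand-in for real scoring logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RFTSample:
    """Hypothetical record type: a prompt paired with a grader function."""
    prompt: str
    # The grader maps a model response to a numeric score; None means missing.
    grader: Optional[Callable[[str], float]] = None


def has_prompt_and_grader(sample: RFTSample) -> bool:
    """Structural check only: a non-empty prompt and a callable grader."""
    return bool(sample.prompt.strip()) and callable(sample.grader)


def longs_peak_grader(response: str) -> float:
    """Score 1 if either form of the elevation appears in the response, else 0."""
    return 1.0 if ("14,259" in response or "4,346" in response) else 0.0


samples = [
    RFTSample("What is the elevation of Longs Peak in Colorado?", longs_peak_grader),
    RFTSample("Plan a week-long hiking trip for a family with young children "
              "in Glacier National Park, staying within a moderate activity level."),
]

# Keep only records that are structurally usable for RFT.
usable = [s for s in samples if has_prompt_and_grader(s)]
```

A record that passes this check is only structurally complete. Whether it suits this scenario still depends on the prompt posing interacting constraints and on the grader scoring how the response handles them.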

Prompt + Grader
User: Plan a 4-day backpacking loop for two people: one experienced, one first-time backpacker. Budget $150 for gear rentals. Trip window: late October in the San Juan Mountains.
Grader: Score 1 each: accounts for experience gap (1), addresses October weather/snow risk (1), stays within $150 rental budget (1), adjusts daily distances for mixed fitness (1). Max: 4.

Prompt + Grader
User: My group: one runner (high fitness), one returning from injury (low fitness), one child aged 12. We have 3 days and want a memorable mountain experience.
Grader: Score 1 each: offers route flexibility for different fitness levels (1), addresses injury risk mitigation (1), includes age-appropriate challenge for the child (1), avoids single-route recommendation for whole group (1). Max: 4.

Prompt + Grader
User: I want to summit a 14er. I've done 10-mile day hikes before. Solo trip, $500 gear budget, must fit into one weekend from Denver. Travelling in early June.
Grader: Score 1 each: recommends an appropriate beginner 14er for solo travel (1), addresses June snowpack risk (1), plans gear within $500 (1), schedules realistic drive + hiking time for a weekend (1). Max: 4.

Prompt + Grader
User: What is the elevation of Longs Peak in Colorado?
Grader: Score 1 if the correct elevation is stated (4,346 m / 14,259 ft), 0 otherwise.

Incomplete Example
User: Plan a week-long hiking trip for a family with young children in Glacier National Park, staying within a moderate activity level.
Missing: No grader function recorded for this example.

Labeled Example
User: What waterproof jacket should I buy for Pacific Northwest conditions?
Agent: Arc'teryx Beta AR | Material: GORE-TEX Pro | Weight: 485 g | Price: $799.95 | Stock: Denver (2), Boulder (0; order online) | Complements: Atom LT midlayer

Preference Pair
User: Is it safe to hike alone in bear country?
Preferred: Solo hiking in bear country is manageable: carry bear spray accessible (not in pack), make noise on blind corners, and check recent ranger reports before heading out.
Rejected: Solo hiking in bear country is risky and we strongly advise against it without a group.

Prompt + Grader
User: Plan a hiking trip for someone who is fit and experienced and wants a challenge.
Grader: Score 1 if the plan looks good and is appropriate, 0 if the plan looks bad.

Prompt + Grader
User: Plan a trip for a beginner with a $200 gear budget, targeting winter conditions in the Rockies starting next week.
Grader: Score 1 if the response is longer than 100 words, score 0 if shorter than 100 words.
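
Several graders in the dataset use a "Score 1 each ... Max: 4" rubric, which amounts to summing independent pass/fail checks. The sketch below shows that arithmetic for the San Juan Mountains prompt. The helper `make_rubric_grader` and the keyword tests are assumptions made for illustration: keyword matching only detects that a topic was mentioned, so a real grader would need a stronger check (for example, a programmatic budget calculation or a model-based judge) to decide whether each constraint was actually reasoned through.

```python
from typing import Callable, Dict

Check = Callable[[str], bool]


def make_rubric_grader(checks: Dict[str, Check]) -> Callable[[str], int]:
    """Build a grader that awards 1 point per satisfied check."""
    def grade(response: str) -> int:
        text = response.lower()
        return sum(1 for check in checks.values() if check(text))
    return grade


# Placeholder keyword checks (hypothetical), one per rubric criterion.
san_juan_checks: Dict[str, Check] = {
    "experience gap": lambda t: "first-time" in t or "beginner" in t,
    "october weather/snow risk": lambda t: "snow" in t,
    "rental budget": lambda t: "$150" in t,
    "daily distances": lambda t: "miles per day" in t,
}

grade = make_rubric_grader(san_juan_checks)
max_score = len(san_juan_checks)  # 4, matching "Max: 4" in the rubric

response = (
    "Because one hiker is a first-time backpacker, keep to 5 miles per day. "
    "Expect snow above treeline in late October."
)
score = grade(response)  # 3 of 4: the budget criterion is not addressed
```

Because each criterion contributes separately, the score distinguishes a response that addresses three constraints from one that addresses one, which gives the training signal more resolution than a single pass/fail judgment.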

Select exactly 3 valid RFT training examples from the dataset above.