USATII MEDIA
12 min read • Written by Vlad Usatii

Tim Wijaya

How we got the founder of Kerja.io 500K+ views in one month with 5 videos

After working with over 100 clients personally and tackling their content game head-first, I'll allow myself a pat on the back for getting Tim to 500K views. This was a long time in the making. Contrary to popular belief, growth on social media leans towards determinism. It is not stochastic, random, or tied to trends, but rather a game that requires an articulate funnel and engagement strategy.

This case study examines how we helped consumer-tech founder Tim Wijaya turn his first serious founder video into a breakout organic result: 200K views and 2,000+ new followers on the first video, with a subsequent video reaching roughly 130K. These performance figures are campaign-reported. Publicly, Tim presents himself as a founder building apps like Relay. His content themes include statements like “I’m building a global startup from Indonesia…”, “Many Indonesian tech startups are dead. Here’s why:”, and “I’m flying to Tokyo to launch my app.”

Executive Summary

The outcome did not come from posting more, but from posting smart, and we took it upon ourselves to master 5 key steps:

  1. Hook aggressively.
  2. Deliver with energy.
  3. Compress the script until every sentence earns its place.
  4. Anchor the content in the founder’s real identity.
  5. End on a loop that increases rewatches.

The operating thesis is simple in hindsight but quite tough to apply when picking up a new client:

short-form distribution is earned when the opening claim is strong enough to stop the scroll, the body is compact enough to sustain a niche's attention, and the ending creates cognitive residue, open questions, bewilderment, or replay value.

For Tim, this worked well because he had the raw material of a high-performing founder from the start: a clear identity, ambitious positioning, real-world stakes, and a point of view sharp enough to generate disagreement and contextual gifting. Contextual gifting is a term I coined to describe how the above-average communicator responds to content when there are holes in the delivery of a topic. Contextual gifting also occurs when users want to clear up misinformation, comment about potential AI usage in a video, displace a meme, roast the creator, ratio the creator, or compete with the creator for attention.

1. Tim was a strong candidate for founder-led content.

Tim wasn't an "experiment" and had already built a reputation beforehand by attending Brown and selling Kerja. He was already legible as a founder in public and needed an outlet to continue cashing out on his accomplishments. His personal site describes him as a consumer-tech founder who builds viral apps. His internet footprint also shows a recurring theme: startup ambition, Indonesia-specific market commentary, product-launch moments, and a direct, thesis-first style of speaking.

All of that is important because founder content does not work well when the person on camera is trying to borrow credibility from a script or cheat their way to the top by over-claiming. This niche works best when the content feels like a compressed version of a real operator’s worldview -- people feel like they've earned a golden nugget and want to see where it leads them on their path to making riches.

Tim’s content had the right substrate:

  • he was visibly building,
  • he had a recognizable geographic and market context,
  • he showed heavy dissent and gave hot-take-style opinions,
  • and he could say things that felt lived rather than copied.

2. Most founder videos fail for simple reasons.

The opening sentence is weak.

The creator “warms up” instead of making a claim: “I’ve been thinking a lot lately about startups...”

That sentence doesn't create tension or establish credibility. You just sound like some dude.

The delivery is not performative.

A good idea delivered flatly always loses to a worse idea delivered with great force and sheer will. Short-form is a delivery medium and you should talk for attention. Talk to your camera like you're talking to an 8-year-old kid with ADHD and an iPad addiction.

The script is redundant ASF.

Many founder scripts are too damn repetitive. They say the same thing three times with slightly different wording, creating drag.

The task was to engineer a piece of short-form communication that satisfied a stricter condition with three terms: stopping the scroll, sustaining attention, and earning the replay.

The first term comes from the hook. The second comes from script compression and delivery. The third comes from loop design, curiosity gaps, and replay behavior.

We can score each sentence as a union of novelty, coherence, and relation. Redundancy should stay consistent across the full video. If it slips, you're sacrificing millions of potential views (which means you're sacrificing revenue).
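As a minimal sketch of the per-sentence scoring idea above: the scale and the redundancy formula are not spelled out here, so the 0-to-1 ratings, the plain average, and the `1 - novelty` definition of redundancy below are illustrative assumptions rather than a fixed formula. The class and function names are hypothetical too.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SentenceRating:
    """Hand-assigned ratings for one script sentence, each assumed to be in [0, 1]."""
    text: str
    novelty: float    # does it add something not said yet?
    coherence: float  # does it follow from the previous sentence?
    relation: float   # does it tie back to the opening claim?

    def score(self) -> float:
        # Assumption: combine the three ratings with a plain average.
        return mean([self.novelty, self.coherence, self.relation])

    def redundancy(self) -> float:
        # Assumption: redundancy is whatever is not novel.
        return 1.0 - self.novelty


def flag_drag(script: list[SentenceRating], max_redundancy: float = 0.5) -> list[str]:
    """Return the sentences whose assumed redundancy exceeds the threshold."""
    return [s.text for s in script if s.redundancy() > max_redundancy]


script = [
    SentenceRating("Most founder content is boring.", 0.9, 0.8, 1.0),
    SentenceRating("And most people blame the algorithm instead of their script.", 0.8, 0.9, 0.9),
    SentenceRating("Founder content is often quite dull.", 0.1, 0.7, 0.8),
]

for s in script:
    print(f"score={s.score():.2f}  redundancy={s.redundancy():.2f}  {s.text}")
print("Cut or rewrite:", flag_drag(script))
```

The third sentence restates the first, so it gets a low novelty rating and is the only one flagged. The ratings are still a human judgment call; the code only keeps the bookkeeping honest.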

Loops are a first-class necessity.

We designed the content plan around a loop-de-loop script architecture: open with a claim, escalate with compact proof, leave one edge intentionally under-explained, and close with language that encourages the viewer to replay. This comes with five rules:

1. Hook, Hook, Hook.

The first sentence has to do real work -- it should state the topic as conflict. A good hook is both catchy and structurally useful. It should do at least one of the following:

  • make a hard claim,
  • introduce a tension,
  • imply a surprising conclusion,
  • or invite disagreement strong enough that viewers mentally or verbally argue back.

Tim’s publicly surfaced reel themes already illustrate this. Search results show statement-led openings such as “I’m building a global startup from Indonesia…” and “Many Indonesian tech startups are dead. Here’s why:” rather than soft, diary-style openers. Nobody wants to sneak a peek at your diary.

Weak opening:

"I want to talk about what I’ve learned from building."

Stronger opening:

"Founders, stop making nerdy content. You aren't talking to your audience right -- 5 ways to make good founder content, let's go:"

A strong Tim-style opening would be something like:

"If you wrote your first founder video with AI, you already lost."

The standard here is unambiguousness. The viewer should know, within one sentence, what claim is on trial and why their initial assumptions lost. In some cases, stacking multiple hooks is even better:

"Most founder content is boring. And most people blame the algorithm instead of their script. You should be blaming that LLM!"

That sequence works because each sentence sharpens the conflict rather than restating the same idea.

The user (a) identifies as a founder, then (b) identifies as a student of the algorithm, then (c) tries to identify as an LLM user, but since you've created heightened tension around using it for founder-led work, the user wants you to clear that tension up. This is similar to the following thought experiment:

Say you went into the street and wanted to announce some good news. You stand on a chair from a nearby restaurant and start to shout: "Attention everyone." Everyone darts their eyes at you immediately. "I just.." and then you stop. You don't say anything else. People want to know what happens next. It is just a natural part of being human. We want closure, completion. Don't provide that in a hook, ever, and you've got yourself a viewer.

2. Be entertaining but don't talk like an LLM.

Why do people confuse seriousness with flatness so much? Nobody knows, but I suppose people think authority requires emotional suppression. In practice, on-camera communication needs variance and uniqueness:

  • changes in pace,
  • emphasis,
  • tonal peaks,
  • physical emphasis,
  • visual punctuation,
  • and well-timed conviction.

Delivery coaching matters. If a founder says something important like he is reading an airport safety card, your audience will scroll on and forget about your video after 5 seconds. To truly capture emotions, you should implicate the user to some extent and speak to them with a tonality that I sometimes refer to as "kid-mode." Speak to your camera like you're speaking to a 5-year-old kid about why the sky is blue.

The practical implication of that: if you say something, say it like it matters. Content also benefits from visual support. Graphics and motion inserts give semantic punctuation to the argument. This is especially useful when the script is dense and the speaker is moving quickly and stringing together multiple concepts in rapid succession. The right visual does one of three things:

  1. clarifies the idea,
  2. intensifies the claim,
  3. or resets attention.

3. Maximally compress the script.

One of the deepest misconceptions in content production is that recording is the hard part. Usually, the hard part is deciding what not to say. Our bias is:

no script is ever perfect on the first pass.

A founder should not dump thoughts into a document and immediately hit record. Every sentence should first survive an editor's questions:

  • Why is this sentence here?
  • Is this sentence carrying new information?
  • Can two sentences become one?
  • Can one sentence become six stronger words?
  • Is there an analogy that compresses the idea faster?
  • Did we say the same thing twice?

An example:

Before:

"A lot of people think founder content is just about posting consistently, but in reality there are lots of variables involved, and one of the biggest ones is making sure that people understand what you are trying to say from the beginning, otherwise they are going to scroll away."

After:

"Your first sentence MATTERS more than HOW MUCH YOU POST!"

The second version is easier to remember, easier to quote, easier to comment on, and easier to build a video around. Compression increases transmissibility. The more compact the message, the easier it is for the audience to retain, repeat, and redistribute the message.
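To put a rough number on that compression, here is a small sketch that compares the word counts of the two versions quoted above. Word count is only a crude proxy for compactness, and the helper names `word_count` and `compression_ratio` are made up for this illustration.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def compression_ratio(before: str, after: str) -> float:
    """Words kept after editing, as a fraction of the original draft."""
    return word_count(after) / word_count(before)


BEFORE = (
    "A lot of people think founder content is just about posting consistently, "
    "but in reality there are lots of variables involved, and one of the biggest "
    "ones is making sure that people understand what you are trying to say from "
    "the beginning, otherwise they are going to scroll away."
)
AFTER = "Your first sentence MATTERS more than HOW MUCH YOU POST!"

ratio = compression_ratio(BEFORE, AFTER)
print(f"{word_count(BEFORE)} words -> {word_count(AFTER)} words ({ratio:.0%} of the draft)")
```

The draft runs 49 words and the rewrite runs 10, so the rewrite keeps roughly a fifth of the original length while carrying the same claim.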

4. Sound unique. So Tim sounded like Tim.

After editing this dude's first video, I literally couldn't think of a single person that sounds like or acts like this guy.

The content had to sound like a founder with Tim’s actual worldview, not like a generic strategist.

Agencies suck at making people sound unique. They over-normalize the client into safe, algorithmically bland language (corpo/LinkedIn-speak) and don't set them apart from the next client.

Tim’s public-facing material already suggested the opposite direction: globally ambitious, founder-first, opinionated, and willing to frame things in bold market terms.

Bad:

"Here are some lessons from my entrepreneurial journey."

Good:

"Building in Indonesia has taught me why it's better to build for markets outside of it."

When identity, thesis, and delivery align, the audience perceives the content as native to the speaker. Oh, and Tim just has a very unique way of getting things across, so that was also helpful.

5. Create a loop.

The loop was one of the most important details. A strong ending should feel like the content has curved back into itself. Brownie points if the user doesn't even notice the video restarted.

A line such as "and that’s why..." can work because it creates a micro-absence. The brain wants closure. A useful internal model scores a video on four terms that span the hook, the body, and the ending.

This is an operator model. Its purpose is to force the creator to think causally: if the hook is strong but the body drags, distribution stalls; if the body is strong but the ending is inert, replay value is lost; if all four terms are healthy, the content is more likely to work.
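One way to make that causal reading concrete is a multiplicative sketch, where any weak term drags down the whole video. The multiplicative form, the 0-to-1 scale, and the four term names (`hook`, `body`, `delivery`, `loop`) are assumptions chosen for illustration; only the idea of four terms that must all be healthy comes from the model described above.

```python
from math import prod


def video_health(terms: dict[str, float]) -> float:
    """Assumption: terms multiply, so one weak term caps the overall result."""
    return prod(terms.values())


def weakest_term(terms: dict[str, float]) -> str:
    """Name the term to fix first."""
    return min(terms, key=terms.get)


# Hypothetical ratings in [0, 1]: a strong hook, but a body that drags.
strong_hook_weak_body = {"hook": 0.9, "body": 0.3, "delivery": 0.8, "loop": 0.7}

print(f"health={video_health(strong_hook_weak_body):.3f}")
print("fix first:", weakest_term(strong_hook_weak_body))
```

A product rather than a sum is a deliberate choice for the sketch: with a sum, a great hook could mask a dead ending, which is the opposite of what the operator model is meant to expose.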


The video worked for a multitude of reasons, a lot of them subtle nuances that would be too technical and proprietary to share here. Even so, there are several metrics short-form teams should care about:

Follower conversion rate

If the first video produced 100K views and 1,000+ new followers, then the observed follower conversion rate was at least 1,000 / 100,000 = 1%.

So the video was converting attention into audience ownership quickly. A high-view video with low follow conversion can still be useful, but it often indicates that the content was entertaining without creating durable creator affinity. By contrast, a video that drives a percent-level follow conversion is usually signaling that the message, identity, and audience targeting were aligned.
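The arithmetic behind that lower bound is simple enough to sketch. The function name `follower_conversion_rate` is made up for this example, and because the follower count is a "1,000+" figure, the result is a floor rather than an exact rate.

```python
def follower_conversion_rate(new_followers: int, views: int) -> float:
    """New followers per view, as a fraction."""
    if views <= 0:
        raise ValueError("views must be positive")
    return new_followers / views


# Figures from the scenario above: 1,000+ followers on 100K views.
rate = follower_conversion_rate(new_followers=1_000, views=100_000)
print(f"at least {rate:.1%} of viewers followed")
```

The same function gives the same 1% floor for 2,000 followers on 200K views, so the ratio holds at either scale.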

When teams say "viewer-retained," they usually mean:

  • Hook hold rate: how many viewers stay past the first seconds.
  • Average watch time: how long the typical viewer watched.
  • Completion rate: how many viewers reached the end.
  • Rewatch rate: how often viewers played the video more than once.
  • Engagement quality: comments, saves, shares, profile visits, and follows relative to views.

We also use an internal scoring model, which works as a good management heuristic.

It forces the content team to diagnose which part of the video was doing the heavy lifting so we can focus on other aspects as well.
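A minimal sketch of that kind of diagnosis, assuming the metrics listed above are computed per view and combined with equal weights. The field names, the weights, and the sample numbers are all hypothetical, and follows stand in for the broader "engagement quality" bucket.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    """Raw counts for one video. Field names are hypothetical."""
    views: int
    held_past_hook: int        # viewers still watching after the first seconds
    completions: int           # viewers who reached the end
    replays: int               # plays beyond a viewer's first
    follows: int               # stand-in for engagement quality
    total_watch_seconds: float
    length_seconds: float


def retention_metrics(s: VideoStats) -> dict[str, float]:
    """Turn raw counts into per-view rates."""
    return {
        "hook_hold_rate": s.held_past_hook / s.views,
        "avg_watch_share": (s.total_watch_seconds / s.views) / s.length_seconds,
        "completion_rate": s.completions / s.views,
        "rewatch_rate": s.replays / s.views,
        "follow_rate": s.follows / s.views,
    }


# Assumption: equal weights. Any real weighting would be tuned per account.
WEIGHTS = {name: 0.2 for name in (
    "hook_hold_rate", "avg_watch_share", "completion_rate", "rewatch_rate", "follow_rate",
)}


def contributions(metrics: dict[str, float]) -> dict[str, float]:
    """Weighted contribution of each metric to the overall score."""
    return {name: WEIGHTS[name] * value for name, value in metrics.items()}


# Made-up numbers for illustration only, not campaign data.
stats = VideoStats(views=10_000, held_past_hook=7_000, completions=3_000,
                   replays=1_500, follows=100,
                   total_watch_seconds=120_000, length_seconds=30)

metrics = retention_metrics(stats)
parts = contributions(metrics)
score = sum(parts.values())
heavy_lifter = max(parts, key=parts.get)
print(f"score={score:.3f}, heavy lifter={heavy_lifter}")
```

In this made-up case the hook is doing most of the work, which would point the next round of edits at the body and the ending rather than the opening line.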

One underrated part of the strategy was not fully exhausting the idea.

This sounds counterintuitive, but works well. If one video perfectly closes the topic, there is less pressure for the next video to exist. If a video leaves one obvious edge unaddressed, the audience often supplies the next prompt for free.

That creates a sequence. For example, a first video might claim:

“Most founders make bad content because they open weak.”

The next video can then answer:

  • what weak means,
  • how to write a stronger first sentence,
  • what patterns work by niche,
  • or why some strong hooks still fail.

The goal is to leave strategic surface area for the next iteration because without that, there is literally no point in being a consistent creator.


🚀 We are looking to collaborate with 10 businesses building cool things in the consumer SaaS space.

If you want similar results, reach out to vlad@usatii.com.