Impressive Feature Flag Reports

Feature flags aren’t exactly news. Martin Fowler first wrote about “feature toggles” in October of 2010. But a funny thing happened on the way to the forum: a whole niche software industry was born to assist developers, product managers, and the rest of the business with adopting feature flags effectively.

This article is not a sales pitch for a single solution, at least not directly. Instead, it assumes your feature flag solution keeps a record of each evaluation. That record is almost always used as the backbone for A/B testing, but it’s worth much more if you use it wisely.

Evaluation Records or Impressions

In my world, we call the record of an evaluation an “impression”. At the data level, an impression looks like this:

{
  "environmentId": "194da2f0-f000-11ea-ba75-12f2f63694e5",
  "environmentName": "Prod-Default",
  "key": "dmartin",
  "receptionTimestamp": 1729628033254,
  "sdk": "Javascript",
  "sdkVersion": "10.20.0",
  "split": "multivariant_demo",
  "splitVersionNumber": 1729626965675,
  "timestamp": 1729628027290,
  "trafficTypeId": "194c6a70-3e22-11ea-ba75-12f2f63694e5",
  "trafficTypeName": "user",
  "treatment": "red"
}

If you squint, it could have been written by a feature flag journalist.

Who was evaluated? The key is usually a user ID, like “dmartin”.

What flag was evaluated? The “split” is the flag name: multivariant_demo.

What treatment did they see? The treatment is red in this case, but often it’s either “on” or “off”.

When were they evaluated? The timestamp is in millis since the epoch, precise and simple.

It’s the beginning of a story, all in a tiny JSON package.

Waving a Data Architecture Wand…

First, imagine that the feature flag platform captures and stores these impressions every time they occur.

Now imagine the platform exports the impressions.

The fantasy gets pretty specific when I say the impressions are written, compressed, to an AWS S3 bucket.

Finally, the contents of that S3 bucket are marshalled into a relational table for use with AWS Athena.

CREATE EXTERNAL TABLE IF NOT EXISTS split.impressions (
  key STRING,
  label STRING,
  treatment STRING,
  splitName STRING,
  splitVersion BIGINT,
  environmentId STRING,
  trafficTypeId STRING,
  sdk STRING,
  sdkVersion STRING,
  timestamp BIGINT,
  receptionTimestamp BIGINT
)
STORED AS PARQUET
LOCATION 's3://impressions-for-athena/schema-v1/';

This should look very familiar. We’re marshalling impressions stored as Parquet in an S3 bucket called “impressions-for-athena”, which is being written to by the feature flag platform. Note the epoch-millisecond columns like timestamp and splitVersion are declared BIGINT, since their values overflow a 32-bit INT.
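One practical aside before querying: the reports that follow all filter to the last thirty days, so if the bucket grows large it may be worth partitioning the table by day so Athena can prune everything outside the window. This is only a sketch, not the platform’s actual export layout; the table name, the partitioned S3 prefix, and the dt=YYYY-MM-DD convention are assumptions of mine.

-- Sketch only: hypothetical partitioned variant of the table above.
-- Assumes the export writes objects under Hive-style dt=YYYY-MM-DD/ prefixes.
CREATE EXTERNAL TABLE IF NOT EXISTS split.impressions_partitioned (
  key STRING,
  label STRING,
  treatment STRING,
  splitName STRING,
  splitVersion BIGINT,
  environmentId STRING,
  trafficTypeId STRING,
  sdk STRING,
  sdkVersion STRING,
  timestamp BIGINT,
  receptionTimestamp BIGINT
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION 's3://impressions-for-athena/partitioned/';  -- hypothetical prefix

-- Register whatever daily partitions already exist in the bucket.
MSCK REPAIR TABLE split.impressions_partitioned;

With that in place, adding a dt filter to any of the queries below would limit the scan to thirty objects’ worth of data instead of the whole bucket.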

If my impressions are a table, so what?

SELECT 
    splitname,
    treatment,
    COUNT(*) AS record_count,
    from_unixtime(MAX(timestamp) / 1000) AS "last evaluated"
FROM 
    split.impressions
WHERE 
    from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
GROUP BY 
    splitname,
    treatment
ORDER BY 
    record_count DESC;

Now, Athena, show me the count of impressions by flag and treatment over the last thirty days.
[Result screenshot: impressions summary, last 30 days]

It takes seconds to report on hundreds of flags.

This kind of report tells you which of your flags are most active. Let’s change the SQL.

SELECT 
    splitname,
    treatment,
    COUNT(*) AS treatment_count,
    ROUND((COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY splitname)), 2) AS percentage_of_total
FROM 
    split.impressions
WHERE 
    from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
GROUP BY 
    splitname,
    treatment
HAVING 
    COUNT(*) > 50
ORDER BY 
    splitname, 
    percentage_of_total DESC;

This shows you the % for each treatment of a flag. Was it supposed to be off? Is it 50/50 for an experiment? Are all three treatments in balance for an A/B/C test?

splitname          treatment  treatment_count  percentage_of_total
carousel           off        67               100.0
japan_feature      off        4916             92.75
japan_feature      on         384              7.25
multivariant_demo  red        53               100.0
new_onboarding     off        39289            83.17
new_onboarding     on         7692             16.28
new_onboarding     control    260              0.55
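Back to those questions: if a flag is meant to run as a 50/50 experiment, a variation of the same query can surface drift automatically. This is a sketch; the 50% target, the 5-point tolerance, and the choice of new_onboarding as the example are assumptions of mine, not anything the data or the platform defines.

-- Sketch: surface treatments that drift more than 5 points from an
-- assumed 50/50 design. Target and tolerance are illustrative only.
WITH treatment_share AS (
    SELECT
        splitname,
        treatment,
        COUNT(*) AS treatment_count,
        COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY splitname) AS pct
    FROM
        split.impressions
    WHERE
        from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
    GROUP BY
        splitname,
        treatment
)
SELECT
    splitname,
    treatment,
    treatment_count,
    ROUND(pct, 2) AS percentage_of_total
FROM
    treatment_share
WHERE
    splitname = 'new_onboarding'
    AND ABS(pct - 50.0) > 5.0
ORDER BY
    pct DESC;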

Impressive Discoveries

Daily traffic trends per split.

SELECT 
    splitname,
    date(from_unixtime(timestamp / 1000)) AS date,
    COUNT(*) AS daily_count
FROM 
    split.impressions
WHERE 
    from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
    and splitname = 'new_onboarding'
GROUP BY 
    splitname,
    date(from_unixtime(timestamp / 1000))
ORDER BY 
    date,
    splitname;

splitname       date        daily_count
new_onboarding  2024-09-24  2082
new_onboarding  2024-09-25  525
new_onboarding  2024-09-26  7286
new_onboarding  2024-09-30  2341
new_onboarding  2024-10-01  780
new_onboarding  2024-10-02  4163
new_onboarding  2024-10-03  3380
new_onboarding  2024-10-07  3382
new_onboarding  2024-10-10  2860
new_onboarding  2024-10-11  6501
new_onboarding  2024-10-14  260
new_onboarding  2024-10-15  4943
new_onboarding  2024-10-16  780
new_onboarding  2024-10-18  3640
new_onboarding  2024-10-22  4420
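The same daily trend gets more interesting broken out by treatment, because you can watch a rollout ramp up (or a kill switch engage) day by day. A sketch along those lines:

SELECT 
    splitname,
    treatment,
    date(from_unixtime(timestamp / 1000)) AS date,
    COUNT(*) AS daily_count
FROM 
    split.impressions
WHERE 
    from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
    AND splitname = 'new_onboarding'
GROUP BY 
    splitname,
    treatment,
    date(from_unixtime(timestamp / 1000))
ORDER BY 
    date,
    treatment;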

SDK and SDK version analysis.

SELECT 
    sdk,
    sdkversion,
    COUNT(*) AS sdk_usage_count
FROM 
    split.impressions
WHERE 
    from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
GROUP BY 
    sdk,
    sdkversion
ORDER BY 
    sdk_usage_count DESC;

sdk         sdkversion  sdk_usage_count
evaluator   2.4.0       47948
ruby        8.4.0       4685
nodejs      10.27.0     135
javascript  10.20.0     102
ios         2.15.0      15
java        4.4.8       13
javascript  10.24.1     9
.NET_CORE   7.8.0       4
react       1.12.0      3
ios         2.25.1      2
nodejs      10.28.0     2

Not too shabby considering it’s for just the last thirty days.
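If that long tail of older SDK versions is a concern, a small variation shows which flags those stragglers are still evaluating, which is useful before retiring an old client. A sketch, using the ios 2.15.0 row above as the example:

SELECT 
    sdk,
    sdkversion,
    splitname,
    COUNT(*) AS impression_count
FROM 
    split.impressions
WHERE 
    from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
    AND sdk = 'ios'
    AND sdkversion = '2.15.0'
GROUP BY 
    sdk,
    sdkversion,
    splitname
ORDER BY 
    impression_count DESC;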

How about the features seen by a specific user in the last thirty days?

SELECT 
    DISTINCT splitname,
    treatment
FROM 
    split.impressions
WHERE 
    key = 'dmartin'
    AND from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
ORDER BY 
    splitname;

splitname          treatment
buttons            off
buttons            on
multivariant_demo  green
multivariant_demo  blue
multivariant_demo  red
new_onboarding     off
next_step          on
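Notice that dmartin saw more than one treatment of multivariant_demo over the month. If what you want instead is the single most recent treatment per flag, a window function over the same data gets you there. A sketch:

-- Sketch: latest treatment per flag for one key, ranked by evaluation time.
SELECT
    splitname,
    treatment,
    from_unixtime(timestamp / 1000) AS evaluated_at
FROM (
    SELECT
        splitname,
        treatment,
        timestamp,
        ROW_NUMBER() OVER (PARTITION BY splitname ORDER BY timestamp DESC) AS rn
    FROM
        split.impressions
    WHERE
        key = 'dmartin'
        AND from_unixtime(timestamp / 1000) >= date_add('day', -30, current_date)
) ranked
WHERE
    rn = 1
ORDER BY
    splitname;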

Impressive Impressions

This article is a short sample of the endless possibilities. If you have the knowledge, that’s great. But being able to act intelligently on that knowledge is greater still.

david.martin@harness.io
