Atlas


How to Export WeChat Moments Data and Store It Permanently on the Blockchain

This is actually two questions: how to export your Moments data, and how to store that data on the blockchain. Let's start with the result: I successfully exported my Moments data from WeChat version 8.0.32 on iOS and stored it on the Crossbell blockchain: https://xfeed.app/u/wxd6bb23a9

I looked into this because my WeChat account was banned on February 4, 2023, and I wanted to export and showcase my Moments content. I searched through a lot of material along the way, but most of the solutions I found were outdated, so I decided to document my exploration and the pitfalls I ran into.

A few disclaimers:

  1. I did not export comments because I didn't think it was necessary. The purpose of storing data on the blockchain is to confirm ownership, so it doesn't make much sense to store other people's content on the blockchain for them. If there is demand, though, it shouldn't be too difficult to implement: I noticed that comments are stored in plaintext in the MyWC_Message01 table, although I'm not sure the data there is complete. You can continue researching from this tutorial.
  2. I did not export other people's Moments data, for the same reason. If you need it, it's not hard to guess where to start.
  3. I did not export WeChat friends or chat records. I guess this is a common need, but I didn't have it myself, so I didn't research it. It would most likely mean exporting data from a different table, so it shouldn't be too difficult.
  4. I did not parse shared video posts. Regular link shares can be parsed, but video posts are too complex for me to recover the actual video link, and I rarely shared them, so I had no need to parse them.

Clear Goal: Export Moments Data#

Assuming the current requirement is to export Moments data, there are actually several different scenarios, each requiring a different method:

  • If you are working with the official WeChat API as a developer (rather than as an ordinary WeChat user), it provides a data export interface. You can refer to this blog post. If you also want to showcase the data on the blockchain, you can refer to this blog post as well.
  • If you are a regular WeChat user:
    • If your account is not banned, you can search Taobao for "WeChat Moments" to find services that export your Moments as an e-book or a similar format (I'm curious how Taobao sellers achieve this; it might be through caching).
    • If your account is banned like mine, or if it isn't banned but you're interested in how the export works, you can try the method I focus on next: recovering the data from the phone's cache. This works because even when your account is banned, you can still view your own Moments (fortunately).

Cache Recovery Method#

As the name suggests, this method involves ensuring that WeChat has cached your Moments data locally, then exporting your phone's data, and finally finding the relevant files from the exported data and extracting useful information such as posting time and content to reconstruct the complete Moments data.

1. Local Cache#

Open WeChat and clear the cache (this step is optional, but it reduces the time spent on backup and copying). Then open your Moments and scroll all the way down to the earliest post so that every post is cached locally. Make sure to open each image as well; otherwise only the thumbnails are cached. To check that everything was cached, disconnect from the internet after scrolling through all the posts and confirm that you can still see them. If you can, they have been cached successfully.

2. Export Cache Files#

Since I logged into WeChat on an iOS device and I'm not sure if I can still log in on other devices after being banned (I don't want to risk losing access to my iOS device by trying too many times), I can only export the cache through phone backup. Android devices should be able to directly export the cache files, but on iOS, you can only access the app's cache files through a complete phone backup.

I used a tool called iMazing; the free version is sufficient, since it allows 10 exports. First, back up your phone data, then find the WeChat Documents folder and export it. The steps are shown in the following image.
[Image: iMazing steps for backing up the phone and exporting the WeChat Documents folder]

In the Documents folder, there is at least one folder named with a hash string, like this:

eb8a6093b56e2f1c27fbf471ee97c7f9

This folder contains the personal data of the WeChat user. If you have logged into multiple WeChat accounts on this phone, there may be multiple folders with hash names. If you're not sure which one you want to export, you can export all of them and check.

Find the wc005_008.db and WCDB_Contact.sqlite files in the ./Documents/{hash}/wc/ and ./Documents/{hash}/DB/ directories, respectively. The former is related to Moments data, and the latter is related to contact data. We only need the latter to extract our own account's profile picture.
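As a rough sketch, the two files can be located in an exported Documents folder with a few lines of Python. The function name `find_account_dbs` is hypothetical, and the pattern for the folder name assumes that every account folder is a 32-character lowercase hex string like the example above; adjust it if your folders look different.

```python
import re
from pathlib import Path

# Assumption: account folders are named with 32 lowercase hex characters,
# like the example folder name shown in the article.
HASH_DIR = re.compile(r"^[0-9a-f]{32}$")


def find_account_dbs(documents_dir):
    """Return {hash: (moments_db, contact_db)} for each account folder
    that contains both database files."""
    found = {}
    for entry in sorted(Path(documents_dir).iterdir()):
        if not (entry.is_dir() and HASH_DIR.match(entry.name)):
            continue
        moments_db = entry / "wc" / "wc005_008.db"
        contact_db = entry / "DB" / "WCDB_Contact.sqlite"
        if moments_db.is_file() and contact_db.is_file():
            found[entry.name] = (moments_db, contact_db)
    return found
```

Calling `find_account_dbs("./Documents")` on the exported folder returns one entry per account that was logged in on the phone, which also helps when you are not sure which hash belongs to you.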

(Pitfall: newer versions of macOS can no longer back up the phone through iTunes.)

3. Parse the Cache#

TL;DR: Download this repo, copy the wc005_008.db and WCDB_Contact.sqlite files into its root directory, change the "hash" value in main.py to your own hash, then run "python3 main.py" to export a moments.json file.

Regarding the script, there are a few things worth mentioning:

  1. The script has a parameter called "dl_img". If it is set to True, all images are downloaded locally. My account is banned, I don't know how long the images will stay hosted, and I'm not sure what might happen if they are accessed frequently from outside WeChat, so I recommend downloading all the images while you still can.

  2. For Moments that are shared links, I parse not only the link itself but also the image, title, and description that WeChat cached for it. This fully restores how the link was rendered in Moments. I did this because many shared links already return 404, so the link alone wouldn't mean much. I think it's necessary to parse all the cached data to at least restore the "cover".
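To illustrate what an option like "dl_img" might do, here is a minimal sketch of downloading a list of image URLs into a local folder. It is not the code from the repo: the function names, the hash-based file naming, and the default ".jpg" extension are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def local_name(url):
    """Derive a stable local file name from an image URL (hypothetical scheme)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
    # Assumption: fall back to ".jpg" when the URL path has no extension.
    suffix = Path(urlparse(url).path).suffix or ".jpg"
    return digest + suffix


def fetch_bytes(url):
    """Default fetcher: a plain HTTP GET with a timeout."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_images(urls, dest_dir, fetch=fetch_bytes):
    """Save each URL under dest_dir, skipping files that already exist.

    Returns a {url: local_path} mapping that can be used to rewrite
    the image references in the exported data.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = {}
    for url in urls:
        target = dest / local_name(url)
        if not target.exists():
            target.write_bytes(fetch(url))
        saved[url] = target
    return saved
```

Skipping files that already exist means the download can be re-run after an interruption without fetching the same image twice, which keeps the number of external requests low.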

Of course, there is a lot more analysis behind the script in this repo. I will briefly explain it, and you can choose to skip this section based on your interest.

WeChat uses SQLite for caching, and to analyze the wc005_008.db database, you can use this open-source SQLite browser. After a simple analysis, I found that there are many tables starting with "MyWC01_" in the db, and the Moments data for your own account is stored in the "MyWC01_{$hash}" table, where $hash is the hash from the directory mentioned earlier, which should represent some kind of ID for your account. I speculate that other "MyWC01_..." tables represent Moments data for your friends.
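The same inspection can be done from Python with the standard `sqlite3` module. The sketch below lists the tables whose names start with "MyWC01_" and reads the rows of your own table. The helper names are hypothetical, and the column names "id" and "Buffer" are the two fields discussed in this article.

```python
import sqlite3

TABLE_PREFIX = "MyWC01_"


def list_moments_tables(conn):
    """Return the names of all tables that start with 'MyWC01_'."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows if name.startswith(TABLE_PREFIX)]


def read_own_moments(conn, account_hash):
    """Return (id, Buffer) rows from the table of the given account hash."""
    table = TABLE_PREFIX + account_hash
    # Only query a table that really exists, since the name is
    # interpolated into the SQL text.
    if table not in list_moments_tables(conn):
        raise ValueError(f"table {table!r} not found")
    query = f'SELECT id, Buffer FROM "{table}" ORDER BY id'
    return conn.execute(query).fetchall()
```

For the real file, open the connection with `sqlite3.connect("wc005_008.db")` and pass the hash from the directory name as `account_hash`.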

In the table that stores your own Moments data, two fields are very important: "Buffer" and "id". If you decode the Buffer field as UTF-8, you can see many plaintext fields: some are image URLs, some are friends' names, and some are the text of earlier Moments posts. The Moments data can be recovered from the Buffer field.

A short digression on "decoding the Buffer field as UTF-8": I didn't find a way to decode binary fields and read them directly in this SQLite browser. What I ended up doing was writing every Buffer field in the table to a file and then reading the files in a hex viewer/editor. That wasn't smooth either: I tried many hex viewers/editors, and almost none of them supported UTF-8 decoding. The only one I found that did was Synalyze It!, but it has only a two-week free trial and costs $9.99 after that. I don't know whether there is a better way to do this analysis, and I'd be glad to exchange ideas.
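One possible alternative to a hex editor is a small Python helper that works like the Unix `strings` tool but is UTF-8 aware. This is only a sketch: the function name `readable_runs` and the minimum run length are my own choices for illustration. It decodes the Buffer leniently and keeps only the runs of printable characters, so URLs and Chinese text stand out from the surrounding binary data.

```python
def readable_runs(buffer, min_len=4):
    """Return the printable UTF-8 text runs found in a binary blob."""
    # Bytes that are not valid UTF-8 become U+FFFD instead of raising.
    text = bytes(buffer).decode("utf-8", errors="replace")
    runs, current = [], []
    for ch in text:
        # U+FFFD counts as printable in Python, so exclude it explicitly.
        if ch.isprintable() and ch != "\ufffd":
            current.append(ch)
            continue
        if len(current) >= min_len:
            runs.append("".join(current))
        current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs
```

Printing `readable_runs(row_buffer)` for a few rows gives a quick overview of which plaintext fields a Buffer contains. It does not show the flag bytes between the runs, so a hex view is still useful for working out the exact layout.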

Back to the main topic, let's continue analyzing the Buffer field. It's difficult to fully understand the format of this data, but despite that, we can still identify the content based on certain fixed flags. We can see a typical payload like this:

[Image: a typical payload]

Different fields, such as images, content, and shared links, have their own flags. Each field appears in the Buffer as its flag, followed by one or two bytes giving the length of the message, and then the message itself.

Taking the content as an example, the following image shows the binary data of two Moments posts. It's easy to see that "b'\xba\x01'" is the flag for the content.
[Image: binary data of two Moments posts, showing the content flag]

With this basic understanding, we can confirm which fields need to be parsed (the final confirmed fields to be parsed are content, image URLs, shared links, images rendered from shared links, titles rendered from shared links, and descriptions rendered from shared links). Then, we need to identify the flags of the fields we want to parse, and finally find a way to parse the length and offset of the message.
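The flag, length, message layout can be sketched in Python as follows. One important assumption: the article only says that the length takes one or two bytes, and this sketch guesses that it is encoded as a protobuf-style varint (7 bits per byte, with the high bit marking a continuation). That guess is not confirmed by the source, and the function names are hypothetical. Only the content flag "b'\xba\x01'" comes from the analysis above.

```python
CONTENT_FLAG = b"\xba\x01"  # flag for the post content, as observed above


def read_varint(data, pos):
    """Decode a base-128 varint at pos; return (value, next_pos).

    Assumption: the length is stored this way. The source only states
    that it occupies one or two bytes.
    """
    value, shift = 0, 0
    while True:
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if not (byte & 0x80):
            return value, pos
        shift += 7


def extract_field(buffer, flag):
    """Return the first UTF-8 message that follows flag, or None."""
    data = bytes(buffer)
    start = data.find(flag)
    while start != -1:
        try:
            length, body = read_varint(data, start + len(flag))
            if body + length <= len(data):
                return data[body:body + length].decode("utf-8")
        except (IndexError, UnicodeDecodeError):
            pass  # not a real field at this position; keep searching
        start = data.find(flag, start + 1)
    return None
```

Taking the first match that decodes cleanly is a simplification: the two flag bytes can also occur by chance inside other data, so results should be checked against what the WeChat app displays.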

But there's one more missing piece: the posting time. Intuitively, the "id" field in the table is closely related to the time because these numbers increase with the actual time. So I guessed it was some kind of timestamp-based algorithm. I'm very grateful to @kaii for his help. In the end, I basically determined that the conversion algorithm between the actual create_time and id is:

create_time = id / 8388607990

This formula was deduced from the MyWC_Message01 table. Although we don't have the exact posting time of the original content, we can infer the actual posting time from the create time field in this table. The id field in this table should correspond to the id of the original post.

We can simplify it and assume that the relationship between the create time of the first reply and the id is the same as the relationship between the original post and the id, because the create time of the first reply is closest to the actual posting time of the original post. So we can directly take the largest Id/CreateTime in the table as our magic number.

SELECT MAX(ID/CreateTime) FROM MyWC_Message01

Actually, the magic number derived from the comments is slightly smaller. To be more accurate, I sampled a few Moments posts and adjusted the number against the posting times displayed in the WeChat app. In the end, I arrived at 8388607990. With this number, the computed time is basically within 1 minute of the actual posting time.
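As a small worked example, the conversion can be written in Python like this. The function names are hypothetical, and the sketch assumes that the id is available as a non-negative integer and that create_time is a Unix timestamp in seconds. The time is shown in UTC here, while the WeChat app displays local time.

```python
from datetime import datetime, timezone

MAGIC = 8388607990  # empirically tuned divisor from this article


def id_to_create_time(moment_id):
    """Convert a Moments id to an approximate Unix timestamp in seconds."""
    return round(int(moment_id) / MAGIC)


def id_to_datetime(moment_id):
    """Convert a Moments id to an approximate posting time in UTC."""
    seconds = id_to_create_time(moment_id)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Because the divisor was tuned by hand, the result should be treated as approximate to within about a minute, not as an exact posting time.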

In summary, that's the basic idea. Of course, there are many details, and if you're interested, you can refer to the code implementation.

(Pitfall: I initially relied heavily on the code in this repo, but it parses the Buffer as plist. I don't know what the actual cache format is, but it's definitely not plist.)

4. Data Display and Permanent Storage on the Blockchain#

Now that the data has been exported, you can do whatever you want with it. I think storing the data on the blockchain is a romantic way of recording, so I chose to back up my Moments on the Crossbell blockchain and showcase it on xFeed.

To implement the blockchain functionality, and to make it easier to check that your data was exported correctly, I also added a simple display page to the repository. The final result looks roughly like this:

[Image: the display page showing the exported Moments]

If the data export is successful, you can click "Store on Blockchain" on the page and follow the steps to store it. However, to interact with the blockchain smoothly, there are some preparations to be made:

  1. Download the MetaMask wallet plugin.
  2. Claim gas from the faucet for interaction. If you have a large amount of data, you may need a significant amount of gas. If you need more gas, you can contact me.

Once these two preparations are done, you can click "Store on Blockchain" directly on the page.

Conclusion#

WeChat is updated constantly and its cache structure keeps changing, so this content is definitely not universally applicable and does not cover every scenario. Still, I hope it provides some reference and inspiration. If you make other discoveries, feel free to share them.

Finally, here are the two repositories mentioned in this article:

In addition to exporting Moments, I also wrote a Tampermonkey script to export QQ Zone posts (yes, because my QQ account was also banned). Exporting QQ Zone posts is much simpler. The content has already been exported here, but I haven't finished organizing the code yet; I plan to write a simple tutorial later.

References#

Thanks to those who paved the way:

https://zhuanlan.zhihu.com/p/22474033
