<h1>Stable Diffusion for Remote Sensing: Reality Check</h1>
<p>Generative techniques like Stable Diffusion have been out for a while and are starting to make their way into the field of remote sensing. I thought I’d give these models a try and see what the hype was all about. From my very brief experiments, I can say two things: one, I’m not entirely sold on the utility of generative models for real world applications; two, these models make interesting and abstract art that resembles remote sensing imagery. Until these models and their supporting ecosystem develop further to allow practical application development, I’ll simply spend my time toying with these models to generate interesting art pieces.</p>
<h2 id="base-knowledge">Base Knowledge</h2>
<p>Interestingly, pretrained Stable Diffusion (SD) models have a built-in understanding of satellite imagery. These models are pretrained on LAION-5B, which includes both <a href="https://arxiv.org/pdf/2309.15535.pdf">captioned aerial and satellite imagery</a>. I asked baseline SD1.5 and SDXL models to create “satellite images of mountains,” and this is what they produced:</p>
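<p>For reference, generating these baselines only takes a few lines with the <a href="https://github.com/huggingface/diffusers">diffusers</a> library. The sketch below is illustrative rather than the exact script I ran; the checkpoint names and sampling settings are assumptions.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
from diffusers import StableDiffusionPipeline, StableDiffusionXLPipeline

prompt = "a satellite image of mountains"

# Baseline SD1.5 (assumed checkpoint: runwayml/stable-diffusion-v1-5)
sd15 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
sd15(prompt).images[0].save("sd_baseline_mountain.jpg")

# Baseline SDXL (assumed checkpoint: stabilityai/stable-diffusion-xl-base-1.0)
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
sdxl(prompt).images[0].save("xl_baseline_mountain.png")
</code></pre></div></div>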
<p align="center">
<img src="/assets/images/diffusion/sd_baseline_mountain.jpg" width="512" height="512" />
<figcaption class="text-center">Baseline SD1.5</figcaption>
</p>
<p align="center" class="text-center">
<img src="/assets/images/diffusion/xl_baseline_mountain.png" width="512" height="512" />
<figcaption class="text-center">Baseline SDXL</figcaption>
</p>
<p>Not too bad for models with no finetuning. I found SD1.5 to be fairly opinionated and reliant on its original training images when generating satellite images. SD1.5 tends to apply perspectives and features of normal mountain photography onto images that have the vantage point of a satellite. This makes for fairly unnatural looking images. SDXL, on the other hand, has the advantage of being a bigger model and appears to have a better base knowledge of what a satellite image should look like. XL doesn’t force as much stylization or awkward perspectives as 1.5.</p>
<h2 id="finetuning">Finetuning</h2>
<p>I wanted to see if finetuning these models on a remote sensing dataset would yield anything different or, hopefully, better. I took a similar approach as <a href="https://www.reasonfieldlab.com/post/generative-ai-and-remote-sensing-imagery">this article</a> and used part of the AID dataset to finetune on a single class: satellite images of mountains. I stuck to LoRA finetuning since I only have a plebe graphics card with limited memory. This, however, is where LoRAs shine and allow people with consumer hardware to dive into the world of generative models.</p>
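<p>Once trained, the LoRA weights get loaded on top of the base checkpoint at generation time. Here is a minimal sketch with diffusers; the output directory name is a placeholder for wherever the finetuning run saved its weights.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the LoRA weights produced by the finetuning run (hypothetical output path)
pipe.load_lora_weights("./sd15-aid-mountain-lora")

image = pipe("a satellite image of mountains").images[0]
image.save("sd_finetune_mountain.jpg")
</code></pre></div></div>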
<p>The constrained influence of trained LoRA weights was, intriguingly, different for SD1.5 and SDXL. 1.5 no longer suffered from strange perspectives and was largely able to stick to a top-down view of mountain ranges. Some of the high contrast and bright colors, however, still remain.</p>
<p>The XL model was much more receptive to the finetuning dataset. While the output is not necessarily artistic, it does better represent the distribution of the AID images. In other words, the mountain images generated by SDXL look like they could have come from the AID dataset.</p>
<p align="center" class="text-center">
<img src="/assets/images/diffusion/sd_finetune_mountain.jpg" width="512" height="512" />
<figcaption class="text-center">Finetuned SD1.5</figcaption>
</p>
<p align="center" class="text-center">
<img src="/assets/images/diffusion/xl_finetune_mountain.png" width="512" height="512" />
<figcaption class="text-center">Finetuned SDXL</figcaption>
</p>
<h2 id="thoughts">Thoughts</h2>
<p>I didn’t do a whole lot of hyperparameter tuning. It may very well be the case that the difference I saw in 1.5 and XL was due to differences in model and training hyperparameters. I’m also curious about how fully retraining either model on the complete AID dataset would compare to the examples I’ve posted here. Perhaps a follow up post is warranted.</p>
<p>In the meantime, I’ll wrap up with a thought that stuck with me during this whole process: is there an actual use case for generative models in remote sensing? One thing that is immediately clear to me is that there is still a significant barrier to achieving reliable outputs that would serve a meaningful purpose in production environments. Synthetic data is only as good as the training data used to create the generator model. A LOT of data is needed to get really good results. I would argue that high quality labeled data is better used for training a non-generative model for a specific task or application. Even though the <a href="https://arxiv.org/pdf/2312.03606.pdf">outputs of SD variants can be impressive and realistic</a>, I still don’t see how this generated imagery would really be used in remote sensing applications. What am I going to do or make with more images of roads or stadiums?</p>
<p>I’m confident these models will get better in the future, and I may even be proven wrong: more images of roads may, in fact, turn out to be better for training downstream models. In the meantime, the effort used to train and customize these models is probably better used on curating high quality datasets for much smaller models aimed at reliably tackling a specific task. Until the generative landscape makes another tectonic shift, I’ll be using these models for what they do best: making art.</p>
<p align="center">
<img src="/assets/images/diffusion/river1.jpg" />
</p>
<p align="center">
<img src="/assets/images/diffusion/sd3.jpg" />
</p>
<p align="center">
<img src="/assets/images/diffusion/sd6.jpg" />
</p>
<p align="center" class="text-center">
<img src="/assets/images/diffusion/sd9.jpg" />
<figcaption class="text-center">The art of synthetic landscapes</figcaption>
</p>
<h2 id="code">Code</h2>
<p>I’ve created a <a href="https://github.com/danielhoshizaki/stable-diffusion-remote-sensing">repository</a> with an example implementation of finetuning SD1.5 with LoRA. I rely heavily on <a href="https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py">Huggingface’s finetuning template</a> and have modified the script to work with my RTX3060. The Dockerfile and Makefile should allow you to quickly get started with finetuning the model if you have the same card and the same CUDA drivers as those defined in the Dockerfile. The only other prerequisite is downloading the AID dataset and creating a custom jsonl file to point to the images and define the training captions.</p>
<h1>Hassle Free, Cloud Free</h1>
<p>Given enough data and compute budget, there’s a sure-fire way to remove all cloud cover from Sentinel-2 satellite imagery. By enough data, I mean a whopping 5 years worth of imagery. It’s fairly overkill, to put it mildly, but it works: every time, everywhere.</p>
<h2 id="project-description">Project Description</h2>
<p>I created a process that outputs clear-sky satellite images of ANY location on the globe. The process is fully automated and can return a clear tile given a Sentinel-2 cell. It even works for locations that are typically hard to process, like the tropics where there is almost year-round cloud cover. Read on to see examples and find out how I put this project together.</p>
<p align="center">
<img src="/assets/images/portfolio/hokkaido2.png" width="300" />
<img src="/assets/images/portfolio/tokyo.png" width="300" />
<img src="/assets/images/portfolio/aichi.png" width="300" />
</p>
<p align="center">
Composite images of Hokkaido, Tokyo, and Aichi Prefecture, Japan.
</p>
<h2 id="processing-technique">Processing Technique</h2>
<p>The dataset I used for this project was Sentinel-2’s level 2 product. This imagery is already calibrated to surface reflectance and is very close to the “true” colors you would see on the surface of the planet. A big bonus is that the data is completely free to use and hosted for free on AWS too.</p>
<p>Creating cloud free images from Sentinel-2 data is actually very easy. A common technique is to take the 25th percentile value from a stack of satellite image pixels. This <a href="https://medium.com/sentinel-hub/how-to-create-cloudless-mosaics-37910a2b8fa8">article on generating a cloud free image of New Zealand</a> describes how the technique works in detail. It’s possible to generate nearly cloud free imagery almost anywhere using this very, very simple approach.</p>
<p>Almost, anywhere.</p>
<p>I was always bothered by how this technique would fail for regions that typically have a lot of cloud cover: mountainous terrain or areas around the tropics. For most production cases, it’s probably not worth the extra work to deal with cloud cover in these areas, but this is a personal project and I wanted to see if I could develop a sure-fire way of removing all cloud cover.</p>
<p>It turns out that a modified version of the 25th percentile technique along with a very heavy handed use of almost all Sentinel-2 data can remove the last vestiges of cloud cover pixels. My process needs on average 100 Sentinel-2 tiles (5 years worth of data during summertime months). The vanilla percentile technique can achieve similar results with around 10 tiles! Needless to say, the process I developed is severe overkill. That said, I can run it in my sleep and it will ALWAYS produce a clear image. The same cannot be said for the vanilla technique.</p>
<p>In addition to a large stack of Sentinel-2 images, my process also requires fairly accurate cloud cover masks. I found that the landcover classification layer included in the Sentinel-2 dataset wasn’t sufficient to mask out all cloud cover, so I trained my own cloud detection model. I hand selected a few tiles that had decent landcover masks and trained a transformer based model available from the awesome <a href="https://huggingface.co/">Huggingface</a> library. After that, I fed the model each Sentinel-2 tile and created corresponding cloud masks.</p>
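<p>I won’t go into the full training setup here, but to make it concrete, here is roughly what standing up a transformer based segmentation model from Huggingface for binary cloud masking looks like. The SegFormer backbone, checkpoint name, and dummy input below are illustrative assumptions rather than my exact configuration.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

# Pretrained encoder checkpoint; the segmentation head is reinitialized for 2 classes (cloud / clear)
processor = SegformerImageProcessor(do_resize=False)
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",
    num_labels=2,
    ignore_mismatched_sizes=True,
)

# Training loop omitted. At inference time, each RGB chip becomes a coarse cloud mask.
chip = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)  # stand-in for a Sentinel-2 RGB chip
inputs = processor(images=chip, return_tensors="pt")
logits = model(**inputs).logits           # shape (1, 2, H/4, W/4)
cloud_mask = logits.argmax(dim=1)         # upsample back to the chip size as needed
</code></pre></div></div>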
<p align="center">
<img src="/assets/images/portfolio/japan.png" width="500" />
</p>
<p align="center">
Composite output over Japan.
</p>
<p>The final step to creating cloud free images requires combining the cloud mask with the vanilla 25th percentile technique I mentioned in the beginning. Sounds simple, but this process is insanely slow when using a built-in method in Numpy. Thankfully, a talented individual by the name of <a href="https://krstn.eu/np.nanpercentile()-there-has-to-be-a-faster-way/">Kersten Fernerkundung created an excellent algorithm</a> to do the heavy lifting. His work imprecisely approximates Numpy’s nanpercentile function, but brings the run time down from quadratic to linear. This algorithm was critical in allowing me to effectively use the cloud masks to compute the 25th percentile value.</p>
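<p>Conceptually, the compositing step is just a masked percentile along the time axis. Here is a stripped-down sketch using Numpy’s slow built-in nanpercentile; in the actual pipeline the faster approximation linked above replaces that call. The array shapes and dummy data are only for illustration.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# stack: (time, height, width, bands) surface reflectance from ~100 Sentinel-2 acquisitions
# cloud_masks: (time, height, width) boolean array, True where the model detected cloud
stack = np.random.rand(100, 256, 256, 4).astype(np.float32)   # dummy data
cloud_masks = np.random.rand(100, 256, 256) > 0.7             # dummy masks

# Set cloudy pixels to NaN so the percentile calculation ignores them
stack[cloud_masks] = np.nan

# The 25th percentile along the time axis gives the cloud free composite
composite = np.nanpercentile(stack, 25, axis=0)
print(composite.shape)  # (256, 256, 4)
</code></pre></div></div>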
<p>Taken together, the above steps allow me to pick any geographic location and generate a completely cloud free image. The results work great even for areas with persistent cloud cover. I’ve only processed Sentinel-2 cells over Japan, but it wouldn’t be too much extra work to parallelize this process on a bunch of servers on cloud computing platforms like AWS or GCP. If I had a few extra bucks to throw at this project I could have the whole globe processed in a few hours. That’s an MLOps project for another day.</p>
<p>For now, you can check out a sample of the final output in the map below. I combined the satellite imagery with one of <a href="https://danielhoshizaki.com/2021/10/07/dem-processing.html">my favorite datasets for Japan: a 10m digital elevation model</a>. The result is a very cool looking basemap that I’m sure would look great with other GIS data overlaid on top. Another interesting visualization project for another day!</p>
<h2 id="interactive-map">Interactive Map</h2>Daniel HoshizakiGiven enough data and compute budget, there’s a sure-fire way to remove all cloud cover from Sentinel-2 satellite imagery. By enough data, I mean a whopping 5 years worth of imagery. It’s fairly overkill, to put it mildly, but it works: every time, everywhere.A Vision Transformer Encoder from Scratch2022-10-15T00:00:00+00:002022-10-15T00:00:00+00:00https://danielhoshizaki.com//2022/10/15/vision-transformers<p>The SegFormer transformer model is a very stable and powerful computer vision model. In this article I take a deep dive into the inner workings of the model’s encoder in an attempt to understand why it performs so well. Read on to find out what makes this vision model so awesome.</p>
<h3 id="why-use-segformer-and-efficient-attention">Why Use SegFormer and Efficient Attention?</h3>
<p>The main problem that the SegFormer model tries to tackle is the enormous compute footprint of self attention and the poor ability of vision transformers like ViT to adjust to variable pixel size images. Both issues are interrelated, but I’ll walk through each one separately to begin with:</p>
<ol>
<li>
<p>The computational complexity of self attention is quadratic in the number of tokens. If we treat every pixel of a 512 x 512 image as a token, that’s 262,144 tokens, and a single attention map contains 262,144 x 262,144 entries (and there is one attention map per head, per layer). A 1024 x 1024 image has more than a million tokens. Full self attention simply does not scale very well for larger image sizes (see the quick calculation after this list).</p>
</li>
<li>
<p>A traditional approach to dealing with the large compute footprint of self attention is to group pixels and calculate the attention between groups (patches) rather than between individual pixels. ViT uses 16 x 16, non-overlapping patches. The problem with the non-overlapping approach is that a model trained on a certain image size (say, 224 x 224) has a hard time working with larger images during inference and fine tuning. If we think about this intuitively, the amount of “information” in a 16 x 16 patch from a 224 x 224 image is very different from the same sized patch from a 1024 x 1024 image. You would essentially need to train the model from scratch using larger image sizes if your end goal was to work with large images.</p>
</li>
</ol>
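<p>To make the scaling concrete, here is the back-of-the-envelope arithmetic behind the numbers above:</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Rough comparison of attention map sizes (illustrative arithmetic only)
pixels_512 = 512 * 512        # 262,144 tokens if every pixel is a token
pixels_1024 = 1024 * 1024     # 1,048,576 tokens

print(f"512 x 512, per-pixel tokens:   {pixels_512 ** 2:,} attention entries")   # ~68.7 billion
print(f"1024 x 1024, per-pixel tokens: {pixels_1024 ** 2:,} attention entries")  # ~1.1 trillion

# ViT-style 16 x 16 patches shrink the token count dramatically
patches_512 = (512 // 16) ** 2  # 1,024 tokens
print(f"512 x 512, 16 x 16 patches:    {patches_512 ** 2:,} attention entries")  # ~1 million
</code></pre></div></div>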
<p>Ideally, we would use transfer learning and save ourselves time by fine-tuning a model trained on smaller image sizes to work on larger image sizes. Non-overlapping patches, however, prevent effective transfer of knowledge. To get around this roadblock, and the problem of large compute requirements of self attention, the authors of the SegFormer paper propose a simple but effective solution: efficient attention using overlapping patches. The method is straightforward:</p>
<ol>
<li>
<p>Down sample the image to 1/4 the original size. A 512 x 512 image becomes 128 x 128. The self attention computation requirements are much better at this downsampled size.</p>
</li>
<li>
<p>Use a convolution kernel to create patches that overlap information from neighboring pixels. Each pixel in the downsampled image contains some information about its neighbors because of the convolution kernel’s size and stride (more on this later).</p>
</li>
</ol>
<p>The result of these two design choices is a vision transformer that has a lower computation footprint (hence, “efficient” self attention) and an architecture that allows for better adaptation to larger image sizes.</p>
<h3 id="overlap-patch-embeddings">Overlap Patch Embeddings</h3>
<p>A nice feature of the SegFormer architecture is the removal of the positional embeddings required in models like ViT. By using a convolution kernel with an overlapping size and stride, the network is able to retain and understand the positional information associated with pixel and patch values. Here is how the authors describe the benefits of using overlap patch embeddings over regular patch embeddings:</p>
<blockquote>
<p>“<strong>Removing Positional Embeddings</strong>: The introduction of Convolutional Projections for every Transformer block, combined with the Convolutional Token Embedding, gives us the ability to <em>model local spatial relationships through the network</em>. This built-in property allows dropping the position embedding from the network without hurting performance, as evidenced by our experiments, simplifying design for vision tasks with variable input resolution.”</p>
</blockquote>
<p>I’ve added my own emphasis here on modelling spatial relationships through the network. In essence, convolutions allow the network to learn spatial relationships from the inputs. This offers a much more flexible method of handling different resolution inputs.</p>
<p>Here’s the code snippet that performs the patch embedding via convolutions.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span>
<span class="k">class</span> <span class="nc">OverlapPatchEmbeddings</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">kernel</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span class="n">channels</span><span class="p">,</span> <span class="n">hidden_state</span><span class="p">):</span>
<span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">()</span>
<span class="c1"># Set up the convolution
</span> <span class="bp">self</span><span class="p">.</span><span class="n">conv</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">channels</span><span class="p">,</span> <span class="n">hidden_state</span><span class="p">,</span>
<span class="n">kernel_size</span><span class="o">=</span><span class="n">kernel</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="n">kernel</span> <span class="o">//</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># Normalize the output of the convolution via Layer Norm
</span> <span class="c1"># Why LayerNorm instead of BatchNorm? https://stats.stackexchange.com/a/505349
</span> <span class="bp">self</span><span class="p">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">LayerNorm</span><span class="p">(</span><span class="n">hidden_state</span><span class="p">)</span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">token_shape</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="c1"># Given an image input with shapes (batch, channels, height, width)
</span> <span class="c1"># Convert the shape to (batch, channels, height x width)
</span> <span class="c1"># Then rearrange the dimensions to (batch, height x width, channels)
</span> <span class="k">return</span> <span class="n">x</span><span class="p">.</span><span class="n">flatten</span><span class="p">(</span><span class="mi">2</span><span class="p">).</span><span class="n">transpose</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="c1"># embedding token shape ready for transformer block
</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
<span class="c1"># The expected shape of the input is:
</span> <span class="c1"># (batch, channels, height, width)
</span> <span class="c1"># First send the input through the convolution layer
</span> <span class="n">conv_out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">conv</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
<span class="c1"># The output image size is downsampled, so we need to collect the new height and width lengths
</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">new_height</span><span class="p">,</span> <span class="n">new_width</span> <span class="o">=</span> <span class="n">conv_out</span><span class="p">.</span><span class="n">shape</span>
<span class="c1"># Reshape the output to transformer block compatable tokens
</span> <span class="n">token_embeddings</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">token_shape</span><span class="p">(</span><span class="n">conv_out</span><span class="p">)</span>
<span class="c1"># Normalize the tokens
</span> <span class="n">token_embeddings</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">token_embeddings</span><span class="p">)</span>
<span class="c1"># Return the image output reshaped as (batch, tokens, embeddings)
</span> <span class="c1"># Also return the reshaped output image's height and width
</span> <span class="k">return</span> <span class="n">token_embeddings</span><span class="p">,</span> <span class="n">new_height</span><span class="p">,</span> <span class="n">new_width</span>
</code></pre></div></div>
<p>Let’s construct the first patch embedding layer of the vision transformer network and feed an example tensor through the layer.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">emb_block</span> <span class="o">=</span> <span class="n">OverlapPatchEmbeddings</span><span class="p">(</span><span class="n">kernel</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">channels</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">hidden_state</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
<span class="c1"># Feed the example image through the layer
</span><span class="n">imgs</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">)</span> <span class="c1"># random 224x224 images with 3 channels
</span><span class="n">emb</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span> <span class="n">width</span> <span class="o">=</span> <span class="n">emb_block</span><span class="p">(</span><span class="n">imgs</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Token embedding shape:"</span><span class="p">,</span> <span class="n">emb</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Reshaped output image height and width: </span><span class="si">{</span><span class="n">height</span><span class="si">}</span><span class="s">x</span><span class="si">{</span><span class="n">width</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Token embedding shape: torch.Size([1, 3136, 32])
Reshaped output image height and width: 56x56
</code></pre></div></div>
<p>In terms of a convolutional operation, our input image of 224 x 224 was downsized by a factor of 4 to 56 x 56. The channels increased from 3 to 32. Each “pixel” or element in the downsized image can be thought of as an individual patch. Self attention will now be calculated between these patches.</p>
<p>The next step is to reshape the patches so that self attention can be computed between patches. Attention blocks in transformers can’t handle tensors in the shape of an image with channels. Instead, the height and width dimension needs to be flattened into a single token dimension (height x width -> token) and the channel dimension will be called the embedding dimension. As shown in the output above, the new tensor shape is now batch, token, and embedding size.</p>
<p>When I first started learning about transformers, the concepts of tokens and embeddings were foreign to me. I’m used to thinking about images as pixels and channels. I learned that a good way to think about tokens is through examples from NLP. Each token is a word like ‘cat’ or ‘dog’. Each embedding of a token is a hidden word or meaning associated with the word token. In the ‘cat’ or ‘dog’ example, these words could be associated with a meaning like ‘furry’, ‘mammal’, or ‘pet’. NLP transformers look at the connections between words by examining how the hidden meanings of different words relate to each other.</p>
<p>Unlike language, images don’t have straightforward sequences of tokens. We can, however, force an image into a sequence by looking at every single pixel and treating it in a similar way to how a word is treated in NLP transformers. A single pixel token is associated with multiple channels (like red, green, and blue for color images). Each channel carries a hidden meaning, or embedding, associated with a pixel token. I’ll admit that the hidden meaning of a pixel channel is a lot less intuitive than the hidden meanings of a word. That said, vision transformers find their own meanings in the connections between pixel channel embeddings. In a nutshell, self attention on images tells a model how pixel tokens relate to all other pixel tokens in the image.</p>
<h3 id="the-efficient-attention-mechanism">The Efficient Attention Mechanism</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">einops</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="k">class</span> <span class="nc">EfficientAttention</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">emb_size</span><span class="p">,</span> <span class="n">heads</span><span class="p">,</span> <span class="n">sr_ratio</span><span class="p">,</span> <span class="n">dropout</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
<span class="nb">super</span><span class="p">().</span><span class="n">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span> <span class="o">=</span> <span class="n">emb_size</span>
<span class="bp">self</span><span class="p">.</span><span class="n">heads</span> <span class="o">=</span> <span class="n">heads</span>
<span class="c1"># Check that the embedding size is a multiple of the number of heads
</span>        <span class="k">assert</span> <span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span> <span class="o">%</span> <span class="bp">self</span><span class="p">.</span><span class="n">heads</span> <span class="o">==</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"Embedding size is not a multiple of the number of heads"</span>
<span class="c1"># Embeddings are evenly distributed among each attention head
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">head_size</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span> <span class="o">//</span> <span class="bp">self</span><span class="p">.</span><span class="n">heads</span> <span class="c1"># integer division so nn.Linear receives an int size</span>
<span class="c1"># Calculate the output embedding size
</span> <span class="c1"># Generally, this is the same as the input embedding size
</span> <span class="bp">self</span><span class="p">.</span><span class="n">total_emb_size</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">head_size</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">heads</span>
<span class="c1"># Set up the linear layers for query, key, and value
</span> <span class="bp">self</span><span class="p">.</span><span class="n">query</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">total_emb_size</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">key</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">total_emb_size</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">total_emb_size</span><span class="p">)</span>
<span class="c1"># Setup the dropout layer
</span> <span class="bp">self</span><span class="p">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout</span><span class="p">)</span>
<span class="c1"># Sequence reduction
</span> <span class="c1"># Used to downsample the key and value images in order to save on computations
</span> <span class="c1"># This is why we term this type of attention as 'efficient'
</span> <span class="bp">self</span><span class="p">.</span><span class="n">sr_ratio</span> <span class="o">=</span> <span class="n">sr_ratio</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">sr_ratio</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">sr</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">sr_ratio</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">sr_ratio</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">LayerNorm</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">emb_size</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span> <span class="n">width</span><span class="p">):</span>
<span class="c1"># Apply the linear layer to the query
</span> <span class="c1"># The query will not be downsampled by the sequence reduction call
</span> <span class="n">query</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="c1"># Check if sequence reduction is necesary
</span> <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">sr_ration</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="c1"># Apply sequence reduction to key and value in order to reduce the tensor sizes
</span> <span class="c1"># Start by reshaping the token embeddings back to an image tensor
</span> <span class="c1"># (batch, channels, height, width)
</span>            <span class="n">x</span> <span class="o">=</span> <span class="n">einops</span><span class="p">.</span><span class="n">rearrange</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="s">"b (h w) c -> b c h w"</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="n">height</span><span class="p">,</span> <span class="n">w</span><span class="o">=</span><span class="n">width</span><span class="p">)</span>
<span class="c1"># Apply the convolution based reduction
</span> <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">sr</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="c1"># Reshape back to a token embeddings tensor
</span> <span class="n">x</span> <span class="o">=</span> <span class="n">einops</span><span class="p">.</span><span class="n">rearrange</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="s">"b c h w -> b (h w) c"</span><span class="p">)</span>
<span class="c1"># Apply the layer normalization
</span> <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">key</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">key</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">value</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="c1"># Reshape query, key, value so that the tokens are evenly split between the heads
</span> <span class="n">query</span> <span class="o">=</span> <span class="n">einops</span><span class="p">.</span><span class="n">rearrange</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="s">"b (h t) e -> b h t e"</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">heads</span><span class="p">)</span>
<span class="n">key</span> <span class="o">=</span> <span class="n">einops</span><span class="p">.</span><span class="n">rearrange</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="s">"b (h t) e -> b h t e"</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">heads</span><span class="p">)</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">einops</span><span class="p">.</span><span class="n">rearrange</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"b (h t) e -> b h t e"</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">heads</span><span class="p">)</span>
<span class="c1"># NOTE
</span> <span class="c1"># torch.matmul is faster than torch.einsum so it's better to use the former in production
</span> <span class="c1"># We will stick to einsum since it is easier to read and understand in tutorials
</span> <span class="c1"># (batch, heads, query, embeddings) dot
</span> <span class="c1"># (batch, heads, key, embeddings) ->
</span> <span class="c1"># (batch, heads, query, key)
</span> <span class="n">attention</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">"bhqe,bhke->bhqk"</span><span class="p">,</span> <span class="p">[</span><span class="n">query</span><span class="p">,</span> <span class="n">key</span><span class="p">])</span>
<span class="c1"># Normalize the attention scores
</span> <span class="n">attention</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">functional</span><span class="p">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">attention</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">head_size</span><span class="p">),</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># Contextualize the embedings with the attention scores
</span> <span class="n">context_emb</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">"bhqk,bhve->bhqe"</span><span class="p">,</span> <span class="p">[</span><span class="n">attention</span><span class="p">,</span> <span class="n">value</span><span class="p">])</span>
<span class="c1"># Rejoin the split heads and return a new embedding token
</span>        <span class="n">context_emb</span> <span class="o">=</span> <span class="n">einops</span><span class="p">.</span><span class="n">rearrange</span><span class="p">(</span><span class="n">context_emb</span><span class="p">,</span> <span class="s">"b h t e -> b t (h e)"</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">heads</span><span class="p">)</span>
<span class="k">return</span> <span class="n">context_emb</span>
</code></pre></div></div>
<h1>Processing DEM Data for Japan</h1>
<h1 id="japans-digital-elevation-model">Japan’s Digital Elevation Model</h1>
<p>The <a href="https://fgd.gsi.go.jp/download/menu.php">Geospatial Information Authority of Japan</a> hosts one of my all-time favorite datasets: a digital elevation model (DEM) of Japan. This is such a cool dataset and there is a really nice visualization technique I’ve found that allows seamless visualization over the entire country of Japan.</p>
<p>The site has 10 meter DEM data for all of Japan and 5 meter data for key watersheds and flood areas around urban centers. My personal favorite is the 10 meter data for creating a nice looking background layer for my maps in QGIS. GSI serves the data in XML format, but with a little bit of processing these files can be converted into great looking hillshade rasters. The following image shows an example of what the converted data looks like over Aomori Prefecture, Japan.</p>
<p align="center">
<img src="/assets/images/dem/dem1.png" />
</p>
<p>The data can be further processed and converted into a tile map service. This format is great for visualizing very large datasets with graphical UIs and websites. Here’s an example of what the tiles look like when visualized in QGIS.</p>
<p align="center">
<img src="/assets/images/dem/dem.gif" />
</p>
<p>Pretty cool.</p>
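<p>For the curious, once the XML tiles have been parsed into a GeoTIFF, the hillshading and tiling steps are plain GDAL. A rough sketch is below; the file names and parameter values are illustrative, and the repository linked next has the full pipeline.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import subprocess
from pathlib import Path

dem = Path("merged_dem.tif")        # GeoTIFF assembled from the GSI XML tiles (assumed name)
hillshade = Path("hillshade.tif")

# Generate the hillshade raster; -z exaggerates the relief slightly
subprocess.run(f"gdaldem hillshade -z 1.5 {dem} {hillshade}", shell=True, check=True)

# Slice the hillshade into a tile pyramid for fast viewing in QGIS or on the web
subprocess.run(f"gdal2tiles.py --zoom=5-12 {hillshade} ./tiles", shell=True, check=True)
</code></pre></div></div>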
<p>The code for converting the <a href="https://fgd.gsi.go.jp/download/mapGis.php?tab=dem">raw XML files</a> into an awesome tiled map is available from my GitHub repository: <a href="https://github.com/danielhoshizaki/DEM-hillshade">DEM-Hillshade</a>.</p>
<h1>Himawari 8 Geostationary Weather Satellite</h1>
<p>JAXA provides free, limited use access to top of atmosphere reflectance and other derived data from the Himawari 8 geostationary satellite. You can request access to the data <a href="https://www.eorc.jaxa.jp/ptree/registration_top.html">here</a>. I’ve put together a helper script that downloads the data files from JAXA’s FTP server and extracts one of the bands (cloud top height) as a GeoTIFF. The script can be modified to include your own account ID and password along with the target date for which the script will download hourly data from the JAXA server.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="kn">from</span> <span class="nn">ftplib</span> <span class="kn">import</span> <span class="n">FTP</span>
<span class="n">cwd</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s">"local_dir_path"</span><span class="p">)</span>
<span class="n">base</span> <span class="o">=</span> <span class="n">cwd</span> <span class="o">/</span> <span class="s">'data'</span>
<span class="c1"># User authentication to JAXA's FTP server
</span><span class="n">ftp</span> <span class="o">=</span> <span class="n">FTP</span><span class="p">(</span><span class="s">'ftp.ptree.jaxa.jp'</span><span class="p">)</span>
<span class="n">ftp</span><span class="p">.</span><span class="n">login</span><span class="p">(</span><span class="s">'account_id'</span><span class="p">,</span> <span class="s">'account_password'</span><span class="p">)</span>
<span class="n">par_base</span> <span class="o">=</span> <span class="s">'/pub/himawari/L2/CLP/010'</span> <span class="c1"># target location on the server
</span><span class="n">ftp</span><span class="p">.</span><span class="n">cwd</span><span class="p">(</span><span class="n">par_base</span><span class="p">)</span>
<span class="c1"># Chose a date and download the data, and convert to GeoTIFF
</span><span class="n">year</span><span class="p">,</span> <span class="n">month</span><span class="p">,</span> <span class="n">day</span> <span class="o">=</span> <span class="mi">2015</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">4</span>
<span class="k">for</span> <span class="n">hour</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">24</span><span class="p">):</span>
<span class="n">target_hour</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">par_base</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="nb">str</span><span class="p">(</span><span class="n">year</span><span class="p">)</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">month</span><span class="p">).</span><span class="n">zfill</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="nb">str</span><span class="p">(</span><span class="n">day</span><span class="p">).</span><span class="n">zfill</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="nb">str</span><span class="p">(</span><span class="n">hour</span><span class="p">).</span><span class="n">zfill</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="si">}</span><span class="s">"</span>
<span class="n">file_list</span> <span class="o">=</span> <span class="n">ftp</span><span class="p">.</span><span class="n">nlst</span><span class="p">(</span><span class="n">target_hour</span><span class="p">)</span>
<span class="n">file_list</span><span class="p">.</span><span class="n">sort</span><span class="p">()</span>
<span class="k">for</span> <span class="n">file_path</span> <span class="ow">in</span> <span class="n">file_list</span><span class="p">:</span>
<span class="n">file_path</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span>
<span class="n">download_path</span> <span class="o">=</span> <span class="n">base</span> <span class="o">/</span> <span class="n">file_path</span><span class="p">.</span><span class="n">name</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">file_path</span><span class="p">.</span><span class="n">name</span><span class="p">[:</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span>
<span class="n">output_path</span> <span class="o">=</span> <span class="n">base</span> <span class="o">/</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">.tif"</span>
<span class="n">temp_output_path</span> <span class="o">=</span> <span class="n">base</span> <span class="o">/</span> <span class="sa">f</span><span class="s">"tmp_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">.tif"</span>
<span class="k">try</span><span class="p">:</span>
<span class="c1"># read the data from the FTP server
</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">download_path</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">handle</span><span class="p">:</span>
<span class="n">ftp</span><span class="p">.</span><span class="n">retrbinary</span><span class="p">(</span><span class="s">'RETR %s'</span> <span class="o">%</span> <span class="n">file_path</span><span class="p">,</span> <span class="n">handle</span><span class="p">.</span><span class="n">write</span><span class="p">)</span>
<span class="n">command</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'gdalwarp -overwrite -of GTIFF -t_srs EPSG:4326 NETCDF:"</span><span class="si">{</span><span class="n">download_path</span><span class="p">.</span><span class="n">as_posix</span><span class="p">()</span><span class="si">}</span><span class="s">":CLTH </span><span class="si">{</span><span class="n">temp_output_path</span><span class="p">.</span><span class="n">as_posix</span><span class="p">()</span><span class="si">}</span><span class="s">'</span>
<span class="n">subprocess</span><span class="p">.</span><span class="n">run</span><span class="p">(</span>
<span class="n">command</span><span class="p">,</span>
<span class="n">shell</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">check</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">stdout</span><span class="o">=</span><span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">devnull</span><span class="p">,</span> <span class="s">"w"</span><span class="p">),</span>
<span class="n">stderr</span><span class="o">=</span><span class="n">subprocess</span><span class="p">.</span><span class="n">STDOUT</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">command</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'gdal_translate -co compress=deflate -co predictor=2 </span><span class="si">{</span><span class="n">temp_output_path</span><span class="p">.</span><span class="n">as_posix</span><span class="p">()</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">output_path</span><span class="p">.</span><span class="n">as_posix</span><span class="p">()</span><span class="si">}</span><span class="s">'</span>
<span class="n">subprocess</span><span class="p">.</span><span class="n">run</span><span class="p">(</span>
<span class="n">command</span><span class="p">,</span>
<span class="n">shell</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">check</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">stdout</span><span class="o">=</span><span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">devnull</span><span class="p">,</span> <span class="s">"w"</span><span class="p">),</span>
<span class="n">stderr</span><span class="o">=</span><span class="n">subprocess</span><span class="p">.</span><span class="n">STDOUT</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">os</span><span class="p">.</span><span class="n">remove</span><span class="p">(</span><span class="n">download_path</span><span class="p">)</span>
<span class="n">os</span><span class="p">.</span><span class="n">remove</span><span class="p">(</span><span class="n">temp_output_path</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"completed"</span><span class="p">,</span> <span class="n">output_path</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"failed"</span><span class="p">,</span> <span class="n">output_path</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
</code></pre></div></div>
<p>Once the data has been downloaded and processed into GeoTIFFs, it can be further processed for data visualization. The following script uses <a href="https://www.qgis.org/en/site/">QGIS3</a> to generate PNGs with a timestamp label.</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">qgis.core</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">PyQt5.QtSql</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">PyQt5.QtGui</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">PyQt5.QtCore</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="k">def</span> <span class="nf">clean</span><span class="p">():</span>
<span class="k">global</span> <span class="n">count</span>
<span class="n">test_layers</span><span class="p">[</span><span class="n">count</span><span class="p">].</span><span class="n">setItemVisibilityChecked</span><span class="p">(</span><span class="bp">False</span><span class="p">)</span>
<span class="n">iface</span><span class="p">.</span><span class="n">mapCanvas</span><span class="p">().</span><span class="n">refresh</span><span class="p">()</span>
<span class="n">QTimer</span><span class="p">.</span><span class="n">singleShot</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="n">fire</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">fire</span><span class="p">(</span><span class="n">base</span><span class="p">:</span> <span class="n">Path</span><span class="p">)</span> <span class="o">-></span> <span class="bp">None</span><span class="p">:</span>
<span class="k">global</span> <span class="n">count</span>
<span class="n">test_layers</span><span class="p">[</span><span class="n">count</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">setItemVisibilityChecked</span><span class="p">(</span><span class="bp">False</span><span class="p">)</span>
<span class="n">test_layers</span><span class="p">[</span><span class="n">count</span><span class="p">].</span><span class="n">setItemVisibilityChecked</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">test_layers</span><span class="p">[</span><span class="n">count</span><span class="p">].</span><span class="n">name</span><span class="p">()</span>
<span class="n">date</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="s">"CLTH_%Y%m%d_%H%M_compress"</span><span class="p">)</span>
<span class="n">date</span> <span class="o">=</span> <span class="n">date</span> <span class="o">+</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">hours</span><span class="o">=</span><span class="mi">9</span><span class="p">)</span>
<span class="n">date_name</span> <span class="o">=</span> <span class="n">date</span><span class="p">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">"%Y.%m.%d %H:%M"</span><span class="p">)</span>
<span class="c1"># change the label
</span> <span class="n">label_layer</span> <span class="o">=</span> <span class="n">QgsProject</span><span class="p">.</span><span class="n">instance</span><span class="p">().</span><span class="n">mapLayersByName</span><span class="p">(</span><span class="s">'label'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">label_layer</span><span class="p">.</span><span class="n">startEditing</span><span class="p">()</span>
<span class="k">for</span> <span class="n">feat</span> <span class="ow">in</span> <span class="n">label_layer</span><span class="p">.</span><span class="n">getFeatures</span><span class="p">():</span>
<span class="n">label_layer</span><span class="p">.</span><span class="n">changeAttributeValue</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">date_name</span><span class="p">)</span>
<span class="n">label_layer</span><span class="p">.</span><span class="n">commitChanges</span><span class="p">()</span>
<span class="n">iface</span><span class="p">.</span><span class="n">mapCanvas</span><span class="p">().</span><span class="n">refresh</span><span class="p">()</span>
<span class="n">iface</span><span class="p">.</span><span class="n">mapCanvas</span><span class="p">().</span><span class="n">saveAsImage</span><span class="p">(</span> <span class="n">base</span> <span class="o">/</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">.png'</span> <span class="p">)</span>
<span class="k">if</span> <span class="n">count</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">test_layers</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="n">QTimer</span><span class="p">.</span><span class="n">singleShot</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="n">clean</span><span class="p">)</span> <span class="c1"># Wait a second and prepare next map
</span> <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="c1"># Once all data processing has been complete, fire off the PNG creation through QGIS
</span><span class="n">root</span> <span class="o">=</span> <span class="n">QgsProject</span><span class="p">.</span><span class="n">instance</span><span class="p">().</span><span class="n">layerTreeRoot</span><span class="p">()</span>
<span class="n">test_group</span> <span class="o">=</span> <span class="n">root</span><span class="p">.</span><span class="n">findGroup</span><span class="p">(</span><span class="s">'test'</span><span class="p">)</span>
<span class="n">test_layers</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">test_group</span><span class="p">.</span><span class="n">children</span><span class="p">())</span>
<span class="n">count</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">QTimer</span><span class="p">.</span><span class="n">singleShot</span><span class="p">(</span><span class="mi">2000</span><span class="p">,</span> <span class="n">fire</span><span class="p">)</span>
</code></pre></div></div>
<p></p>
<h1 id="visualizing-himawari-8-weather-data-over-japan">Visualizing Himawari 8 Weather Data Over Japan</h1>
<p>The PNG versions of the data can then be fed through <a href="https://imagemagick.org/index.php">ImageMagick</a> to create a GIF. It should be pretty straightforward to modify the first Python script to download 10 minute interval data instead of 1 hour interval data. I downloaded a bunch of data for a particularly interesting day that had 3 typhoons forming over the Pacific Ocean.</p>
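<p>The GIF assembly itself is a one-liner with ImageMagick; something along these lines (the frame delay is an arbitrary choice):</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import subprocess

# Stitch the exported PNG frames into an animated GIF with ImageMagick
subprocess.run("convert -delay 20 -loop 0 *.png himawari.gif", shell=True, check=True)
</code></pre></div></div>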
<p align="center">
<img src="/assets/images/himawari/himawari.gif" />
</p>
<p>There are all sorts of interesting applications for this type of data: one obvious one being weather forecasting. <a href="https://cloud.google.com/blog/topics/sustainability/weather-prediction-with-ai">Google</a> has a couple of pretrained models that can handle cloud cover data as inputs and output several hour-ahead forecasts. While Google’s platform uses GOES-16 data, one could easily feed the model Himawari 8 data and create forecasts over Asia. Given the amount of data from JAXA, however, with a little extra work, a hand rolled neural network could be trained to predict the weather. I imagine the results from such a model would make an even more interesting visualization project.</p>
<p style="color:grey">Header photo ©JAXA</p>Daniel HoshizakiJAXA provides free, limited use access to top of atmosphere reflectance and other derived data from the Himawari 8 geostationay satellite. You can request access to the data here. I’ve put together a helper script that downloads the data files from JAXA’s FTP server and extracts one of the bands (cloud top height) as a GeoTIFF. The script can be modified to include your own account ID and password along with the target date for which the script will download hourly data from the JAXA server.