
A week of Windows Subsystem for Linux

I first experimented with WSL2 as a daily development environment two years ago. Things were still pretty rough around the edges, especially with JetBrains' IDEs, and I ended up buying a dedicated Linux workstation so I wouldn't have to deal with the pain. 

Unfortunately, the Linux box developed a heat management problem, and simultaneously I found myself needing a beefier GPU than it had for working on multi-vector encoding, so I decided to give WSL2 another try.

Here are some of the highlights and lowlights. TL;DR: it's working well enough that I'm probably going to keep using it as my primary development machine.

The Good

  • NVIDIA CUDA drivers just work. I was blown away that I ran conda install cuda -c nvidia and it worked on the first try. No farting around with Linux kernel header versions or arcane errors from nvidia-smi. It just worked, including with PyTorch.
  • JetBrains products work a lot better now in remote development mode. I was a good citizen and filed half a dozen bug reports for the rough edges I'm still seeing (the most aggravating is related to the diff window, and profiling Cassandra inside WSL2 still doesn't quite work), but it's usable enough that I can overlook the problems.
  • Windows Terminal is surprisingly good. It really does make it feel like a seamless single system with very few hiccups.
  • It's nice being able to use Windows tools like SpeechPulse in my development workflow. Yes, I know you can roll your own Whisper transcription, but I'd rather let someone else deal with the corner cases and turn it into a product for me. 
  • The performance of ext4 running inside a virtual disk image on NTFS seems to be... perfectly fine? I haven't formally benchmarked it, but after throwing a bunch of fairly demanding work at it, including running local Cassandra vector search with tens of millions of rows, it isn't noticeably slower than native ext4 was.
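If you want to confirm the "it just worked" part on your own machine, here is a minimal sketch of the kind of check I mean. It assumes PyTorch is installed in the active environment, and the helper name cuda_status is made up for illustration. It only uses torch.cuda.is_available and torch.cuda.get_device_name.

```python
def cuda_status():
    """Describe whether PyTorch can see a CUDA device, without raising."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed but reports no CUDA device"
    # Device 0 is simply the first GPU that PyTorch enumerates.
    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"


print(cuda_status())
```

Run it from a shell inside WSL2. If it reports no CUDA device, nvidia-smi is the next thing to look at.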

Here's Python opening up a chart made with matplotlib from Windows Terminal. No special configuration required. 


The Bad

  • As far as I can tell from web forums, nobody really knows how to mount raw ext4 volumes in WSL2 on boot. With Claude's help, I wrote a PowerShell script that mounts the volumes successfully, but I haven't been able to get it to run at boot or on login. Everyone seems to think that running it as a Windows scheduled task should work, but I don't think I've seen anyone actually claim success. Maybe I need to sacrifice a goat.
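For reference, the manual mount step boils down to a wsl --mount call from an elevated Windows prompt. The sketch below is not my PowerShell script, just a rough Python equivalent of that one step. The function names and the disk path are hypothetical, and it does nothing to solve the run-at-boot problem.

```python
import subprocess


def build_wsl_mount_command(disk_path, partition=None, fs_type="ext4", bare=False):
    """Return the argv list for `wsl --mount`; this does not run anything."""
    cmd = ["wsl.exe", "--mount", disk_path]
    if bare:
        # --bare attaches the disk to the WSL2 VM without mounting a filesystem.
        cmd.append("--bare")
        return cmd
    if partition is not None:
        cmd += ["--partition", str(partition)]
    cmd += ["--type", fs_type]
    return cmd


def mount_disk(disk_path, **kwargs):
    """Run the mount from the Windows side; requires administrator rights."""
    return subprocess.run(build_wsl_mount_command(disk_path, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical disk path; substitute the one Windows reports for your drive.
    print(" ".join(build_wsl_mount_command(r"\\.\PHYSICALDRIVE1", partition=1)))
```

Keeping the command builder separate from the subprocess call makes it easy to print and eyeball the command before letting it touch a real disk.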

The Ugly

  • Backing up WSL2 data is a pain. Understandably, none of the usual tools on the Windows side can operate at the file level inside the WSL2 disk image. My first plan was to rsync my projects from inside WSL2 to a directory on NTFS where Backblaze could back them up, but rsync is not at its best when dealing with lots of small files like you have with, say, Git checkouts. So now I'm using restic to manage my own backup snapshots on a local disk instead.
  • Despite the tricks that Windows uses (mostly successfully) to make the Linux integration seamless, at the end of the day it's still running in a separate VM. In particular, setting up sshd is a bag of pain. It may be easier to get SSH access to the WSL2 VM using Tailscale, but I haven't tried this yet.
  • You can't increase the maximum size of a virtual disk image--you have to start over with a new one. There is a tool to export your data to a tarball and restore from it, but obviously this is time-consuming for anything but a toy volume.
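To give a rough idea of what the restic approach looks like, here is a minimal sketch that builds the backup and pruning commands. The repository path, source directory, exclude pattern, and retention numbers are all placeholders rather than my actual setup. It assumes the repository was already created with restic init and that the password comes from the RESTIC_PASSWORD or RESTIC_PASSWORD_FILE environment variable.

```python
import subprocess


def restic_backup_command(repo, source, excludes=()):
    """Build `restic backup` for one source directory."""
    cmd = ["restic", "-r", repo, "backup", source]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    return cmd


def restic_forget_command(repo, keep_daily=7, keep_weekly=4):
    """Build `restic forget` with a simple retention policy, pruning old data."""
    return [
        "restic", "-r", repo, "forget",
        "--keep-daily", str(keep_daily),
        "--keep-weekly", str(keep_weekly),
        "--prune",
    ]


def run_backup(repo, source, excludes=()):
    """Take a snapshot, then apply the retention policy."""
    subprocess.run(restic_backup_command(repo, source, excludes), check=True)
    subprocess.run(restic_forget_command(repo), check=True)


if __name__ == "__main__":
    # Placeholder paths: a repository on a Windows-side disk, projects inside WSL2.
    print(" ".join(restic_backup_command("/mnt/d/restic-repo", "/home/me/projects",
                                         excludes=["node_modules"])))
```

Calling run_backup from cron or a systemd timer inside WSL2 would turn this into a scheduled backup, though I'd test a restore before trusting it.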
