ownCloud, the Magic of ZFS Snapshots, and the Recovery of Encrypted Files

What is ownCloud?

ownCloud is a suite of client-server software for creating and using file hosting services, similar to Dropbox, Google Drive, etc., except for one small detail: it is free and open-source. You can therefore install it on your own PC to act as the central server for the rest of your client devices.


ownCloud requires a database in which administrative data is stored, while the files themselves are stored in the mount point chosen during the installation. The recommended database engines are MySQL or MariaDB, which can handle large datasets or large numbers of users. If that is not your case, you can give SQLite a try. Just search for instructions on how to set up the ownCloud server on your OS distribution; you should be able to have a running ownCloud installation in less than 30 minutes.

To protect your privacy, ownCloud (v. 8.0.4) allows you to encrypt your files through the “Encryption App”. You have to log in, enable the app, and log out. The next time you log in, ownCloud will start encrypting all your data using the password provided (which can take quite a long time if you already have many GB of data). If a user loses their password, they lose the ability to decrypt the data, unless the administrator generated a recovery key (an option that is disabled by default) that allows the data to be recovered.

At this moment ownCloud only encrypts the content of your files, but not the directory tree or the file names, so the administrator of the cloud will still be able to see and navigate through the directory structure. Note that encrypting your files implies a 35% increase in size. You can read about the details of the encryption later in the post.

You should also think about how to protect your data before attempting to install ownCloud. For instance, you must decide what kind of storage you are going to use (NAS, DAS, SATA or SAS disks), and which kind of redundancy (RAID 1, 5, 6?) to avoid any data loss in case of a hard drive failure. Note that a failure in a mechanical drive is not strange at all, especially in a 24×7 environment over the lifespan of the storage appliance. If you are curious about the rate of failure of hard drives in real life, you can check the statistics compiled by BackBlaze for their 49k hard drives.

For the above reasons I have chosen ZFS as the file system on which I will mount the ownCloud server installation, that is, the ownCloud data directory and the associated MySQL database. As we will see next, ZFS is a modern file system with many virtues.

ZFS as a File System

Introduction

ZFS stands for Zettabyte File System. It was created by Matthew Ahrens and Jeff Bonwick in 2001 at Sun Microsystems, later acquired by Oracle. ZFS was ported to Linux as open-source code by Brian Behlendorf at Lawrence Livermore National Laboratory. ZFS is a combined filesystem and logical volume manager designed to achieve:

• Easy storage administration.
• Redundancy handled by the filesystem.
• No downtime for system repairs.
• Snapshot capabilities.
• On-line data integrity and error handling through:

  – Transaction-based copy-on-write operations.
  – End-to-end checksumming (fletcher2, fletcher4, SHA-256).

Note that data integrity is the cornerstone of ZFS, not performance. This is the reason for choosing ZFS as the file system for ownCloud: performance is not an issue if we are planning to deliver content over the Internet, while we are very concerned about the integrity of the data and possible downtime due to drive failure or system repair. Moreover, the almost-zero time and space cost of taking snapshots makes ZFS an awesome file system for regular backups, and thereby for restoring ownCloud to any point in time in case of catastrophe. ZFS has a lot of options and possibilities that I don’t intend to cover in this post, but you should read about them carefully somewhere else.

ZFS setup

I installed the official stable distribution of ZFS for CentOS Linux, which can be found on the “ZFS on Linux” website together with instructions for its installation on a variety of platforms. For my system I will be using:

  • A cheap PC with 4 GB of RAM and a 64 GB SSD disk.
  • 2x 2 TB SATA disks to be configured as a mirror (RAID-1).

To create the ZFS pool and the datasets, and to set some of their properties, you can type:

zpool create -f -o ashift=12 tank mirror sdb sdc   # mirrored pool, aligned for 4K-sector disks
zpool set autoreplace=on tank                      # automatically rebuild onto a new disk placed in the same slot
zfs set compression=lz4 tank                       # cheap on-the-fly compression
zfs set logbias=throughput tank                    # favor throughput over latency for the intent log
zfs set sync=disabled tank                         # asynchronous writes (faster, riskier on power loss)
zfs create -o mountpoint=/var/www tank/owncloud    # dataset for the ownCloud installation
zpool export tank
zpool import -d /dev/disk/by-id tank               # re-import the pool using stable by-id device names

To create the other datasets you can proceed as above. In my case, I have created a total of 3 different datasets: one for the ownCloud files, “tank/owncloud”, one for its database, “tank/mysql”, and a third one in which I store the files associated with my Linux user, “tank/drive”. You can check the status of the array of disks with the “zpool” command, and that of the datasets with the “zfs” command. Below is a screenshot of my system at this moment:
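The commands behind that screenshot are simply the following (the column list in the second command is just one possibility):

zpool status tank
zfs list -r -o name,used,avail,mountpoint tank
zfs get compression,sync,logbias tank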

ZFS Snapshots

At any moment you can snapshot a dataset using “zfs snapshot tank/dataset@YourTag”. Even better, you can use the auto-snapshot service by downloading the scripts available from this website. They allow you to take hourly, daily, and monthly snapshots with the retention policy that you specify. In my case I use crontab to run the scripts:
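A sketch of those crontab entries (assuming the zfs-auto-snapshot script is installed in /usr/local/sbin; the labels, retention counts, and times are just examples):

# m  h  dom mon dow   command
0    *  *   *   *     /usr/local/sbin/zfs-auto-snapshot --quiet --syslog --label=hourly  --keep=24 tank
0    2  *   *   *     /usr/local/sbin/zfs-auto-snapshot --quiet --syslog --label=daily   --keep=31 tank
0    3  1   *   *     /usr/local/sbin/zfs-auto-snapshot --quiet --syslog --label=monthly --keep=12 tank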

The snapshots are exposed in a hidden “.zfs” folder within the dataset mount point. ZFS allows you to navigate through a given snapshot and copy any file out of it to a destination of your convenience. You can also roll back a given dataset to a previous state, or clone and promote a snapshot into a new dataset to remove any dependence on the parent dataset.
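For example, using one of the daily snapshots listed further down (the file name is illustrative):

cp /var/www/.zfs/snapshot/zfs-auto-snap_daily-2015-10-20-0202/some_file /tmp/   # copy a single file out of a snapshot
zfs rollback tank/owncloud@zfs-auto-snap_daily-2015-10-20-0202                  # roll back; add -r to also destroy newer snapshots
zfs clone tank/owncloud@zfs-auto-snap_daily-2015-10-20-0202 tank/owncloud_restore
zfs promote tank/owncloud_restore                                               # make the clone independent of its origin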

Note that a snapshot is just a collection of references to the blocks in use in the file system, like a pointer, and initially occupies almost no extra space. However, when your storage is getting full you will eventually have to destroy some of the snapshots to free the blocks they reference. Indeed, ZFS is a copy-on-write (COW) filesystem, as opposed to a read-modify-write filesystem like ext3, xfs, etc. This allows, among other things, for snapshots that are inexpensive in time and in space. In addition, by hashing every single block (to form a Merkle tree) ZFS can detect silent data corruption and repair it on the fly, as long as some level of redundancy is used. Because of that you can use cheap SATA disks instead of expensive SAS ones, as long as you periodically scrub your pool (i.e. re-read all the blocks and verify them against their checksums to find inconsistencies).
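Scrubbing the pool and reclaiming space from old snapshots are one-liners, for example:

zpool scrub tank          # re-read every block and verify it against its checksum
zpool status -v tank      # check scrub progress and any errors found or repaired
zfs destroy tank/mysql@zfs-auto-snap_daily-2015-10-13-0202   # free the space held by an old snapshot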

Below is partial output of the command “zfs list -t snapshot”, showing some of the available daily snapshots on this storage appliance:

NAME                                              USED  REFER
tank/mysql@zfs-auto-snap_daily-2015-10-13-0202    10.4M  192M
tank/mysql@zfs-auto-snap_daily-2015-10-14-0202    19.3M  192M
tank/mysql@zfs-auto-snap_daily-2015-10-15-0202    11.2M  192M
tank/mysql@zfs-auto-snap_daily-2015-10-16-0202    10.5M  192M
tank/mysql@zfs-auto-snap_daily-2015-10-17-0202    9.91M  192M
tank/mysql@zfs-auto-snap_daily-2015-10-18-0202     408K  192M
tank/mysql@zfs-auto-snap_daily-2015-10-19-0202    6.84M  192M
tank/mysql@zfs-auto-snap_daily-2015-10-20-0202     608K  191M

For example, now you can navigate to the folder of any of those snapshots:

cd /var/lib/mysql/.zfs/snapshot/zfs-auto-snap_daily-2015-10-20-0202/…

 

Recovery of Encrypted Files in Owncloud

This section deals with the problem of recovering individual encrypted files outside ownCloud. This saves us the trouble of creating a new instance of ownCloud to revert to a specific backup, or the need to touch the tables of the associated MySQL database to manually incorporate a set of “backup” files. Note that the latter is strongly discouraged.

Restoring a full backup of ownCloud is straightforward with ZFS, as long as we have stored some snapshots of the file system. Rolling back the datasets “tank/owncloud” and “tank/mysql” to snapshots from a specific time results in a working ownCloud installation identical to the one we had back then. For instance, I ended up with a non-working cloud after upgrading ownCloud from 8.0.4 to 8.1.0, due to the change in the encryption methodology and the migration of private keys. The quickest and safest solution was simple: I just rolled back to a snapshot from an hour earlier, and that was it. According to the ownCloud documentation, users with the Encryption App enabled should wait for version 8.1.1.
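A minimal sketch of that full rollback (the service names and the snapshot label are examples; stop the web server and the database first so that the data directory and the database stay consistent):

systemctl stop httpd mariadb
zfs rollback tank/owncloud@zfs-auto-snap_hourly-<timestamp>
zfs rollback tank/mysql@zfs-auto-snap_hourly-<timestamp>
systemctl start mariadb httpd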

A more complex problem is to recover a particular set of encrypted files of a given user from a previous backup. This was the case for one user of this cloud, whose folder got corrupted on his local PC, and the corruption was then synchronized to the server, corrupting the server copy as well. Fortunately, I had snapshots of the system, so the question was: how do I decrypt the files of a user from a previous snapshot?

Manual Decryption Tool

The following tool to decrypt files is written in PHP, and I have adapted it to the particular file structure of ownCloud 8.0.4. Credit goes to Björn Schießle and the contributors on GitHub who modified the original script into a “manual decryption tool” for ownCloud.

As specified in Björn Schießle’s blog, ownCloud encryption is based on the symmetric-key AES-256 algorithm, meaning that the same key is used to encrypt and decrypt a file. A total of 3 types of keys are used:

  • An asymmetric private/public user key-pair (the private key is encrypted with the user’s log-in password using symmetric AES-256).
  • A file-key associated with each single file (a strong ASCII key).
  • Share-keys, allowing multiple users to access a file (sharing with a user, public links, recovery key-pair).

Below is a sketch showing the mechanism used by ownCloud to encrypt (solid arrows) and decrypt (dashed arrows) a specific file:

The steps to decrypt a file are as follows:

  • Locate the library Crypt.php in your installation.
  • Get the user’s encrypted private key (UEPK) and decrypt it -> decrypted user private key (UPK).
  • Decrypt the file key (DFK) from its “encrypted file key” (EFK) together with the user share key (USK) and the UPK: DFK = multiKeyDecrypt{USK, UPK, EFK}.
  • Split the encrypted file (EF) into EF = Header + Ciphertext (Header = not encrypted, used to identify the cipher; Ciphertext = encrypted).
  • Decrypt the EF ciphertext block by block (128-bit blocks) using the DFK and the cipher algorithm listed in the header (AES-256-CFB; see the sketch below).

The symmetric cipher AES-256 used in ownCloud falls in the category of block ciphers, which encrypt or decrypt an entire 128-bit (16-byte) block of plaintext at a time with the same key, in this case 256 bits long. As an anecdote, since 2003 the US National Security Agency (NSA) requires that Top Secret documents be encrypted with AES-192 or AES-256. So yes, your ownCloud files are treated as “Top Secret”. To speed up encryption and decryption, AES implementations typically use four lookup tables, achieving throughputs larger than 1.5 Gbit/s, comparable to the write speed of a standard SATA disk.

Cipher feedback (AES-256-CFB) is the mode of operation used by ownCloud for the block cipher. Below is a sketch [adapted from Wikipedia] of the decryption step of an encrypted file:

The required arguments within the PHP script are:

  • Specify the user and his password (yep, we need his password!).
  • Choose the folder to be decrypted, given the path to the snapshot directory.
  • Choose the folder in which to store the recovered files.

The PHP script to decrypt all the files in a given folder is as follows:

<?php

// Replace these with your custom values
//--------------------------------------------------
$User         = 'your_user';
$UPass        = 'your_pass';
//--------------------------------------------------
$snapshot     = '/var/www/.zfs/snapshot/zfs-auto-snap_daily-2015-09-21-0202/';
$datadir      = $snapshot . 'html/owncloud/data/';
$RelPath      = 'path_2_a_folder_within_my_owncloud';
$pathToFiles  = $datadir . $User . '/files/' . $RelPath ;
$PRecoverFiles= '/home/user/recovered/';
//--------------------------------------------------


require_once '/var/www/html/owncloud/apps/files_encryption/lib/crypt.php';
$Lib='OCA\Files_Encryption\Crypt'; //Fully qualified class name defined in crypt.php

// Where are the files & keys?
$p2file = $datadir . $User . '/files/';            //Path to Files
$p2encr = $datadir . $User . '/files_encryption/'; //Path to Encryption
$p2keys = $p2encr . 'keys/';                       //Path to Keys
$pEPK   = $p2encr . $User  . '.privateKey';        //Path to Encrypted Private Key

// First get users private key and decrypt it
$EPK    = file_get_contents($pEPK);
$DPK    = $Lib::decryptPrivateKey($EPK, $UPass); //Decrypted Private Key

$i=1;
$objects = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($pathToFiles), RecursiveIteratorIterator::SELF_FIRST);
foreach($objects as $name => $object){
    $FPath = $RelPath . '/' . str_replace($pathToFiles."/","",$name); //FilePath
    echo "$i -------> $FPath \n";
   
    // Decrypt the file-key: use our private key and the share key
    $pEFK = $p2keys . $FPath . '/fileKey';                //Path to Encrypted File Key
    $pSK  = $p2keys . $FPath . '/' . $User . '.shareKey'; //Path to Shared Key
   
    if(!file_exists($pEFK) || !file_exists($pSK)){
        echo "\tskipped: ".$FPath."\n";
    }
    else {

        $SK  = file_get_contents($pSK );               //Load Shared Key
        $EFK = file_get_contents($pEFK);               //Load Encrypted File Key
        $DKF = $Lib::multiKeyDecrypt($EFK, $SK, $DPK); //Decrypted Key File

        // finally we can use the DKF to decrypt the file
        // but first, strip header block
        $handle   = fopen($p2file . $FPath, 'r');
        $DContent = '';

        $data = fread($handle, $Lib::BLOCKSIZE);
        // If this block contained the header, we can continue
        if ($Lib::isHeader($data)) {
            $header = $Lib::parseHeader($data);
            $cipher = $Lib::getCipher($header);
        } else {
            die('Cannot find header');
        }

        while ($data = fread($handle, $Lib::BLOCKSIZE)) {
            // Decrypt the block
            $DBlock    = $Lib::symmetricDecryptFileContent($data, $DKF, $cipher);
            $DContent .= $DBlock;
        }
        fclose($handle);

        $newDir = dirname($PRecoverFiles.$FPath);  

        if (!is_dir($newDir)) {
          // dir doesn't exist, make it
          mkdir($newDir,0777,true);
        }
        // Write the decrypted file
        file_put_contents($PRecoverFiles.$FPath, $DContent);
        $i++;
    }
}
?>
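To run it, invoke the script with the PHP command-line interpreter on the ownCloud server, using a user that can read both the snapshot and the ownCloud installation (the script file name is just an example):

php decrypt_snapshot.php

The recovered, decrypted files will then appear under the directory set in $PRecoverFiles, preserving the relative paths of the originals.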

The result of decrypting a single file using a similar script is as follows:

 

Rendering High Quality images for Fluid Mechanics

 

People have been asking me how to achieve the level of quality and detail of some of the pictures and videos that I have posted on this blog. The images are made entirely with open-source software. Briefly, these are the steps I took to create the pictures and videos:

  1. Post-process the RAW data files, stored in HDF5 format (in my case I use Matlab or Python).
  2. The data is then visualized using Paraview, which can easily read HDF5 files once you specify their grid and data topology in a .xdmf file. This file describes the data for Paraview.
  3. From Paraview you can export what you see to the .x3d object format, which can be imported by the rendering program Blender: “What you see is what you get”.
  4. In Blender you specify the materials, lights, cameras and their trajectories, etc., and render the scene at any resolution, at the expense of computational cost (on our webpage http://goo.gl/2pe79R you can click through to a scene in 4K resolution, 3840×2160, rendered with about 600 ray casting iterations).
  5. For instance, the video https://goo.gl/9ar3rU was rendered using the new “Cycles” engine of Blender, which allows you to use multiple GPUs to render a given scene. In particular, this video was rendered in Full HD at 29 fps using 2x Nvidia GeForce GTX TITAN (6 GB each). With this configuration, one complex frame takes around 4 min to render, and a 30-second video about a day.


In the next paragraphs I will expand on each of the above points. You can download the files used in this tutorial here.

  

Post-processing the RAW data files

In my case I was interested in plotting some turbulent velocity coherent structures falling into the category known as “large-scale motions” (LSM), that is, turbulent structures with lengths of O(δ), where δ is the local thickness of the boundary layer.

The data I will use is publicly available on our website. You can use either:

  1. Entire velocity fields (84 GB each): http://goo.gl/i6mGOs
  2. Portion of the velocity fields in the streamwise direction (Lx=2δ) at several Reynolds numbers: http://goo.gl/4Joa1b

All the above files are written in HDF5, a very convenient format for reading, writing (serially or in parallel), and manipulating very large files. I am going to use one of these fields in physical space (after an inverse FFT in the spanwise direction), which occupies 8 GB and contains about 2.2 Gpoints. A simple Matlab code to convert the boundary layer fields to physical space can be found here.
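Before writing any post-processing code you can quickly inspect the layout of one of these files from the command line with the standard HDF5 tools (the file name below is a placeholder):

h5ls -r my_velocity_field.h5       # list all groups and datasets with their dimensions
h5dump -H my_velocity_field.h5     # print only the header/metadata, without the raw data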

The next step is to read the file and classify the structures that satisfy certain constraints. In my case I threshold the field at some multiple of its standard deviation, for instance u’(x,y,z) > 2σ, where u’ is the fluctuating velocity and σ its standard deviation. In this way I am looking for intense “high-momentum” structures.

You can write your own script to classify and label the above structures, or you can use a function that already does it for you. I used the Matlab function “regionprops” to extract the different structures and obtain their pixel lists and bounding boxes. Because I am not interested in volumetric rendering, I assign the integer value 128 to the points inside the structure and zero to the rest. Some smoothing of the structure can be achieved with the “smooth3” Matlab function.

Once we have identified all the structures that qualify as LSMs, we pick the one we want to plot. The next step is to write the required files to plot that structure with Paraview, a powerful visualization program. The output files are:

  • Raw data for the bounding box of the structure and the corresponding normalized grids, {x,y,z}/δ. The file is “struct_discri_number_3.h5” and the dataset that we will plot is “field”, although the file contains other data.
  • A file descriptor with the data topology for Paraview: .xdmf (eXtensible Data Model and Format). The topology of my data is simple: a Cartesian 3D mesh whose indices are mapped to the grid vectors x/δ=xd99, y/δ=yd99, and z/δ=zd99, respectively; all of them in double precision.
<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="2.0">
  <Domain>
    <Grid Name="Grid_Pixels" GridType="Uniform">
      <Topology TopologyType="3DRectMesh" NumberOfElements="749 252 1036"/>
      <Geometry GeometryType="VXVYVZ">
        <DataItem Dimensions="1036" NumberType="Float" Precision="8" Format="HDF">
struct_discri_number_3.h5:/zd99
        </DataItem>
        <DataItem Dimensions="252" NumberType="Float" Precision="8" Format="HDF">
struct_discri_number_3.h5:/yid99
        </DataItem>
        <DataItem Dimensions="749" NumberType="Float" Precision="8" Format="HDF">
struct_discri_number_3.h5:/xd99
        </DataItem>
      </Geometry>
      <Attribute Name="Discri" AttributeType="Scalar" Center= "Node">
        <DataItem Dimensions="749 252 1036" NumberType="Int" Precision="1" Format="HDF">
struct_discri_number_3.h5:/discri
      </DataItem>
      </Attribute>
      <Attribute Name="UReal" AttributeType="Scalar" Center= "Node">
        <DataItem Dimensions="749 252 1036" NumberType="Int" Precision="1" Format="HDF">
struct_discri_number_3.h5:/ufield
      </DataItem>
      </Attribute>
    </Grid>
  </Domain>
</Xdmf>

Some time ago I wrote a Matlab function to dump this file automatically. The arguments are the name of the xdmf file, the name of the file containing the data, and the size of the data [n1,n2,n3]. You can download it here. Feel free to modify it according to your needs.

Visualizing data with Paraview

Paraview is an open-source, multi-platform data analysis and visualization application. You can use it in client mode or in client-server mode. When using the client-server mode you can visualize the data on your client while using a server or a cluster as the backend. For the next plots a laptop with a couple of GB of RAM suffices, but to visualize and post-process files larger than 10 GB you will have to move to a server or a cluster, preferably one with GPUs. A video created in our department by Lozano-Duran with Paraview (using 4 nodes, each with 192 GB of RAM and one GPU) can be found here. This video was presented at one of the APS conferences a couple of years ago.

The graphical interface of Paraview is very intuitive. You can find some good tutorials to start with this program at TACC.

The first thing you have to do is import the data into Paraview. For that we go to File -> Open and choose the xdmf file.

Once the file is open we need to select the dataset that we want to work with, and confirm by pressing the “Apply” button. In the screenshot below I show the result of performing a couple of operations on the raw data:

  • Plot an isosurface for some given value.
  • Use the calculator to map the height to a function and color the isosurface using this map (this is not necessary for exporting the data to Blender).

 

Although we will go one step further and prettify the plot with Blender, note that very nice figures can be obtained with Paraview alone by playing a little more with the data. One such picture that I consider pretty is the one below, portraying a 3D correlation of the streamwise velocity, which appears in my Physics of Fluids paper:

 

The last step before going into Blender is to export the data from Paraview to a suitable format that can be read directly by that program. One of these formats is X3D (Extensible 3D Graphics), a royalty-free ISO standard supporting multi-stage and multi-texture rendering, shading with light maps, etc.

The file just created weighs about 250 MB:

 Visualizing data with Blender

Blender is a professional, free and open-source 3D computer graphics program. At first glance it can seem very complicated, which is true if you want to master it. Fortunately, doing some basic image rendering is not that hard, and the learning curve is not very steep. The time invested in learning the program pays for itself within a week.

There are very good tutorials to get you started, and I would recommend the YouTube channel of Blender Guru (Andrew Price) to learn the basics of Blender: lights, cameras, textures, rendering, objects, shading, camera trajectories, etc.

The latest versions of Blender come with a new render engine called Cycles. Cycles is a ray-tracing renderer focused on interactivity and ease of use. One of the most important changes with respect to other engines is its ability to use your GPU (or GPUs) to render, meaning that you can render a scene much faster, and that you can even work with on-the-fly rendering.

My renders were primarily done on a PC with 32 GB of RAM and 2x Nvidia GeForce GTX TITAN (6 GB GDDR5 each), with a total cost of about $3000. My laptop can do the rendering as well, given that it has 16 GB of RAM and an Intel i7 processor with 4 cores. However, it takes more than 10 times longer to render the same image (around an hour instead of 4 minutes).

The first thing we must do is import the x3d file that we created with Paraview. To do that we open the File menu and choose Import -> X3D. Note that Blender allows us to import up to 9 different formats.

Depending on your machine, loading the 250 MB of data will take around a minute or so. Once the data is loaded the structure will appear in Blender as follows:

Now it is time to start playing with the cameras, textures, planes, positions, background, text, etc. In my case I want to achieve the following goals:

  • A nice texture for the object: preferably matte and with high contrast, to see the details of the different parts and scales that make up the object.
  • Proper illumination, so the image looks neither too dark nor too bright.
  • Some physical references so the reader can imagine the dimensions of the structure and how close it is to the ground. I decided to use 3 different planes acting as walls.
  • A way to quickly measure the structure in any of the 3 directions. The walls show a grid of a given size (Δ=δ/4).
  • Some text indicating the flow direction.
  • Render the image in Full HD with 600 ray casting iterations (a very, very clean image).

All the operations that we have performed in Blender appear chronologically in the right panel. The viewport mode (how the user sees the scene) is set to “Texture”, which takes no time to render the data.

If you want a fast way of previewing the render you can either go to the viewport shading menu and select “Rendered”, or hit Shift+Z. If you are rendering on a CPU this can be quite slow.

With a high-end GPU you can more or less work in this mode, although it will be constantly re-rendering as you interact with your scene. You will notice that once the render is done the image looks a little grainy. The reason is that in “Rendered” mode only a few ray casting iterations (50) are performed, to present the user with an image of acceptable quality in the minimum amount of time. We will see later how to specify the properties for the final rendering of the scene and also for this “Rendered” mode.

To define the texture of the object, as displayed above, we can use the menu located on the right and click on “Material”. You can preview the result of assigning a given material, or of performing an operation, on a plane, a cube, a sphere, or a 3D model such as Blender’s famous “monkey”: Suzanne.

For the surface we are going to use a “Mix Shader”, and we will play with different shaders (“Ambient Occlusion”, “Glossy BSDF”), colors, and roughness. You can find here an explanation of the different shaders as they appear in the Blender Manual. To get the most out of Blender you can use what is called the “Node Editor”, which allows you to perform and concatenate operations as if each filter or texture were a “black box” with its associated transfer function. In this way you can create your own materials and much more, and once you get used to this way of working your productivity will certainly grow. Here is a lovely tutorial on textures using the Node Editor.

 

The next operation is to define the texture for the walls and to create rectangular tiles of a given size. The way to do that is to use the Node Editor and some boolean operations. For the texture of the wall I am again using a Mix Shader, and for the tiles on the walls I am subdividing the plane into rectangles with an edge length of 0.25δ. Below is a screenshot in “Edit Mode” showing the different 0.25×0.25 rectangles along the bottom wall.

Using the “Geometry” property along the plane (Parametric) and separating the red channel (“Separate RGB”), we can create the black lines for the tiles, whose thickness is specified using the “Greater Than” node with a given value. The last step is to render those lines: we weight the two different shaders through the input factor “Fac” of the “Mix Shader” node, whose value is the result of combining the “Geometry + Separate RGB + Greater Than” nodes.

The result of using values for the “Greater Than” node from 0.02 to 0.2 in increments of 0.06 is shown below. For our lines we have used the value 0.02.

Now that we have completed all the operations, we will render the scene and make sure that we are happy with the camera view, lights, shading, texture, etc. before rendering the final image. In the Render menu we have a variety of options for “Dimensions”, “Performance”, “Sampling”, “Output”, and so on. For instance, if we want a quick render of a Full HD image (1920×1080 px) we can specify a percentage of the total resolution, for example 20%; the image will then render about 25 times faster. Under Performance we can save the buffers, so that the next time we render some of the calculations are already done, speeding up the render.

If we use GPUs we can play with the “Tiles” parameter, that is, the portion of the 2D image that each thread works on. Since GPUs have a lot of cores we want the workload to be properly balanced, so we can increase the granularity of the problem by using smaller tiles so that no GPU core sits idle while rendering. The same applies to multicore architectures: the Cycles engine spreads the rendering workload over the cores, whose number is by default detected automatically by inspecting the hardware Blender is running on. On my laptop, which mounts an Intel Core i7 2.2 GHz processor, Blender detects 8 threads, corresponding to 4 physical cores with hyper-threading enabled.

 

Another interesting parameter is the number of ray casting samples. Rays are traced from the camera into the scene, bouncing around until they find a light source such as a lamp, an object emitting light, or the world background. For a really good render it is advisable to set this parameter above 300 samples; the more samples are taken, the less noisy and more accurate the solution becomes. In our case, and because we are not going to make a movie, we set this parameter to 600 samples. As you can see, “Preview” is set to 50 for the on-the-fly “Rendered” mode.

We proceed to render the final image using Full HD resolution and 600 ray casting iterations. Each thread works on one tile, with a total of 40 tiles. A summary of the memory being used, the number of objects to process, etc. is presented at the top. A capture of the terminal running htop shows that all the cores are basically working at 100% of their computing power.
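The same render can also be launched from the command line, which is handy on a headless server. A minimal sketch, assuming the scene has been saved with Cycles and the desired output settings (the .blend file name and frame number are examples):

blender -b scene.blend -E CYCLES -o //render_final_ -F PNG -f 1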

The Render menu also offers options to save the output. I have saved the rendered image as a PNG with a compression of 90%. The final result is presented below:

The same structure rendered with another texture and without the walls, at 4K resolution (6 MB):

 Here is an animated video in Full HD:

 

 

 

Final project (future Master or Ph.D. thesis) at Madrid CFD lab

Curriculum Vitae Juan Sillero

Single page CV

 

Extended CV

Do you want to play with a “serious” amount of turbulence data? It is now at your fingertips!

In our research group we not only generate high-quality DNS data for turbulent flows, we also love to share it with the scientific community to advance this complex field of engineering.

When it comes to dealing with turbulence, a colleague from the group (Alberto) likes to imagine himself as the protagonist of C.D. Friedrich’s painting “Wanderer above the Sea of Fog”. I cannot deny that I have felt that way (in the best-case scenario) when dealing with turbulence, if not drowning.

So that everyone can access our simulations, I have created a “simple” but practical database to download the latest raw data of our group’s channel and boundary layer simulations. It includes characteristic flow variables (velocities, pressure, vorticity, etc.) for several Reynolds numbers and geometries, and even time-resolved data for some channel flows, as generated by Adrian and discussed in his JFM papers about “Time-resolved evolution of coherent structures …” [link to the papers]

What does our data look like?

To give you an idea of the kind of data that is hosted, I have plotted below a histogram of the different file sizes that we are currently using for our daily research in turbulent flows. The vertical dashed lines correspond to 1 kB, 1 MB, 1 GB, and 100 GB respectively. For instance, there are more than 100k files of 1 GB, corresponding to some of the time series for channel flows.

 

Public Distribution Of Our Databases

The publicly available data is hosted on one of our storage appliances (which we use daily to post-process the simulations), linked through our website at torroja.dmt.upm.es/turbdata/index and offering more than 100 TB of available data. The server is connected to the Internet through a 1 Gbps connection, making it possible to transfer tens of terabytes in just a couple of days.


Within the database folder the simulations are organized into channel and boundary layer flows. In the near future, data for homogeneous shear flows and isotropic turbulence (time-resolved) should be available as well. In the meantime, the available simulations are:

Since we brought the site online it has received more than 800 visits from more than 80 universities.

Curious about the setup?

If you are curious about how this database is set up within our infrastructure, the diagram shown below should suffice. Different storage appliances are shared through NFS with the rest of the post-processing servers (which have quite a lot of RAM and CPU power to digest the data) and with a number of HPC clusters, all of them connected to a fast InfiniBand QDR network. The server that acts as the NFS server for the rest of the machines is directly attached to the storage, working as a NAS (Network Attached Storage) and giving us good enough I/O bandwidth (>2.5 GB/s) considering the relatively low price of the chosen storage appliance.
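As an illustration of that NFS sharing (the paths, host names, and export options below are hypothetical, not our actual configuration):

# /etc/exports on the storage/NFS server
/export/turbdata   postproc01(rw,async,no_subtree_check)   cluster*.local(ro,async,no_subtree_check)

# mounting it on a post-processing node
mount -t nfs storage01:/export/turbdata /data/turbdata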

Our storage appliance is a JBOD (Just a Bunch of Disks) with 60 hard drives of 4 TiB each, arranged in two zones of 30 disks controlled by two SAS cards (12 Gbps), providing a total capacity of 240 TiB (216 TB). The data is protected using something similar to RAID 60: 6 groups of 10 disks, each with 2 disks of distributed parity. The design of the overall system is pretty robust, with really good performance at a cost of under 10 euro cents per TB.

Interested in buying storage?

Please feel free to contact me at sillero@torroja.dmt.upm.es to get a price quotation.

 

Ph.D. Dissertation

Slides Ph.D. presentation

You can find here the slides of my Ph.D. thesis defense (July 21st, 2014): “High Reynolds numbers turbulent boundary layers”.

 THESIS ANNOUNCEMENT

 THESIS SLIDES

 

Low and High Momentum velocity structures

Dimensions: 16δx7δx1.5δ. Flow from lower-left to upper-right.

(Structures shorter than 1.5δ are removed)

Our code “OpenTBL” makes it to the High-Q club


Highest Scaling Codes on JUQUEEN

“Following up on our JUQUEEN porting and scaling workshop and to promote the idea of exascale capability computing, we have established a showcase for codes that can utilise the entire 28-rack BlueGene/Q system at JSC. We want to encourage other developers to invest in tuning and scaling their codes and show that they are capable of using all 458,752 cores, aiming at more than 1 million concurrent threads on JUQUEEN.
The diverse membership of the High-Q Club shows that it is possible to scale to the complete JUQUEEN using a variety of programming languages and parallelisation models, demonstrating individual approaches to reach that goal. High-Q status marks an important milestone in application development towards future HPC systems that envisage even higher core counts”

LINK TO THE WEBSITE


u’ and vorticity movie

Another rendered image of u’

Structures of high momentum u’=2+.

Size: (Lx,Lz,Ly)=(2.42,0.90,0.89)*d99

At the wall (y+=15) the low- and high-momentum streaks are shown, colored from black to white.

Official TBL Paper; published on Physics of Fluids

It just came out: the paper discussing the data of one of the highest Reynolds number simulations of zero-pressure-gradient turbulent boundary layers, which I have been running over the last few years at Argonne National Laboratory.

The data is compared with turbulent inner flows at similar Reynolds numbers, such as channels and pipes.

You can download a copy of the manuscript here.

Data from this simulation is publicly available on our website. If you need something else that is not listed, please let me know, and I will be glad to add it.

Thanks to everyone who made this possible.

 

 

14th European Turbulence Conference

You can find here the slides of my talk:

“Effects of hot-wire measurement in wall-bounded flows studied via Direct Numerical Simulation”. 14th Turbulence European Conference, 1-4 September, 2013, Lyon (France). Juan A. Sillero, Javier Jiménez.

Rendered Scene

Large-scale motion (LSM) in a TBL at Re_theta=6000: (x,y,z)/δ=(4.1,0.52,0.70)

(click for full size)

Visualization of velocity and pressure eddies

Instantaneous snapshot of a Turbulent boundary layer at Re_theta=6000.

Domain size: {1.6,1.2,1.8}*delta_99

Red-yellow: Positive eddies. 

Blue-Cyan: Negative eddies.

 

Pressure fluctuations structures

 

Streamwise velocity structures:

 

 

Wall-normal velocity structures: 

 

Spanwise velocity structures:

Small and Large scale streaky pattern of the streamwise velocity fluctuations:

y=15+,50+ and y=0.1*δ

Velocity Fluctuations in a TBL at High Reynolds Numbers

Streamwise velocity fluctuation u_rms^+ ranging from Re_\theta=3000 to Re_\theta=5000  at the near-wall region, y^+=15, where the maximum fluctuations are found.

Zoom:

 

Wall-normal velocity fluctuations v_rms^+ at y^+=180, where the maximum is approximately located.

Zoom:

 

 

 

 

Fluid Mechanics Lab: BSC Consolider Spot (in Spanish)

It just came out: the Barcelona Supercomputing Center (BSC) spot about the Consolider program.

Our group appears at minute 7:24, in the engineering category, presented by our advisor Javier Jimenez Sendin.

More news concerning our group can be found at:

www.elpais.es

Actualidad Aeroespacial

Moncloaarava.com

 

Euro-MPI Congress 2011, slides

The 18th EuroMPI conference was held in Santorini, Greece, in September 2011.

EuroMPI is the primary meeting where the users and developers of MPI and other message-passing programming environments have the opportunity to meet each other and share ideas and experiences. It is a unique opportunity for European MPI users to participate in the MPI Forum, the standardization body of MPI.

The web page of the congress is: http://www.eurompi2011.org/

You can find the proceedings paper here. The published paper can be found in Computers & Fluids.

Book of proceedings:

Recent Advances in the Message Passing Interface

18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings

 

Postprocessing DNS requires at least….

 

Working with Direct Numerical Simulation (DNS) databases requires at least one of these small beasts:

48 GB of RAM (soon to be upgraded to 192 GB).

12 cores (hyper-threaded), Intel Xeon CPU X5650 @ 2.67 GHz.

 

 

Creating movies using PNG files and the FFmpeg encoder

 

The next script is useful for generating movies from a collection of PNG images using the FFmpeg encoder. Before that, you can pre-process your images, for example removing their white borders. That can easily be done using the powerful “convert” command from the ImageMagick package.

For example, to remove the left and right sides of a picture, specifying the width in pixels:

convert -chop 0x401 -rotate 180 -chop 0x400 -rotate 180 my_image.png chop_image.png;

Or you can just use a percentage of the image, as in “-chop 9.5%”.

With ffmpeg, you can specify the quality of the sampled pictures with “-qscale” (from 1, high quality, to 31, low quality), the bitrate with “-b”, or the frame rate of the output movie with “-r”. FFmpeg expects your files to be numbered consecutively, and you can specify the series of files using “%05d”, meaning file names with a five-digit number.

#!/bin/bash

for i in $(ls | grep '\.png$'); do
    echo "Working on image $i"
    convert -chop 9.5% -rotate 180 -chop 9.5% -rotate 180 "$i" "chop_$i";
done

ffmpeg -qscale 2 -r 25 -b 18000 -i chop_fig.%05d.png my_movie.flv

exit
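For what it is worth, newer FFmpeg versions renamed some of these options; a hedged equivalent producing an H.264 movie with constant-quality encoding would be something like:

ffmpeg -framerate 25 -i chop_fig.%05d.png -c:v libx264 -crf 18 -pix_fmt yuv420p my_movie.mp4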


If you need to rename a series of files that are not numbered starting from 1, you can use this other script, which will rename them for you:

#!/bin/bash
cnt=1
for myfile in *.png
do
    cntt=$(printf "%.3d" "$cnt")                  # zero-padded counter, e.g. 001
    myfilem=$(echo "$myfile" | cut -d"." -f1)     # base name before the first dot
    echo "$cntt $myfile $myfilem"
    cp "$myfile" "$myfilem.$cntt.png"
    cnt=$((cnt+1))
done