Tuesday 19 July 2011

Finding steps beneath the noise


Removing noise from signals that are secretly composed of steps (a signal is just a sequence of measurements of some quantity, like a wind speed or stock price) is a surprisingly common and important problem. It arises in a host of disciplines, including analysis of drill hole data in exploration geophysics, detecting DNA copy-number ratios in genomics, and separating out molecular dynamics from background noise. In each case one seeks to filter away the fluctuations in one's signal to leave behind the steps one believes are contained within.
Our recent papers with the glamorous titles "Generalized methods and solvers for noise removal from piecewise constant signals. I. Background theory" and "II. New methods" (articles free online), which appeared in the Proceedings of the Royal Society A, introduced a new mathematical framework for the problem and, as special cases of this framework, developed some new signal processing algorithms to address it. The study of this task has been very fragmented across disciplines (from geoscience to physics to biology), so one of our major goals was to present a synthesis of existing work which would expose natural developments. For example, having performed our synthesis, we presented a particularly simple new method called "robust jump penalization", which exhaustively tests for the location of a new step when the noise is not from a normal (Gaussian) distribution, subject to the constraint that the number of jumps should be small. Max and Nick
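To give a flavour of the idea, here is a minimal sketch of greedy step-finding with a penalty on the number of jumps. It is not the algorithm from the papers: it is a toy illustration that assumes an absolute-error fit (segment medians, which cope better with non-Gaussian outliers than means), and the names `find_jumps`, `fit_steps` and `penalty`, as well as the example signal, are made up for this post.

```python
from statistics import median


def segment_cost(xs):
    """Sum of absolute deviations of a segment from its own median."""
    if not xs:
        return 0.0
    m = median(xs)
    return sum(abs(x - m) for x in xs)


def total_cost(signal, jumps):
    """Absolute-error cost of a piecewise constant fit with the given jump positions."""
    bounds = [0] + sorted(jumps) + [len(signal)]
    return sum(segment_cost(signal[a:b]) for a, b in zip(bounds, bounds[1:]))


def find_jumps(signal, penalty):
    """Greedily add one jump at a time, testing every candidate location.

    A new jump is kept only if it lowers the fit cost by more than `penalty`,
    which keeps the number of jumps small.
    """
    jumps = []
    current = total_cost(signal, jumps)
    while True:
        best_pos, best_cost = None, current
        for pos in range(1, len(signal)):  # exhaustive test of each location
            if pos in jumps:
                continue
            cost = total_cost(signal, jumps + [pos])
            if cost < best_cost:
                best_pos, best_cost = pos, cost
        if best_pos is None or current - best_cost <= penalty:
            return sorted(jumps)
        jumps.append(best_pos)
        current = best_cost


def fit_steps(signal, jumps):
    """Replace each segment between jumps by its median."""
    bounds = [0] + sorted(jumps) + [len(signal)]
    fit = []
    for a, b in zip(bounds, bounds[1:]):
        fit.extend([median(signal[a:b])] * (b - a))
    return fit


# Invented data: a step from about 0 to about 5, with one large outlier (8.0).
signal = [0.0, 8.0, 0.1, -0.1, 0.0, 0.1, 5.0, 5.1, 4.9, 5.0, 5.2, 5.0]
jumps = find_jumps(signal, penalty=3.0)
print(jumps, fit_steps(signal, jumps))
```

In this toy example the outlier does not earn a step of its own, because removing it from the fit is not worth the penalty, while the genuine step is kept.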

Wednesday 6 July 2011

Bringing the dance steps of molecular motors into focus

We're all familiar with the behaviour of objects at everyday sizes: if they are heavy, they stay put due to friction, and it's difficult to get them to move anywhere quickly unless you use a lot of force. If instead they are very light, then they can get smoothly carried around on air currents like a feather. Those are just a couple of everyday intuitions that don't work at all at the molecular scale! Down inside the workings of cells, at the scale of molecules, everything is constantly buzzing around due to thermal effects, and collisions with other molecules scramble up the motion. So simply scaling down our everyday machines might not be the best way to function in this noisy environment. Yet, life is not just a mash of molecules: there are exquisite mechanisms inside cells for doing things (fairly) reliably and repeatably. This includes the rather amazing bacterial flagellar rotary motor that cells use to propel themselves towards their food (fun cartoon of its assembly). It has long been hypothesised that the motor action has to be made in lots of tiny little steps, because this is the only way for a molecular scale machine to efficiently use the available (free) energy to work against the thermal noise that wants to dissipate any organized motion.

Biological physicists (like Richard Berry and his team) are intently interested in understanding exactly how this stepping motor works, and they have recently been able to develop the instrumentation to start answering these kinds of questions. Our group is interested in how life deals with fluctuations so we were very interested in their work. Note that you can't use microscopy techniques like conventional electron microscopy, because you have to kill the cell in order to image it. Then you have no chance of seeing the motor in action.

Because of the jostling microscopic environment, the data produced by these new systems can be very noisy. We developed statistical methods that tried to separate out the physics we did know (aspects of mechanics and thermal noise) from the biological physics we didn't (how the motor actually steps between successive molecular configurations). This is where statistical modelling comes in: it turns out that separating step-like motion from the noise in the data, so that you can image the stepping, is difficult unless you invoke non-traditional signal processing methods. You also need to take account of the experimental apparatus, which introduces some unavoidable lag into the system (Richard's team attaches a tiny bead to the flagellum and tracks this bead). In our soon-to-be-published Biophysical Journal paper "Steps and bumps: precision extraction of discrete states of molecular machines using physically-based, high-throughput time series analysis" (you'll find a pre-print on our publications page), we also introduced a novel technique, based on a kind of probabilistic periodicity detection, for teasing out the cylindrical arrangement of proteins that make up the main structure of the motor, which gives the motor a characteristic set of "dance steps". Max and Nick

Friday 1 July 2011

Power and genes: effects of mitochondrial variability

Cells make stuff and they need power to do this. But what would happen if their power stations were as variable as wind power? Presumably you'd expect that, for cells with a lower average supply of power, they would make stuff (e.g. proteins and protein precursors called RNA transcripts) more slowly and take longer to go forth and multiply (move through their cell cycle). Since you'd imagine that these consequences of power supply variability would be as disadvantageous for the cell as power-cuts for us, I would have guessed that this kind of variability in power would be tightly controlled by the cell. Seems maybe not.

Francisco Iborra, now at the Centro Nacional de Biotecnología in Madrid, came to me with some startling data that he had been producing, along with his student Ricardo Neves (now at Biocant in Portugal) and others at the Weatherall Institute of Molecular Medicine. It suggested that maybe cells do show a form of marked power variability: some cells appear to be making gene products faster than others (the picture shows variability in the rate at which gene transcripts are made between cells; each cell's nucleus is the circular blob), and this is related to measures of their mitochondrial content (the number of power stations they have). The cells with more mitochondria at birth also appear to divide sooner than their more disadvantaged siblings. The paper "Connecting Variability in Global Transcription Rate to Mitochondrial Variability" can be found on our papers page and appeared in PLoS Biology. Becky Ward (It Takes 30) was nice enough to blog interestingly about it (she's worth following).

This might seem like a curiosity, but the fact that some cells differ from others can be very important. We tailor our treatments of sets of cells, like cancers, to typical cells, but cells which, by chance, are very atypical (e.g. lazy and power limited) may respond in a very different way. You only need a few such unusual cells to survive your treatment and they'll repopulate your cancer. In fact the study of cellular variability and its origins is now a major field driven by fun researchers like these. It's still not very clear why two genes in the same organism might co-vary in the levels of their gene products (called extrinsic noise), and our paper makes an experimental contribution to this. We've also just finished a paper which combines a mathematical model with some more experiments to help illuminate our findings further. Nick

Thursday 16 June 2011

Crisis and Changes in Communities of Assets

The global financial system is composed of many different financial markets on which a diverse set of assets are traded. Because there are so many assets traded on some markets it can sometimes be convenient to think about groupings of them. For example, shares are assigned to industry sectors based on the business activities of their companies. These sectors provide a useful tool for sorting and comparing different companies, and it can be insightful to compare the performance of stocks within the same sector to identify any that are under- (or over-) performing. For some markets, however, an external classification like this is not possible. An alternative approach is to group assets based on the behaviour of their prices. Two assets which have strongly correlated price changes (they increase and decrease in value at similar times) would belong to the same group and two assets which are weakly correlated belong to different groups. Groups identified in this way can be useful for several reasons but a familiar one is in the construction of diversified portfolios to minimize investment risk – see the “Can we spread the risk?” post.

Much prior work along these lines focused on equity markets, so we set out to investigate the group structure of the foreign exchange (FX) market. To do this, we represented the FX market as a network in which each node represented an exchange rate (such as the EURUSD rate, which gives the number of US dollars that one receives in exchange for 1 euro) and each edge connecting a pair of exchange rates represented the strength of the correlation between those rates. A network like this is similar to a social network (such as a Facebook network) in which each node represents a person and two people are linked if they are friends. A group in the exchange rate network (known as a community) then corresponded to a set of nodes that had stronger links to each other than they did to the rest of the network.
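As a rough illustration of how such a network can be put together, the sketch below turns a few price series into returns and uses the Pearson correlation between each pair of rates as the edge weight. The prices are invented, the function names are hypothetical, and the exact weighting used in the paper may well differ; this is only meant to show the shape of the construction.

```python
from math import sqrt


def returns(prices):
    """Fractional price change between consecutive observations."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def correlation_network(price_series):
    """Return {(rate_a, rate_b): correlation} for every pair of rates."""
    rets = {name: returns(prices) for name, prices in price_series.items()}
    names = sorted(rets)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            edges[(a, b)] = pearson(rets[a], rets[b])
    return edges


# Invented prices, for illustration only.
prices = {
    "EURUSD": [1.00, 1.10, 1.05, 1.20],
    "GBPUSD": [2.00, 2.20, 2.10, 2.40],
    "USDJPY": [100.0, 90.0, 95.0, 80.0],
}
for pair, weight in correlation_network(prices).items():
    print(pair, round(weight, 3))
```

A community detection algorithm would then be run on the weighted network; that step is not shown here.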

Importantly, the exchange-rate groups that we found changed through time depending on market conditions, so we introduced several techniques to track the changing relationships between the rates. Using this approach, we were able to uncover major trading changes that occurred in the FX market during the 2007-2008 credit crisis and to identify the relative importance of the different rates.

You can find out more about this work in the paper, “Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007–2008 credit crisis” Chaos 19, 1 (2009). Dan and Nick

Monday 13 June 2011

The high-school hierarchy in protein networks

The building blocks of cells, proteins, interact and form networks (each node is a protein and each link an interaction between them); see also Sumeet's post. We think that biological systems are modular. A module is a section of a whole that carries out its function relatively independently of the rest of the system (for example, consider the various components of your PC). Candidates for modules in a network are communities: groups of proteins that interact very closely with each other, and not so much with the rest of the network. Algorithms exist that can detect such communities.

But maybe there are communities inside each community. If you consider the social interaction network of a school (where nodes are pupils and links are friendships between them), you might expect there to be several large communities, one for each of the year groups. But, if pupils are more likely to be friends not only with someone in their year, but also with someone in their class, then each one of these large year-group communities would consist of several smaller communities, one for each of the class groups. At a yet smaller scale, each of these class-group communities might contain several friendship-group communities. In other words, there is structure of interest at many scales within the network.

We set out to investigate the multi-scale community structure of protein interaction networks. You can see this structure visualised in the image. On the top line (log(lambda)=-1) we are looking at the network at a low resolution: all the nodes are considered to be in one (purple) community. Moving down the figure (e.g. log(lambda)=1) increases resolution (like paying attention to classrooms instead of the whole school) and more structure is resolved: this community starts to split into several large communities. As we crank up the resolution yet higher (further down the image), these communities themselves split up.
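The role of the resolution can be seen in a toy calculation. The sketch below scores a few candidate groupings of a tiny network with a Potts-style quality function (the fraction of links inside communities, minus lambda times the fraction expected by chance). This particular formula is an assumption chosen for illustration and is not necessarily the exact one used in the paper; the network, the candidate partitions and the function names are all made up.

```python
def quality(edges, partition, resolution):
    """Potts-style quality of a partition at a given resolution (lambda).

    For each community: (fraction of links inside it)
    minus resolution * (its share of link ends) squared.
    """
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    total = 0.0
    for community in partition:
        members = set(community)
        internal = sum(1 for a, b in edges if a in members and b in members)
        degree_sum = sum(degree.get(node, 0) for node in members)
        total += internal / m - resolution * (degree_sum / (2 * m)) ** 2
    return total


def best_partition(edges, candidates, resolution):
    """Name of the candidate partition with the highest quality."""
    return max(candidates, key=lambda name: quality(edges, candidates[name], resolution))


# Toy network: two triangles joined by a single link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
candidates = {
    "one community": [[0, 1, 2, 3, 4, 5]],
    "two triangles": [[0, 1, 2], [3, 4, 5]],
    "all separate": [[0], [1], [2], [3], [4], [5]],
}
for lam in (0.1, 1.0, 10.0):
    print(lam, best_partition(edges, candidates, lam))
```

A real analysis would search over all possible partitions rather than choosing among three hand-picked ones, but the pattern is the same as in the figure: as lambda grows, finer groupings win.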

Why are we interested in this structure? Proteins within a community might be expected to all carry out a similar task. We were interested in whether this was true at all scales: the result varied depending on which groups of proteins we were looking at. We were also interested because we simply don't know anything about many proteins, but perhaps the communities they are members of can suggest functions for them. In the school analogy, if all you knew about a pupil was which communities they were a member of, you could make a pretty good guess at many things particular to them, e.g. who their teacher was, or whether they'd be studying for exams.

You can read the full story in our paper "The Function of Communities in Protein Interaction Networks at Multiple Scales" in BMC Systems Biology 4, 100 (2010): here. You can also read more about this work in this short review in Biomedical Computational Review. Anna and Nick

Saturday 11 June 2011

Promiscuous proteins: do proteins "date or party"?


Proteins are perhaps the major building blocks of the cell. They come in many different shapes and sizes, and they can join together (a bit like Lego) to make all sorts of useful structures. There are ways to come up with simple representations of such systems: one way of doing this is to think of them as networks (like we have computer networks, or railway networks, or Facebook networks). Displayed is a picture of a network of proteins; two proteins are joined if they interact (stick to each other). The different colours are different 'communities' of proteins. A community is just a group of proteins that interact a lot more amongst themselves than they do with outsiders. However, this picture isn't complete, because it's static. Cells are dynamic; they have a life-cycle, just like us, and they go through different stages: growing, dividing, dying. We are often interested in understanding what drives these changes; for instance, cancer happens when the growing and dividing stages go into overdrive. What is happening to the protein network as the cell goes through its different stages? At each stage, only part of the network is 'switched on'; the cell is making only those proteins needed for that stage. Imagine different parts of the network lighting up at different times. If we take this into account, can it help us to better understand what roles the different proteins are playing in the great cellular drama?

One interesting idea, suggested some years ago, was that if we focus on the seemingly important proteins, the ones that have many interactions (called 'hubs'), maybe by looking at when these interactions light up we can say something about what kind of protein it is. Supposing I am a hub protein in the network, with lots of partners. There could be two opposing scenarios: maybe all my partners get produced by the cell at the same time, and so all the interactions happen at once. In this case, it's like a big party; so I would be called a 'party hub'. On the other hand, it could be that my partners get switched on at different times (or places). In this case, my interactions happen one by one, like a sequence of dates, and so I would be called a 'date hub'. The idea that hubs came in two flavours, date and party, was quite exciting, because the two types seemed to have important roles in organising the whole network. Party hubs were like local coordinators: they helped to bring together many proteins with the same purpose. Date hubs were global organisers; they communicated between different parts of the network. Knowledge of what specific date and party hubs were doing could be a major step forward in understanding how the complicated protein cocktail produces specific kinds of cell behaviours.

Unfortunately, things turn out to be not so simple. Several people disputed the idea that 'date' and 'party' hubs really existed, presenting evidence (e.g. here and here, noting this response and this article and noting, in fact, that this literature moves fast!) that there was no consistent relationship between the pattern in which the interactions light up and the protein's role in organising the network. Despite this, the idea remained in vogue. In our recent article "Revisiting Date and Party Hubs: Novel Approaches to Role Assignment in Protein Interaction Networks", free in PLoS Computational Biology, we suggest, based on their patterns of connections to different protein communities, that the so-called date hubs (as defined) are not really any more likely to be global network coordinators than the party hubs. Moreover, protein hubs display a wide variety of 'lighting up' patterns for their interactions, and classifying them into just these two types is perhaps not carving nature at its joints.

It is not all bad news, however. So far, we have been thinking about roles for individual proteins. But what if we instead focus on interactions between proteins? In other words, what if we try to assign roles to the lines in the network, rather than the dots? Imagine that the lines are roads, joining up a bunch of cities. If I want to drive from one city to another, I will try to find the shortest path between them. Now, suppose we remove one of the lines; one road suddenly gets destroyed. How many of those shortest paths between cities have to be re-routed? If the answer is lots, then it means that the link we removed was important to efficiently connecting up the network. So, for each link, one way of measuring its importance is how many paths have to be re-routed if it's removed: this is called the link's betweenness. How is this betweenness relevant to the network of proteins? We found that the betweenness of a link is strongly related to the similarity of the two proteins joined by that link: the links with the highest betweenness tend to be interactions joining the most dissimilar proteins. This partly seems to mirror something observed in social networks, where a distinction can be made between 'weak' ties (or links) and 'strong' ties. Strong ties are close relations or friends; weak ties may be less familiar, or less similar, acquaintances.
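For the curious, the road-network picture can be turned into a short calculation. The sketch below counts, for every link in a small made-up network, how many shortest paths between pairs of nodes run along it (splitting the count evenly when a pair has several equally short routes). It uses a standard breadth-first approach in the style of Brandes' algorithm; the toy network and the function name are invented for this post, and this is not the code used in the paper.

```python
from collections import deque


def edge_betweenness(edges):
    """Number of shortest paths (between unordered node pairs) using each link."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    score = {frozenset(edge): 0.0 for edge in edges}

    for source in neighbours:
        # Breadth-first search from `source`, counting shortest paths to each node.
        dist = {source: 0}
        paths = {source: 1}
        parents = {node: [] for node in neighbours}
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    paths[nxt] = 0
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    parents[nxt].append(node)
        # Walk back from the farthest nodes, sharing path counts among links.
        credit = {node: 0.0 for node in order}
        for node in reversed(order):
            for parent in parents[node]:
                share = paths[parent] / paths[node] * (1.0 + credit[node])
                score[frozenset((parent, node))] += share
                credit[parent] += share

    # Every pair was counted once from each end, so halve the totals.
    return {edge: value / 2 for edge, value in score.items()}


# Toy network: two tight triangles joined by a single "bridge" link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
betweenness = edge_betweenness(edges)
for edge, value in sorted(betweenness.items(), key=lambda item: -item[1]):
    print(sorted(edge), value)
```

In this toy network the bridge between the two triangles carries every path that crosses from one group to the other, so it gets the highest betweenness, much like the links between dissimilar proteins described above.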

We might imagine weak ties are important for communicating information across the network: for example, if you are looking for a job, it seems more likely that someone like a friend's colleague will be able to provide a useful tip than someone whom you know very well. Coming back to protein networks, if we think of betweenness as a way of measuring a link's importance for information flows between proteins, then our results indicate that here too the most important links are 'weak', in the sense that they are between dissimilar proteins that have different functions and are not part of the same group. This suggests that a deeper understanding of the roles played by specific links may help us to unravel the tangled webs of proteins that control and comprise cells, and thus ultimately, life itself. Sumeet and Nick. You can find a longer version of this article on Sumeet's blog.

Thursday 9 June 2011

Can we spread the risk?


A primary concern for many financial-market practitioners is the strength of correlations between price changes of different assets; that is, whether prices move up or down at the same time. There are many reasons for investors to think about correlations, but perhaps the most familiar is risk management. If an investor owns strongly correlated assets then there is a high level of risk in their investments – decreases in the value of one asset are often accompanied by falls in the other assets. More generally, the strength of correlations is of interest because it can shed light on the state of the global economy. Because correlations can sometimes be explained by macroeconomic factors, looking at their levels can help to illuminate the forces driving markets.

Historically, assets from different markets tended to behave in different ways, which made it possible to achieve reasonable diversification by buying different types of asset. In our paper "Temporal Evolution of Financial Market Correlations", recently accepted by Physical Review E, however, we show that since the 2007-2008 credit crisis things are not that simple: assets that previously moved more or less independently now behave in a very similar manner. We demonstrate this phenomenon using principal component analysis and show that there has been a significant increase in correlations since the crisis. This has profound implications for risk management because diversification is now much more difficult. It also suggests that lots of different assets are now driven by the same economic forces. Dan and Nick
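One way to see what principal component analysis picks up here: the share of total variance explained by the first principal component of a correlation matrix is its largest eigenvalue divided by the number of assets, and that share grows as assets become more correlated. The sketch below illustrates this with two invented correlation matrices (not data from the paper) and a simple power iteration; the function names are hypothetical, and the starting vector is an assumption that works for matrices like these but is not guaranteed to in general.

```python
from math import sqrt


def largest_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric matrix, estimated by power iteration."""
    n = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary starting vector (assumption)
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    # Rayleigh quotient of the (unit-length) converged vector.
    mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(vec, mv))


def first_component_share(correlations):
    """Fraction of total variance explained by the first principal component."""
    return largest_eigenvalue(correlations) / len(correlations)


def uniform_correlation(n, rho):
    """Toy correlation matrix: 1 on the diagonal, rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


# Invented examples: four assets, weakly versus strongly correlated.
weak = uniform_correlation(4, 0.1)
strong = uniform_correlation(4, 0.8)
print(round(first_component_share(weak), 3))
print(round(first_component_share(strong), 3))
```

When a single component explains most of the variance, as in the strongly correlated example, holding many different assets provides little diversification.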

Friday 3 June 2011

Flow with the Grow

In order to remain metabolically active, living things require a source of energy, and a regular supply of molecules of various kinds. As living things are composed of cells, large organisms inevitably face a fundamental challenge: they need to supply all of their component cells with the resources needed for survival. Mammals have cardio-vascular systems, plants have xylem and phloem, but how do fungi tackle the fundamental transport challenge? Compared to the other major kingdoms of multicellular life, transport in fungi is poorly understood. This is somewhat surprising, as transport in fungi is an ecologically critical process. Fungi are an essential component of soil: without fungi leaf litter would not degrade, and many fungi form foraging networks which circulate carbon, nitrogen and phosphate. If a fungus grows, that increase in volume must come from somewhere: if the source of new volume is distant from the growth then this must create flows in the network. Because the volume is mostly water, which here is effectively incompressible, growth in one part of the network will be rapidly coupled to the rest. We suggest that fluid flows associated with growth might themselves be the major form of long-range transport in fungi.

To investigate transport in fungi and the developmental logic of fungal networks, we photographed growing fungal networks, and digitized the images to produce a sequence of matrices that describe how the networks change over time. For each sequence of networks we identify a set of fluid flows which are as small as possible whilst being consistent with the observed changes in volume. We found that those parts of the network that were predicted to carry a large current typically thickened over time, while other parts of the network became thinner, or were consumed by the fungi in order to fuel further exploratory growth. So our crude idea (that flows are directly coupled to growth) did seem to be, at least partly, consistent with the data.
You'll find our paper "Growth-induced mass flows in fungal networks" here and in its journal version (free, from Nov 2011, at the Proceedings of the Royal Society B) here. The Institute for Science, Innovation and Society did a blog on this article as well. Luke and Nick

Stochastic Survival of the Densest: defective mitochondria could be seen as altruistic to understand their expansion

With age, our skeletal muscles (e.g. muscle of our legs and arms) work less well. In some people, there is a substantial loss of strength an...